Embedding, LangChain, and the customised AI future

Newsletter for the week ending 8th April 2023

  • Learn: Large Language Models (LLMs) like ChatGPT only have information about the world up to the point they are released. So how do we get them to respond to, or help with, our own or our company's data? The process of making an LLM understand your data too is called embedding.
  • Create: As LLMs like OpenAI's GPT-3 and GPT-4 or Stanford's Alpaca are released, software tools are emerging to help developers build on top of these AI models. LangChain is a framework for developing applications powered by language models. This article takes a quick dive into the powerful capabilities of this tool.
  • News: On Tuesday, at a meeting of President Biden's Council of Advisors on Science and Technology (PCAST), Biden said, "Tech companies have a responsibility, in my view, to make sure their products are safe before making them public." As AI alignment becomes politically more important, interconnected tools and frameworks may provide a solution.

What is Embedding?

As Large Language Models like ChatGPT become more commonplace, companies and individuals will want to leverage their capabilities to improve the user experience of their own services and data. Embedding is the process of providing pre-trained data (we'll get to what that means in a minute) to augment an existing LLM, instead of retraining an entire model.

One obvious use case is adding ChatGPT-like functionality to a company knowledge base or support board. In this example, a user could ask specific questions and receive detailed answers, instead of scanning FAQs or searching through hundreds of documents.

To get an LLM to understand the data, you first need to pre-train your data. Pre-training is the 'P' in ChatGPT (Generative Pre-Trained Transformer) and is a complex, multi-step process that takes a large data set and attempts to learn general features or patterns in the data. The result is normally a data file in the form of a series of vectors (the representation LLMs understand) that can then be sent to the model.
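To make the vector idea concrete, here is a minimal sketch of how embedded documents can be matched to a query. The three-dimensional vectors and document names are toy stand-ins; a real embedding model would produce vectors with hundreds or thousands of dimensions from actual text.

```python
import numpy as np

# Toy vectors standing in for real embeddings; in practice an embedding
# model would generate one vector per document from its text.
docs = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.8, 0.3]),
    "account setup": np.array([0.0, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # Standard cosine similarity: how closely two vectors point the same way.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_relevant(query_vec, docs):
    # Rank documents by similarity to the query vector and return the best match.
    return max(docs, key=lambda name: cosine_similarity(query_vec, docs[name]))

# A query vector that happens to sit close to the "refund policy" document.
query = np.array([0.85, 0.15, 0.05])
print(most_relevant(query, docs))  # → refund policy
```

This nearest-vector lookup is the core of the knowledge-base example above: the user's question is embedded, the closest documents are retrieved, and those documents are handed to the LLM as context for its answer.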


The ability to use pre-trained LLMs and augment them with custom data is a powerful tool for companies and individuals looking to provide advanced natural language processing (NLP) services. It allows them to take advantage of the latest advances in NLP without having to invest in expensive and time-consuming training of their own models.


So now that we know what embedding is, how do we go about augmenting an LLM like ChatGPT with our data? Enter LangChain. LangChain simplifies the complicated parts of working with and building on AI models. Firstly, it lets you bring external data, such as files, other applications, and API data, to your LLMs. Secondly, it allows your LLMs to act on the answers to user questions, using them to take new actions or make decisions. This is very powerful and gives the AI 'agency', the ability to act independently.
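Those two ideas, feeding external data to a model and letting the model's answer trigger a follow-up action, can be sketched in a few lines. This is a toy illustration, not LangChain's actual API: `fake_llm` is a stand-in for a real model call, and `read_sensor` stands in for any external data source (a file, API, or database).

```python
def fake_llm(prompt):
    # A real system would call an LLM here; we fake a structured reply.
    if "Sensor says" in prompt:
        return "ANSWER: The room is at 21.5C"
    if "temperature" in prompt:
        return "ACTION: read_sensor"
    return "ANSWER: done"

def read_sensor():
    # Stand-in for external data brought to the model.
    return "21.5C"

def run(question):
    reply = fake_llm(question)
    if reply.startswith("ACTION:"):
        # The model's answer triggers an action: fetch external data,
        # then feed it back to the model for a final answer.
        tool = reply.split(":", 1)[1].strip()
        if tool == "read_sensor":
            data = read_sensor()
            return fake_llm(f"{question}\nSensor says: {data}\nAnswer:")
    return reply

print(run("What is the room temperature?"))
```

The loop of "model proposes an action, framework executes it, result goes back to the model" is the essence of the agency LangChain provides.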

The video below is an excellent overview of LangChain and covers some of the key use cases of this technology in a simple, easy-to-understand format.

If you are more technically minded, the LangChain Cookbook is available on GitHub with lots of examples to get you started.


While there are many, many tools out there to help you get started with AI development, this feels like an essential framework: it easily provides the capability to build new apps or services on top of existing LLMs.

The most important aspect of LangChain is the 'chain' part. An example might be to ask a user to pick a country, then use the response to work out a popular dish from that region. LangChain can then pass that information to a new model to produce a recipe for the user, along with a list of ingredients to purchase.
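The country-to-recipe example above can be sketched as a plain pipeline, where each step's output becomes the next step's input. The lookup tables here are illustrative stand-ins for LLM calls, and the dishes and ingredients are assumptions for the sake of the example.

```python
# Stand-ins for LLM calls: a real chain would prompt a model at each step.
POPULAR_DISH = {"Italy": "risotto", "Japan": "ramen"}
RECIPES = {
    "risotto": ("Toast the rice, add stock gradually, finish with parmesan.",
                ["arborio rice", "stock", "parmesan", "butter"]),
    "ramen": ("Simmer the broth, cook the noodles, assemble with toppings.",
              ["noodles", "broth", "egg", "scallions"]),
}

def dish_for_country(country):
    return POPULAR_DISH[country]

def recipe_for_dish(dish):
    return RECIPES[dish]

def chain(country):
    # Chain the steps: country -> dish -> (recipe, ingredients).
    dish = dish_for_country(country)
    steps, ingredients = recipe_for_dish(dish)
    return {"dish": dish, "recipe": steps, "shopping_list": ingredients}

print(chain("Italy")["dish"])  # → risotto
```

The value of a framework like LangChain is that it manages this hand-off between steps for you, including the prompting and parsing that real model calls require.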

It's clear that LLMs combined with tools like LangChain create the basis for a truly new development platform. We can now deliver products that were previously not possible.

Politics and Eliezer Yudkowsky follow up

AI issues have now made their way to the table of the US President. At a scheduled meeting of his science and technology advisers, Biden made his remarks, as reported by Reuters. While the letter from The Future of Life Institute has certainly got politicians' attention, it doesn't appear to have convinced anyone in government to advance any legislation at this point.

In the same vein, I have now listened to the entire podcast with Eliezer Yudkowsky describing the dangers of AI, which I highlighted last week. His arguments are broadly:

  • AGI (a superhuman Artificial General Intelligence) will easily break out of any system we design to keep it in.
  • An escaped AGI will then target us because we are very slow thinkers and represent a threat to its continued existence.
  • If we try to create an AGI with a set of values aligned with ours, we will be unable to determine whether it is actually aligned or lying to hide its true intentions, since we currently have no means of understanding what goes on inside a neural network.


The interview is very interesting and well argued, but I remain unconvinced by some of his logic, especially the claim that this will inevitably result in the end of the human race. The key difference for me is that while we do indeed have no understanding of what goes on inside neural networks, I do not believe that understanding is essential to the creation of a correctly aligned AGI. As the Steven Levitt quote goes, "Don't listen to what they say, watch what they do." Technologies like LangChain and the new frameworks we create force AIs to interact with the world through these mediums; they can only act through these points. If we focus our alignment efforts here, we can more easily place limits on an AI's capabilities and monitor its actions over time. We get to watch what they do, not listen to what they say.
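The "watch what they do" idea can be sketched as a tool wrapper: if an AI can only act through a framework's tools, the framework can log every call and enforce limits on them. The class name, allow-list, and call limit below are illustrative assumptions, not an existing API.

```python
class MonitoredTool:
    """Wrap a tool so every use by the AI is logged and rate-limited."""

    def __init__(self, name, func, max_calls=3):
        self.name = name
        self.func = func
        self.max_calls = max_calls
        self.log = []  # audit trail: a record of what the AI actually did

    def __call__(self, *args):
        if len(self.log) >= self.max_calls:
            # Hard limit on the AI's ability to act through this point.
            raise PermissionError(f"{self.name}: call limit reached")
        self.log.append(args)
        return self.func(*args)

# Illustrative tool: a fake web search the AI is allowed to use.
search = MonitoredTool("web_search", lambda q: f"results for {q!r}")
print(search("AI alignment"))
print(search.log)  # every action is recorded for later review
```

This is the monitoring half of the argument: the framework sits between the model and the world, so its logs show what the AI does regardless of what it says.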
