How LLMs Work
A simplified view of how large language models work
With AI co-pilots everywhere, have you ever wondered how the LLMs that power them work?
In this post, I will give you a 10,000-foot view of how LLMs work.
At its core, an LLM is nothing but a large number of parameters, called weights and biases, packed into a file, plus a small piece of code that uses these parameters to predict the next token given some text.
Parameters are the knowledge a model learns through training on a vast amount of text data; they represent the skills of the model in a compressed format. Think of a lossy compression like JPEG or MP3 that is then further compressed using zip. A token is a word or a part of one. For example, as shown in Figure 4, GPT-4 has 1.76 trillion parameters, Llama-2 has 70 billion parameters, and Gemini has 1.56 trillion parameters.
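To make the "parameters plus a small piece of code" idea concrete, here is a toy sketch, not a real LLM. The "parameters" are just bigram counts learned from a tiny made-up corpus, and "prediction" is looking up the most frequent next token. Real LLMs use billions of neural-network weights and much more sophisticated code, but the overall shape (learned parameters + a prediction loop) is the same:

```python
from collections import Counter, defaultdict

# Toy "training corpus" (an assumption for illustration only).
corpus = "the cat sat on the mat the cat ate".split()

# "Training": the parameters here are simply counts of how often
# each token follows each other token in the corpus.
params = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    params[prev][nxt] += 1

def predict_next(token):
    """Return the most frequently seen next token after `token`."""
    return params[token].most_common(1)[0][0]

# "cat" follows "the" twice in the corpus, "mat" only once.
print(predict_next("the"))  # -> cat
```

A real LLM replaces the count table with trillions of learned weights and predicts a probability distribution over its whole vocabulary, but this captures the essence: parameters learned from text, plus a small prediction routine.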