LLM Scaling Laws

--

The performance of an LLM is a function of

  • N — the number of parameters in the network (weights and biases)
  • D — the amount of text we train on

Figure: Model performance (loss) vs. model parameters (size). As we increase the number of parameters, the loss decreases and the model performs better (Source: Compute-Optimal Large Language Models).
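
To make this relationship concrete, here is a minimal Python sketch of a Chinchilla-style parametric loss, L(N, D) = E + A/N^α + B/D^β. The constants are the approximate fits reported in the Chinchilla paper (E ≈ 1.69, A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28); treat them as illustrative assumptions rather than exact values.

# A minimal sketch of a Chinchilla-style scaling law: L(N, D) = E + A/N**alpha + B/D**beta.
# The constants below are approximate fits from "Training Compute-Optimal Large
# Language Models" (Hoffmann et al., 2022), used here purely for illustration.

E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted coefficients (approximate)
ALPHA, BETA = 0.34, 0.28       # fitted exponents for parameters (N) and tokens (D)

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Holding the training data fixed at 300B tokens, bigger models predict lower loss:
for n in (1e9, 10e9, 70e9):
    print(f"N = {n:.0e} params -> predicted loss ~ {predicted_loss(n, 300e9):.3f}")

Running this prints a steadily decreasing loss as N grows, which is the trend shown in the figure above; the symmetric D term is why training on more text helps in the same way.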

This is why we see increasingly bigger LLMs being released over time. We get more “intelligence” for free with scaling!

Figure: Size of LLMs over time (Source: By Author)

This diagram provides another perspective on model size growth over time.

This trend gives an unfair advantage to large corporations like OpenAI, Google, Microsoft, Meta, etc., as they can spend billions of dollars on computing infrastructure to build very large LLMs.

Can smaller models like Mixtral (which uses a mixture of experts) break this trend?

I will blog about this in a follow-up post. Stay tuned!
