Transformers

Introduction The goal of this series of posts, is to form foundational knowledge that helps us understanding modern state-of-the-art LLM models, and gain a comprehensive understanding of GPT via reading the seminal papers themselves. In my previous post, I covered some of the seminal papers that formulated sequence based models from RNNs to the Attention mechanism in encoder-decoder architectures. If you don’t know about them, or would like a quick refresher - I recommend reading through the previous post before continuing here....

Transformers

Understanding GPT 1, 2 and 3
Machine Learning

Understanding GPT - Transformers
Machine Learning