**Transformers:**
- https://jalammar.github.io/illustrated-transformer/
- https://nlp.seas.harvard.edu/2018/04/03/attention.html
- What Are Transformer Models and How Do They Work? https://txt.cohere.ai/what-are-transformer-models/ by Luis Serrano, 12 / 04 / 2023
- Sebastian Raschka's "Intro to deep learning and generative models" course on YouTube, chapters 159 to 163: https://www.youtube.com/playlist?list=PLTKMiZHVd_2KJtIXOW0zF... Recommended by [metanonsense](https://news.ycombinator.com/user?id=metanonsense) in a [Hacker News comment](https://news.ycombinator.com/item?id=35578988):
  > When I first tried to understand transformers, I superficially understood most material, but I always felt that I did not really get it on a "I am able to build it and I understand why I am doing it" level. I struggled to get my fingers on what exactly I did not understand. I read the original paper, blog posts, and watched more videos than I care to admit.
  >
  > The one source of information that made it click to me were chapters 159 to 163 of Sebastian Raschka's phenomenal "Intro to deep learning and generative models" course on youtube.
- MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention https://www.youtube.com/watch?v=ySEx_Bqxvvo

#TBD #TODO
- BERT
- GPT