-Notes and Blogs-


Connecting Attention in Transformers to Linear Regression

The attention mechanism performs computations similar to those in linear regression. The connection is probably loose, but worth noting down.


From the EM Algorithm to Predictive Coding

Notes on my understanding of predictive coding as a special case of the Expectation-Maximization (EM) algorithm.