-Notes and Blogs-
Connecting Attention in Transformers to Linear Regression
The attention mechanism performs computations similar to those in linear regression. The connection is probably loose, but worth noting down.
From the EM Algorithm to Predictive Coding
Notes about my understanding of predictive coding as a special case of the Expectation-Maximization algorithm.