Why we divide by the square root of the key dimension in attention
There are three implicit assumptions we make
Oct 12 • Paulius Rauba
Attention mechanism for non-Euclidean spaces
Looking at the geometric assumptions behind attention.
Oct 5 • Paulius Rauba
© 2025 Paulius Rauba