Why we divide by the square root of the key dimension in attention
There are three implicit assumptions we make
Oct 12, 2025 • Paulius Rauba
Attention mechanism for non-Euclidean spaces
Looking at the geometric assumptions behind attention.
Oct 5, 2025 • Paulius Rauba
© 2026 Paulius Rauba