Why we divide by the square root of the key dimension in attention
There are three implicit assumptions we make
Oct 12 • Paulius Rauba
Attention mechanism for non-Euclidean spaces
Looking at the geometric assumptions behind attention.
Oct 5 • Paulius Rauba