All Posts
- 25-08-09 flash-attention
- 25-08-01 early august paper reading
- 25-07-23 late july paper reading
- 25-07-15 mid july paper reading
- 25-07-11 on freedom
- 25-07-03 early july paper reading
- 25-07-02 deepseek's high level software magic
- 25-06-19 wise words
- 25-06-15 mid june paper reading
- 25-06-13 transformer from scratch
- 25-06-11 mechanistic interpretability, part 1
- 25-06-10 decoder-only architecture, kv cache, and mla
- 25-06-09 notes on latent attention
- 25-06-08 understanding attention
- 25-06-05 favorite books
- 25-06-03 early june paper reading
- 25-02-10 learning
- 25-02-09 the stack