| 287 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024-07-11 |
| 165 | Based: Simple linear attention language models | 2024-03-05 |
| 143 | Dragonfly: A large vision-language model with multi-resolution zoom | 2024-06-06 |
| 80 | A practitioner's guide to testing and running GPU clusters | 2024-08-13 |
| 31 | DeepCoder: An Open-Source 14B Coder at O3-Mini Level | 2025-04-09 |
| 37 | Direct Preference Optimization vs. RLHF | 2025-05-25 |
| 198 | AdapTive-LeArning Speculator System (ATLAS): Faster LLM inference | 2025-10-12 |