| 287 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024-07-11 |
| 165 | Based: Simple linear attention language models | 2024-03-05 |
| 143 | Dragonfly: A large vision-language model with multi-resolution zoom | 2024-06-06 |
| 80 | A practitioner's guide to testing and running GPU clusters | 2024-08-13 |
| 4 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | 2024-09-09 |
| 3 | Fine-tuning Llama-3 to get 90% of GPT-4's performance at a fraction of the cost | 2024-07-19 |
| 3 | Together Inference Engine 2.0 with new Turbo and Lite endpoints | 2024-07-18 |
| 2 | Speculative decoding for high-throughput long-context inference | 2024-09-05 |
| 2 | Together MoA–collective intelligence of open-source models pushing LLM frontier | 2024-06-15 |
| 2 | Evo: Long-context modeling from molecular to genome scale | 2024-02-27 |
| 1 | Flux API available on Together AI:FLUX1.1 [pro] and free access FLUX.1 [schnell] | 2024-10-03 |
| 1 | Together AI embeddings endpoint with higher quality, 4x lower cost than OpenAI | 2024-01-11 |
| 1 | Linearizing LLMs with LoLCATs | 2024-10-15 |
| 1 | Free Llama 3.2 vision API | 2024-09-25 |
| 1 | New SOTA Reranker from Salesforce | 2024-09-10 |
| 1 | RedPajama-Data-v2: An open dataset with 30T tokens (2023) | 2024-04-22 |
| 4 | LlamaTutor | 2024-07-24 |
| 2 | Generate react apps with Llama 3.1 | 2024-08-02 |
| 3 | Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive | 2024-11-27 |
| 3 | Together AI acquires CodeSandbox to launch code interpreter for generative AI | 2024-12-12 |
| 31 | DeepCoder: An Open-Source 14B Coder at O3-Mini Level | 2025-04-09 |
| 1 | Together Code Sandbox | 2025-05-20 |
| 37 | Direct Preference Optimization vs. RLHF | 2025-05-25 |
| 1 | The Frontier Is Open | 2025-06-09 |
| 198 | AdapTive-LeArning Speculator System (ATLAS): Faster LLM inference | 2025-10-12 |