| 287 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024-07-11 |
| 221 | Paving the way to efficient architectures: StripedHyena-7B | 2023-12-08 |
| 165 | Based: Simple linear attention language models | 2024-03-05 |
| 143 | Dragonfly: A large vision-language model with multi-resolution zoom | 2024-06-06 |
| 80 | A practitioner's guide to testing and running GPU clusters | 2024-08-13 |
| 70 | Together AI raises a $102.5M Series A | 2023-11-29 |
| 4 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | 2024-09-09 |
| 3 | Fine-tuning Llama-3 to get 90% of GPT-4's performance at a fraction of the cost | 2024-07-19 |
| 3 | Together Inference Engine 2.0 with new Turbo and Lite endpoints | 2024-07-18 |
| 2 | Speculative decoding for high-throughput long-context inference | 2024-09-05 |
| 2 | Together MoA–collective intelligence of open-source models pushing LLM frontier | 2024-06-15 |
| 2 | Evo: Long-context modeling from molecular to genome scale | 2024-02-27 |
| 2 | Together Inference Engine – the fastest inference available | 2023-12-12 |
| 1 | Flux API available on Together AI: FLUX1.1 [pro] and free access FLUX.1 [schnell] | 2024-10-03 |
| 1 | Together AI embeddings endpoint with higher quality, 4x lower cost than OpenAI | 2024-01-11 |
| 1 | Linearizing LLMs with LoLCATs | 2024-10-15 |
| 1 | Free Llama 3.2 vision API | 2024-09-25 |
| 1 | New SOTA Reranker from Salesforce | 2024-09-10 |
| 1 | RedPajama-Data-v2: An open dataset with 30T tokens (2023) | 2024-04-22 |
| 236 | RedPajama v2 Open Dataset with 30T Tokens for Training LLMs | 2023-10-30 |
| 84 | Llama 32K Context Released by Together AI | 2023-07-29 |
| 54 | Llama 2 on togetherAI is as bad of a privacy nightmare as OpenAI | 2023-09-08 |
| 4 | LlamaTutor | 2024-07-24 |
| 2 | Generate react apps with Llama 3.1 | 2024-08-02 |
| 2 | Llama-2-7B-32K-Instruct – and fine-tuning for Llama-2 models with Together API | 2023-08-22 |
| 2 | FlashAttention-2: Faster attention with better parallelism and work partitioning | 2023-07-17 |
| 2 | Together API hosts open source models | 2023-07-14 |
| 3 | Fine-Tuning LLMs for Multi-Turn Conversations: A Technical Deep Dive | 2024-11-27 |
| 3 | Together AI acquires CodeSandbox to launch code interpreter for generative AI | 2024-12-12 |
| 31 | DeepCoder: An Open-Source 14B Coder at O3-Mini Level | 2025-04-09 |
| 1 | Together Code Sandbox | 2025-05-20 |
| 37 | Direct Preference Optimization vs. RLHF | 2025-05-25 |
| 1 | The Frontier Is Open | 2025-06-09 |
| 198 | AdapTive-LeArning Speculator System (ATLAS): Faster LLM inference | 2025-10-12 |