|     | Title | Date |
| ---:| --- | --- |
| 131 | Lambda on hard mode: serverless HTTP in Rust | 2024-03-16 |
| 11  | Modal is now generally available | 2023-10-10 |
| 9   | Catching crypto miners using syscall signatures | 2024-06-07 |
| 8   | DoppelBot: Replace Your CEO with an LLM | 2023-05-15 |
| 7   | Embedding (RAG) all of Wikipedia in less than 15 minutes | 2024-01-24 |
| 6   | The future of AI needs more flexible GPU capacity | 2024-10-25 |
| 6   | How to beat proprietary embedding models with open-source | 2024-04-29 |
| 4   | Beat GPT-4o at Python by searching with 100 dumb LLaMAs | 2024-08-06 |
| 4   | Modal is GA and raised a 16M Series A | 2023-10-10 |
| 3   | A beginner's guide to LLM fine-tuning | 2023-11-08 |
| 3   | Modal – Run code in the cloud without managing your own infrastructure | 2023-01-04 |
| 2   | Modal now charging for reserved containers (minimum of 0.125 cores per container) | 2024-07-23 |
| 2   | Using CUDA on Modal | 2024-06-24 |
| 2   | Run GPU Jobs from Airflow | 2024-06-21 |
| 2   | How Ramp automated receipt processing with fine-tuned LLMs | 2024-04-02 |
| 1   | Finetune Any Llama in Minutes on Modal | 2023-12-01 |
| 1   | Modal – an end-to-end stack for cloud compute | 2022-12-23 |
| 125 | Static IPs for Serverless Containers | 2024-12-02 |
| 1   | Tidbyt Is Joining Modal | 2024-12-02 |
| 230 | The Missing Nvidia GPU Glossary | 2025-01-12 |
| 13  | GPU Programming Glossary | 2024-12-12 |
| 2   | Modal Launches Sandboxes | 2025-01-21 |
| 232 | DoppelBot: Replace Your CEO with an LLM | 2025-02-04 |
| 9   | Checkpoint/restore for sub-second container startup | 2025-01-29 |
| 154 | 'I paid for the whole GPU, I am going to use the whole GPU' | 2025-05-07 |
| 1   | Using the Lamborghini of inference engines for serverless Llama 3 | 2025-04-21 |
| 2   | Modal SDKs for JavaScript and Go | 2025-04-30 |
| 62  | Linear Programming for Fun and Profit | 2025-05-09 |
| 2   | Modal's Serverless KV Store Now Scales to Infinity | 2025-05-20 |
| 4   | The LLM Engine Advisor | 2025-06-03 |
| 1   | Introducing: B200s and H200s on Modal | 2025-06-04 |
| 5   | Generating diffusion QR codes that work | 2025-07-02 |
| 1   | The LLM Engine Almanac | 2025-06-09 |
| 4   | Dollars per Token Considered Harmful | 2025-07-16 |
| 4   | Transcribe speech 100x faster and 100x cheaper with open models | 2025-07-28 |
| 9   | GPU Memory Snapshots: fast container cold boots | 2025-07-31 |
| 2   | The GPU Glossary: Performance | 2025-09-04 |
| 5   | We reverse-engineered Flash Attention 4 | 2025-09-26 |
| 4   | Modal Notebooks, a real-time collaborative notebook with cloud GPUs | 2025-09-09 |
| 4   | Modal Notebooks: How we built a cloud GPU notebook that boots in seconds | 2025-09-17 |
| 3   | Inside vLLM: Anatomy of a High-Throughput LLM Inference System | 2025-09-13 |
| 3   | Modal's $87M Series B | 2025-09-29 |
| 3   | One second voice-to-voice latency with just open models | 2025-11-09 |
| 3   | Agents need good developer experience too | 2025-11-20 |
| 3   | Host overhead is killing your inference efficiency | 2025-11-19 |