2 | HuggingFace on Sheets | 2025-03-24 |
11 | smolagents: A simple library to build AI agents | 2025-01-02 |
10 | Phi-4 weights have been released under MIT license | 2025-01-08 |
3 | Timeline of AI model releases in 2024 | 2025-01-01 |
2 | Vdr-2B-multi-v1 a multilingual embedding model for visual document retrieval | 2025-01-10 |
2 | Show HN: We collected detailed annotations for text-to-image generation | 2025-01-10 |
2 | Hugging Face Smolagents | 2025-01-05 |
2 | Hugging Face advocates for Code Agents: agents that write tool calls as code | 2025-01-02 |
2 | ModernBERT: Encoder-only Transformer Model Strictly Improving on past work | 2025-01-01 |
52 | Train faster static embedding models with sentence transformers | 2025-01-15 |
6 | Kokoro-TTS | 2025-01-13 |
2 | Flex.1-Alpha – A new modded Flux model that can properly handle being fine tuned | 2025-01-19 |
1 | Show HN: An Agentic AI dataset for deepfake detection | 2025-01-15 |
394 | Open-R1: an open reproduction of DeepSeek-R1 | 2025-01-28 |
227 | Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser | 2025-02-07 |
49 | Janus-Pro: Autoregressive framework unifying multimodal understanding&generation | 2025-01-27 |
39 | DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks | 2025-01-20 |
38 | Fully autonomous AI agents should not be developed | 2025-02-07 |
20 | Selene Mini: Open-sourced SOTA small language-model-as-a-judge | 2025-01-29 |
19 | The smallest VLM ever: 250M parameters | 2025-01-23 |
17 | DeepSeek R1 | 2025-01-20 |
12 | Open-source DeepResearch – Freeing our search agents | 2025-02-04 |
6 | Microsoft Phi 4 with R1 Reasoning | 2025-02-04 |
5 | Open R1: Update #2 | 2025-02-11 |
5 | Deepseek VL2 Small | 2025-02-08 |
4 | Qwen 2.5 Max | 2025-01-28 |
4 | Hugging Face open sources a web-browsing agent that uses VLMs | 2025-01-24 |
4 | Deepseek R1 Zero | 2025-01-20 |
3 | Fine-Tune Deepseek-R1 with a Synthetic Reasoning Dataset | 2025-02-11 |
3 | Hugging Face AI Agents Course | 2025-02-10 |
3 | HuggingFace open reproduction of R1 data and training pipeline | 2025-01-27 |
3 | DeepSeek-R1 on iPhone? (DeepSeek-R1-Distill-Qwen-1.5B-GGUF) | 2025-01-21 |
2 | OpenAI o3 just scored 99.8% on CodeForces using brute-force | 2025-02-12 |
2 | FinePersonas | 2025-02-10 |
2 | #9: Does AI Remember? The Role of Memory in Agentic Workflows | 2025-02-03 |
2 | Mistral-Small-24B-Base-2501 | 2025-01-30 |
2 | Generate Images, Chat with PDF in WebGPU via DeepSeek Janus Pro 1B | 2025-01-28 |
2 | The state of open video generation models | 2025-01-28 |
2 | Bespoke-Stratos-17k: Open Reasoning Dataset by Distilling DeepSeek-R1 | 2025-01-27 |
2 | DeepSeek-R1 WebGPU | 2025-01-22 |
1 | FP8 DeepSeek R1 Distilled LLMs for SGLang and VLLM | 2025-01-29 |
33 | The Ultra-Scale Playbook: Training LLMs on GPU Clusters | 2025-02-19 |
17 | Vector Search with DuckDB | 2025-02-26 |
9 | Show HN: A Transformer model that preserves logical equivalence | 2025-03-02 |
6 | DeepSeek-R1 without CCP censorship | 2025-02-20 |
6 | More Efficient Chain-of-Thought Reasoning Through Certainty Probing | 2025-02-18 |
6 | SigLIP 2: A better multilingual vision language encoder | 2025-02-22 |
4 | LLaSE-G1 A FOSS speech enhancement model | 2025-03-08 |
4 | Qwen/QwQ-32B released on Hugging Face | 2025-03-06 |
4 | Wan2.1-T2V-14B | 2025-02-25 |
4 | The Curse of Depth in Large Language Models | 2025-02-13 |
3 | GEN3C: 3D-Informed World-Consistent Video | 2025-03-06 |
3 | Microsoft Releases Phi-4-multimodal [pdf] | 2025-02-26 |
3 | WanX open weight sota 14B video model release | 2025-02-25 |
3 | Step-Audio-Chat: a 132B end-to-end speech-to-speech model | 2025-02-17 |
2 | FastRTC: The Real-Time Communication Library for Python | 2025-02-25 |
2 | Show HN: Roast Any Website with AI | 2025-02-25 |
2 | SWE-Lancer: Can LLMs Earn $1M from Real-World Freelance Software Engineering? | 2025-02-18 |
2 | Desklib AI Detector Ranks No 1 on Raid Benchmark for AI Detection | 2025-02-17 |
2 | Forget What You Know about LLMs Evaluations – LLMs Are Like a Chameleon | 2025-02-13 |
63 | Open-sourcing 5,000hrs of self-driving dataset | 2025-03-11 |
18 | Deepseek V3-0324 | 2025-03-24 |
13 | Co-Doodle with Gemini | 2025-03-19 |
12 | FUTO open-sources 1M row keyboard swipe dataset | 2025-04-04 |
8 | Sesame CSM-1B: Open-Source Conversational Speech Model | 2025-03-14 |
7 | Hugging Face datasets and models for cybersecurity/software vulnerabilities | 2025-03-09 |
6 | Qwen2.5-Omni Technical Report | 2025-03-30 |
5 | Gemma 3 QAT (Quantized Aware Training) 3x less memory | 2025-04-03 |
5 | DocumentAI with 256M Parameters | 2025-03-20 |
4 | Migrating Hugging Face off Git LFS and to a new storage system (Xet) | 2025-03-18 |
4 | MoCha: Towards Movie-Grade Talking Character Synthesis | 2025-04-01 |
4 | Qwen2.5-Omni-7B | 2025-03-26 |
4 | Open R1's OlympicCoder beats Deepseek R1, models and underlying dataset released | 2025-03-25 |
3 | Show HN: First large scale evaluation of 4o Image Generation from OpenAI | 2025-03-27 |
3 | EuroBERT: A High-Performance Multilingual Encoder Model | 2025-03-10 |
3 | Training LLMs with GRPO and Interpreter Feedback Using WebAssembly | 2025-04-06 |
3 | AgentRxiv: Towards Collaborative Autonomous Research | 2025-03-25 |
3 | DeepSeek V3-0324 Posted to HuggingFace | 2025-03-24 |
3 | Nvidia Isaac GR00T N1 is the first open foundation model for humanoid | 2025-03-21 |
3 | VACE: All-in-One Video Creation and Editing from Alibaba | 2025-03-12 |
2 | JFK Assassination Records Dataset on Hugging Face | 2025-04-09 |
2 | Show HN: My progress towards building a robotics training dataset | 2025-03-18 |
2 | HOGWILD! Inference – parallel LLM chain-of-thought with shared attention | 2025-04-09 |
2 | Llama-4 Model-Based Agentic AI System HuggingFace Released | 2025-04-06 |
2 | Llama 3.2 from-scratch implementation focused on code readability | 2025-04-01 |
2 | deepsite | 2025-03-31 |
2 | SuperBPE: Space Travel for Language Models | 2025-03-29 |
2 | Gemma3 on Hugging Face | 2025-03-26 |
2 | Open-source LLM beats OpenAI o1 and DeepSeek-R1 for PyTorch-to-Triton codegen | 2025-03-19 |
2 | Cohere: Command A (111B Open Weights Model) | 2025-03-14 |
2 | Open Dataset: Vehicle Accidents | 2025-03-13 |
16 | Qwen3 0.6B now on HuggingFace (quantized) | 2025-04-28 |
14 | TeapotLLM- an open-source <1B model for hallucination-resistant Q&A on a CPU | 2025-04-16 |
14 | DeepSeek-Prover-V2-671B | 2025-04-30 |
10 | Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition | 2025-04-23 |
8 | DeepSeek-Prover-V2-671B | 2025-04-30 |
5 | An open source common knowledge and context based Hallucination Detection Model | 2025-04-29 |
5 | Mixture of Tunable Experts-DeepSeek R1 Behavior Modification at Inference Time | 2025-05-01 |
5 | CircleGuardBench Leaderboard | 2025-05-07 |
4 | Devin's First Open Source Model Beats O3 | 2025-05-06 |
4 | Ltxv-13B – high-quality videos in real-time | 2025-05-07 |
4 | Show HN: HalluMix – A Benchmark for Real-World LLM Hallucination Detection | 2025-05-06 |
4 | Higgs – Rapidly Compress LLMs Without Significant Loss of Quality | 2025-04-12 |
3 | Drape1: Open-Source Scalable adapter for clothing generation | 2025-05-01 |
3 | GLM-4-32B-0414: New MIT-licensed SOTA LLM from Zhipu AI | 2025-04-15 |
5 | Show HN: Raman-01 – A Pocket Physics Solver LLM | 2025-05-05 |
3 | Xiaomi MiMo | 2025-04-30 |
3 | Qwen3 235B (MoE with 128 experts) | 2025-04-28 |
5 | An MCP-powered agent in 50 lines of code | 2025-05-15 |
3 | Dia 1.6B – Nari Text-to-Speech Synthesis | 2025-04-24 |
3 | Microsoft/MAI-DS-R1, DeepSeek R1 Post-Trained by Microsoft | 2025-04-18 |
2 | Show HN: TTS Arena V2 | 2025-05-02 |
2 | WebThinker: Empowering Large Reasoning Models with Deep Research Capability | 2025-05-01 |
2 | MamayLM: An Efficient Ukrainian LLM | 2025-04-23 |
2 | Show HN: AEE – An Open-Source Engine That Evaluates Truth and Bias in Text | 2025-04-13 |
2 | Magi-1: Autoregressive Video Generation at Scale | 2025-05-06 |
2 | The 4 Things the Qwen-3's Chat Template Teaches Us | 2025-05-02 |
2 | Show HN: A synthetic text dataset to train tiny language models on | 2025-05-01 |
2 | Phi-4-Reasoning | 2025-05-01 |
2 | FantasyTalking: Realistic Talking Portrait Generation | 2025-04-30 |
2 | Neural Network Visualizer | 2025-04-29 |
2 | The Bitter Lesson Learned from 2k Multilingual Benchmarks | 2025-04-23 |
2 | ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think | 2025-04-19 |
2 | Microsoft BitNet 1.58bit LLM 2B4T released | 2025-04-16 |
451 | Deepseek R1-0528 | 2025-05-28 |
149 | Show HN: Penny-1.7B Irish Penny Journal style transfer | 2025-06-02 |
52 | Show HN: ChatToSTL – AI text-to-CAD for 3D printing | 2025-06-12 |
14 | DeepSeek-R1-0528 performance improvements | 2025-05-29 |
8 | Model Context Protocol (MCP) Course | 2025-05-21 |
7 | ByteDance/Dolphin on HuggingFace | 2025-05-19 |
5 | SWE-rebench: Over 21,000 Open Tasks for SWE LLMs | 2025-05-29 |
5 | The Common Pile v0.1 | 2025-06-06 |
5 | You could have designed state of the art positional encoding | 2025-05-20 |
5 | LLM Embeddings Explained: A Visual and Intuitive Guide | 2025-05-14 |
3 | Yambda-5B – Industrial-scale music recommendation dataset | 2025-06-04 |
3 | Show HN: we released an open source, best-in-class medical reasoning model | 2025-05-13 |
3 | Understanding MCP Evals: Why Evals Matter for MCP | 2025-06-06 |
3 | Show HN: Ego-Dex Gradio App | 2025-06-03 |
3 | Hugging Face Courses | 2025-05-27 |
3 | Show HN: Tinker with Meta's "tokenizer-free" patcher | 2025-05-21 |
3 | Radiology explainer demo | 2025-05-20 |
3 | Memelang – a hybrid relational-graph query language | 2025-05-17 |
2 | SOTA Model in 8B Size? | 2025-05-29 |
2 | TiRex Leads Gift Eval | 2025-06-02 |
2 | How do AI political biases differ between English and French? | 2025-05-21 |
2 | KernelLLM – Meta's new 8B SotA model | 2025-05-19 |
2 | Wan: Open and Advanced Large-Scale Video Generative Models | 2025-05-14 |
2 | Embedding Benchmark for Retrieval | 2025-06-11 |
2 | MiniCPM4 – a series of open multimodal models for edge inference | 2025-06-10 |
2 | The Qwen3 Embedding Model | 2025-06-06 |
2 | Tiny Agents in Python: an MCP-powered agent in ~70 lines of code | 2025-05-23 |
2 | Show HN: 2.4x faster baai/bge-M3 | 2025-05-18 |
2 | Vision Language Models (Better, Faster, Stronger) | 2025-05-13 |
2 | Building and better understanding vision-language models (2024) | 2025-05-10 |