HuggingFace Hacker News

Posts by Month (275 total)

Hacker News Posts

Title	Points	Comments	Date
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]	978	--	2025-12-01
Deepseek R1-0528	451	--	2025-05-28
Open-R1: an open reproduction of DeepSeek-R1	394	--	2025-01-28
Smollm3: Smol, multilingual, long-context reasoner LLM	388	--	2025-07-08
Nanonets-OCR-s – OCR model that transforms documents into structured markdown	361	--	2025-06-16
Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS	319	--	2025-09-02
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning	263	--	2025-12-01
The Smol Training Playbook: The Secrets to Building World-Class LLMs	262	--	2025-10-30
Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser	227	--	2025-02-07
Qwen3-4B-Thinking-2507	166	--	2025-08-06
Qwen3-235B-A22B-Thinking-2507	152	--	2025-07-25
Show HN: Penny-1.7B Irish Penny Journal style transfer	149	--	2025-06-02
Qwen-Image-Layered: transparency and layer aware open diffusion model	130	--	2025-12-19
Qwen3 30B-A3B	87	--	2025-07-30
Voxtral-Mini-3B-2507 – Open source speech understanding model	64	--	2025-07-15
Open-sourcing 5,000hrs of self-driving dataset	63	--	2025-03-11
Qwen Image	54	--	2025-08-04
Train faster static embedding models with sentence transformers	52	--	2025-01-15
Show HN: ChatToSTL – AI text-to-CAD for 3D printing	52	--	2025-06-12
Janus-Pro: Autoregressive framework unifying multimodal understanding&generation	49	--	2025-01-27
DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks	39	--	2025-01-20
Fully autonomous AI agents should not be developed	38	--	2025-02-07
Qwen3-235B-A22B-Instruct-2507	36	--	2025-07-21
The Ultra-Scale Playbook: Training LLMs on GPU Clusters	33	--	2025-02-19
Qwen3-Coder-30B-A3B-Instruct	32	--	2025-07-31
Reachy Mini – The Open-Source Robot for Today's and Tomorrow's AI Builders	30	--	2025-07-09
grok-2 on Hugging Face	27	--	2025-08-23
DeepSeek-v3.1	26	--	2025-08-21
DeepSeek-v3.1-Base	25	--	2025-08-19
Mistral Small 3.2 (24B-Instruct-2506)	23	--	2025-06-20
DeepSeek-v3.1	23	--	2025-08-19
Kyutai 1.6B Streaming TTS	21	--	2025-07-03
Qwen3 235B beats Claude on some code benchmarks	21	--	2025-07-21
Selene Mini: Open-sourced SOTA small language-model-as-a-judge	20	--	2025-01-29
The smallest VLM ever: 250M parameters	19	--	2025-01-23
Deepseek V3-0324	18	--	2025-03-24
DeepSeek R1	17	--	2025-01-20
Vector Search with DuckDB	17	--	2025-02-26
DiffuCoder-7B-CpGRPO: A code generation LLM developed by Apple	17	--	2025-07-04
Qwen3 0.6B now on HuggingFace (quantized)	16	--	2025-04-28
TeapotLLM- an open-source <1B model for hallucination-resistant Q&A on a CPU	14	--	2025-04-16
DeepSeek-Prover-V2-671B	14	--	2025-04-30
DeepSeek-R1-0528 performance improvements	14	--	2025-05-29
Co-Doodle with Gemini	13	--	2025-03-19
Open-source DeepResearch – Freeing our search agents	12	--	2025-02-04
FUTO open-sources 1M row keyboard swipe dataset	12	--	2025-04-04
smolagents: A simple library to build AI agents	11	--	2025-01-02
DeepSeek-TNG-R1T2-Chimera	11	--	2025-07-02
Phi-4 weights have been released under MIT license	10	--	2025-01-08
Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition	10	--	2025-04-23
Open Source 1.7tb Dataset of What AI Crawlers Are Doing	10	--	2025-07-03
Parquet Content-Defined Chunking	10	--	2025-09-09
Wan2.2-S2V-14B – audio-driven cinematic video generation model	10	--	2025-08-26
Show HN: A Transformer model that preserves logical equivalence	9	--	2025-03-02
Sesame CSM-1B: Open-Source Conversational Speech Model	8	--	2025-03-14
DeepSeek-Prover-V2-671B	8	--	2025-04-30
Model Context Protocol (MCP) Course	8	--	2025-05-21
Tencent's Hunyuan Instruct 7B/4B/1.8B/0.5B new models have been released	8	--	2025-08-04
MistralAI released a new Magistral Small 2509	8	--	2025-09-17
Hugging Face datasets and models for cybersecurity/software vulnerabilities	7	--	2025-03-09
ByteDance/Dolphin on HuggingFace	7	--	2025-05-19
Holo1.5: Foundational Models for Computer Use Agents	7	--	2025-09-15
LFM2 WebGPU	7	--	2025-08-06
OpenAI/GPT-OSS-120B · Hugging Face	7	--	2025-08-05
Kokoro-TTS	6	--	2025-01-13
Microsoft Phi 4 with R1 Reasoning	6	--	2025-02-04
DeepSeek-R1 without CCP censorship	6	--	2025-02-20
More Efficient Chain-of-Thought Reasoning Through Certainty Probing	6	--	2025-02-18
SigLIP 2: A better multilingual vision language encoder	6	--	2025-02-22
Qwen2.5-Omni Technical Report	6	--	2025-03-30
Better than DeepSeek R1? MiniMax-M1:open-weight hybrid-attention reasoning model	6	--	2025-06-16
Show HN: Agent Leaderboard 2.0 – Domain Specific edition	6	--	2025-07-17
Apple releases FastVLM and MobileCLIP2 on HF, real-time video captioning	6	--	2025-08-30
Show HN: We built a better reranker and open sourced it	6	--	2025-08-27
Nvidia STT Parakeet v3	6	--	2025-08-15
First 70B model released with all training epochs and data	6	--	2025-09-12
Qwen3-Next series represents our next-generation foundation models	6	--	2025-09-12
Qwen Image Edit - SOTA Open Weight Image Editing Model	6	--	2025-08-18
Cybersecurity Instruction Tuned Model	6	--	2025-08-05
Open R1: Update #2	5	--	2025-02-11
Deepseek VL2 Small	5	--	2025-02-08
Gemma 3 QAT (Quantized Aware Training) 3x less memory	5	--	2025-04-03
DocumentAI with 256M Parameters	5	--	2025-03-20
An open source common knowledge and context based Hallucination Detection Model	5	--	2025-04-29
Mixture of Tunable Experts-DeepSeek R1 Behavior Modification at Inference Time	5	--	2025-05-01
CircleGuardBench Leaderboard	5	--	2025-05-07
Show HN: Raman-01 – A Pocket Physics Solver LLM	5	--	2025-05-05
An MCP-powered agent in 50 lines of code	5	--	2025-05-15
SWE-rebench: Over 21,000 Open Tasks for SWE LLMs	5	--	2025-05-29
The Common Pile v0.1	5	--	2025-06-06
You could have designed state of the art positional encoding	5	--	2025-05-20
LLM Embeddings Explained: A Visual and Intuitive Guide	5	--	2025-05-14
Show HN: KaniTTS – Open-source high-fidelity TTS with just 450M params	5	--	2025-09-19
GLM 4.5	5	--	2025-07-28
Gaia2 and Are: Empowering the Community to Evaluate Agents	5	--	2025-09-22
VibeVoice: A Frontier Open-Source Text-to-Speech Model	5	--	2025-08-26
Qwen2.5-Coder-3B Fine-Tuned for Triton Kernel Gen	5	--	2025-08-03
Qwen 2.5 Max	4	--	2025-01-28
Hugging Face open sources a web-browsing agent that uses VLMs	4	--	2025-01-24
Deepseek R1 Zero	4	--	2025-01-20
LLaSE-G1 A FOSS speech enhancement model	4	--	2025-03-08
Qwen/QwQ-32B released on Hugging Face	4	--	2025-03-06
Wan2.1-T2V-14B	4	--	2025-02-25
The Curse of Depth in Large Language Models	4	--	2025-02-13
Migrating Hugging Face off Git LFS and to a new storage system …	4	--	2025-03-18
MoCha: Towards Movie-Grade Talking Character Synthesis	4	--	2025-04-01
Qwen2.5-Omni-7B	4	--	2025-03-26
Open R1's OlympicCoder beats Deepseek R1, models and underlying dataset released	4	--	2025-03-25
Devin's First Open Source Model Beats O3	4	--	2025-05-06
Ltxv-13B – high-quality videos in real-time	4	--	2025-05-07
Show HN: HalluMix – A Benchmark for Real-World LLM Hallucination Detection	4	--	2025-05-06
Higgs – Rapidly Compress LLMs Without Significant Loss of Quality	4	--	2025-04-12
New virtual try on model family that seems to be SOTA	4	--	2025-06-28
Gemma 3n available in the open-source ecosystem	4	--	2025-06-26
Automated Discovery of High-Performance GPU Kernels with OpenEvolve	4	--	2025-06-28
Jan-Nano-128k: Empowering deeper research through extended context understanding	4	--	2025-06-25
Kimi-Dev-72B	4	--	2025-07-13
Kimi K2: 1T total parameter open-source LLM by Moonshot AI	4	--	2025-07-11
Mistral AI releases Devstral-Small-2507	4	--	2025-07-10
A 337M RSS feed dataset	4	--	2025-08-26
Trackio: A new experiment tracking library from Hugging Face	4	--	2025-07-29
Show HN: Single-agent long-horizon reasoning within one LLM run	4	--	2025-07-23
Tricks from OpenAI GPT-OSS you can use with transformers	4	--	2025-09-11
Kimi-K2-Instruct-0905	4	--	2025-09-05
OmniNeural – First NPU-Aware Multimodal Model	4	--	2025-08-24
Gemma 3-270M	4	--	2025-08-14
Pruned expert GPT-OSS 6.6B	4	--	2025-08-13
UIGEN-X-32B-0727 Reasoning Only UI Generation Model	4	--	2025-07-28
Timeline of AI model releases in 2024	3	--	2025-01-01
Fine-Tune Deepseek-R1 with a Synthetic Reasoning Dataset	3	--	2025-02-11
Hugging Face AI Agents Course	3	--	2025-02-10
HuggingFace open reproduction of R1 data and training pipeline	3	--	2025-01-27
DeepSeek-R1 on iPhone? (DeepSeek-R1-Distill-Qwen-1.5B-GGUF)	3	--	2025-01-21
GEN3C: 3D-Informed World-Consistent Video	3	--	2025-03-06
Microsoft Releases Phi-4-multimodal [pdf]	3	--	2025-02-26
WanX open weight sota 14B video model release	3	--	2025-02-25
Step-Audio-Chat: a 132B end-to-end speech-to-speech model	3	--	2025-02-17
Show HN: First large scale evaluation of 4o Image Generation from OpenAI	3	--	2025-03-27
EuroBERT: A High-Performance Multilingual Encoder Model	3	--	2025-03-10
Training LLMs with GRPO and Interpreter Feedback Using WebAssembly	3	--	2025-04-06
AgentRxiv: Towards Collaborative Autonomous Research	3	--	2025-03-25
DeepSeek V3-0324 Posted to HuggingFace	3	--	2025-03-24
Nvidia Isaac GR00T N1 is the first open foundation model for humanoid	3	--	2025-03-21
VACE: All-in-One Video Creation and Editing from Alibaba	3	--	2025-03-12
Drape1: Open-Source Scalable adapter for clothing generation	3	--	2025-05-01
GLM-4-32B-0414: New MIT-licensed SOTA LLM from Zhipu AI	3	--	2025-04-15
Xiaomi MiMo	3	--	2025-04-30
Qwen3 235B (MoE with 128 experts)	3	--	2025-04-28
Dia 1.6B – Nari Text-to-Speech Synthesis	3	--	2025-04-24
Microsoft/MAI-DS-R1, DeepSeek R1 Post-Trained by Microsoft	3	--	2025-04-18
Yambda-5B – Industrial-scale music recommendation dataset	3	--	2025-06-04
Show HN: we released an open source, best-in-class medical reasoning model	3	--	2025-05-13
Understanding MCP Evals: Why Evals Matter for MCP	3	--	2025-06-06
Show HN: Ego-Dex Gradio App	3	--	2025-06-03
Hugging Face Courses	3	--	2025-05-27
Show HN: Tinker with Meta's "tokenizer-free" patcher	3	--	2025-05-21
Radiology explainer demo	3	--	2025-05-20
Memelang – a hybrid relational-graph query language	3	--	2025-05-17
Hugging Face Collaborates with Proxima Fusion on ML for Stellarator Optimization	3	--	2025-07-02
Largest in-person AV conversational dataset ever released	3	--	2025-06-27
Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models	3	--	2025-07-10
Show HN: 1.5B LLM routing model that aligns to preferences, not leaderboards	3	--	2025-07-17
Mistral Releases Voxtral: Open Source Speech Understanding Models (3B and 24B)	3	--	2025-07-15
CommaCarSegments: 3148 hours of raw CAN bus data from 230 different car …	3	--	2025-07-10
AnyCoder creates a demo for Qwen Image Edit Plus in 10mins	3	--	2025-09-22
I made WEBGEN-OSS-20B, a model that generates clean websites from your prompts	3	--	2025-09-13
Reasoning Traces from QA Pairs	3	--	2025-09-09
Welcome EmbeddingGemma, Google's new efficient embedding model	3	--	2025-09-04
Output Schema for CodeAct AI Agents: From Trial-and-Error to Predictive Planning	3	--	2025-08-31
WildChat-4.8M: 4.8M Real User–ChatGPT Conversations (Open Dataset)	3	--	2025-08-11
Break the quadratic wall of Transformer attention: WERSA, paper+code open source	3	--	2025-08-02
Qwen-Image-Edit-2509	3	--	2025-09-22
AI Spreadsheet Benchmark [pdf]	3	--	2025-09-22
FinePDFs Dataset	3	--	2025-09-15
TildeOpen-30B: European LLM Focused on Underrepresented Languages	3	--	2025-09-04
First vision language model built off Open AI GPT-OSS	3	--	2025-08-26
Seed-OSS: open-source LLM models by ByteDance	3	--	2025-08-22
From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA …	3	--	2025-08-20
Jan-v1: Advanced Agentic Language Model	3	--	2025-08-12
NextCoder by Microsoft — LLM performing on par with GPT-4o on complex …	3	--	2025-08-08
OpenReasoning-Nemotron by Nvidia: state-of-the-art distilled reasoning models	3	--	2025-08-08
Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training	3	--	2025-08-08
HuggingFace on Sheets	2	--	2025-03-24
Vdr-2B-multi-v1 a multilingual embedding model for visual document retrieval	2	--	2025-01-10
Show HN: We collected detailed annotations for text-to-image generation	2	--	2025-01-10
Hugging Face Smolagents	2	--	2025-01-05
Hugging Face advocates for Code Agents: agents that write tool calls as …	2	--	2025-01-02
ModernBERT: Encoder-only Transformer Model Strictly Improving on past work	2	--	2025-01-01
Flex.1-Alpha – A new modded Flux model that can properly handle being …	2	--	2025-01-19
OpenAI o3 just scored 99.8% on CodeForces using brute-force	2	--	2025-02-12
FinePersonas	2	--	2025-02-10
#9: Does AI Remember? The Role of Memory in Agentic Workflows	2	--	2025-02-03
Mistral-Small-24B-Base-2501	2	--	2025-01-30
Generate Images, Chat with PDF in WebGPU via DeepSeek Janus Pro 1B	2	--	2025-01-28
The state of open video generation models	2	--	2025-01-28
Bespoke-Stratos-17k: Open Reasoning Dataset by Distilling DeepSeek-R1	2	--	2025-01-27
DeepSeek-R1 WebGPU	2	--	2025-01-22
FastRTC: The Real-Time Communication Library for Python	2	--	2025-02-25
Show HN: Roast Any Website with AI	2	--	2025-02-25
SWE-Lancer: Can LLMs Earn $1M from Real-World Freelance Software Engineering?	2	--	2025-02-18
Desklib AI Detector Ranks No 1 on Raid Benchmark for AI Detection	2	--	2025-02-17
Forget What You Know about LLMs Evaluations – LLMs Are Like a …	2	--	2025-02-13
JFK Assassination Records Dataset on Hugging Face	2	--	2025-04-09
Show HN: My progress towards building a robotics training dataset	2	--	2025-03-18
HOGWILD! Inference – parallel LLM chain-of-thought with shared attention	2	--	2025-04-09
Llama-4 Model-Based Agentic AI System HuggingFace Released	2	--	2025-04-06
Llama 3.2 from-scratch implementation focused on code readability	2	--	2025-04-01
deepsite	2	--	2025-03-31
SuperBPE: Space Travel for Language Models	2	--	2025-03-29
Gemma3 on Hugging Face	2	--	2025-03-26
Open-source LLM beats OpenAI o1 and DeepSeek-R1 for PyTorch-to-Triton codegen	2	--	2025-03-19
Cohere: Command A (111B Open Weights Model)	2	--	2025-03-14
Open Dataset: Vehicle Accidents	2	--	2025-03-13
Show HN: TTS Arena V2	2	--	2025-05-02
WebThinker: Empowering Large Reasoning Models with Deep Research Capability	2	--	2025-05-01
MamayLM: An Efficient Ukrainian LLM	2	--	2025-04-23
Show HN: AEE – An Open-Source Engine That Evaluates Truth and Bias …	2	--	2025-04-13
Magi-1: Autoregressive Video Generation at Scale	2	--	2025-05-06
The 4 Things the Qwen-3's Chat Template Teaches Us	2	--	2025-05-02
Show HN: A synthetic text dataset to train tiny language models on	2	--	2025-05-01
Phi-4-Reasoning	2	--	2025-05-01
FantasyTalking: Realistic Talking Portrait Generation	2	--	2025-04-30
Neural Network Visualizer	2	--	2025-04-29
The Bitter Lesson Learned from 2k Multilingual Benchmarks	2	--	2025-04-23
ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think	2	--	2025-04-19
Microsoft BitNet 1.58bit LLM 2B4T released	2	--	2025-04-16
SOTA Model in 8B Size?	2	--	2025-05-29
TiRex Leads Gift Eval	2	--	2025-06-02
How do AI political biases differ between English and French?	2	--	2025-05-21
KernelLLM – Meta's new 8B SotA model	2	--	2025-05-19
Wan: Open and Advanced Large-Scale Video Generative Models	2	--	2025-05-14
Embedding Benchmark for Retrieval	2	--	2025-06-11
MiniCPM4 – a series of open multimodal models for edge inference	2	--	2025-06-10
The Qwen3 Embedding Model	2	--	2025-06-06
Tiny Agents in Python: an MCP-powered agent in ~70 lines of code	2	--	2025-05-23
Show HN: 2.4x faster baai/bge-M3	2	--	2025-05-18
Vision Language Models (Better, Faster, Stronger)	2	--	2025-05-13
Building and better understanding vision-language models (2024)	2	--	2025-05-10
FLUX Kontext Dev Ultra Fast Live	2	--	2025-06-26
Veena – open-source TTS for Indian Languages	2	--	2025-06-25
Metalorian: Generate Heavy Metal-Binding Peptides with Diffusion Sampling	2	--	2025-07-12
Kimi-K2-Base	2	--	2025-07-11
Building the Hugging Face MCP Server	2	--	2025-07-10
A Survey on Latent Reasoning	2	--	2025-07-10
Skywork-R1V3-38B open-source multimodal reasoning model	2	--	2025-07-08
HuggingChat is shutting down (for now)	2	--	2025-07-04
Qwen3Guard: Real-Time Safety for Your Token Stream	2	--	2025-09-24
K2-Think: A Parameter-Efficient Reasoning System	2	--	2025-09-13
Environments Hub: Your Language Model needs better (open) environments to learn	2	--	2025-09-05
Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training	2	--	2025-08-18
Voxtral WebGPU	2	--	2025-07-25
Show HN: kulyk-uk-en and kulyk-en-uk	2	--	2025-07-22
Show HN: KaniTTS – Ultra Fast and Expressive TTS Model	2	--	2025-09-22
N-Atlas V1	2	--	2025-09-21
Granite docling 258M: a small multimodal model for efficient document conversion	2	--	2025-09-17
Statistical Methods in Generative AI	2	--	2025-09-16
EmbeddingGemma is a 300M parameter, open embedding model from Google	2	--	2025-09-05
Swiss AI Initiative	2	--	2025-09-02
Apertus LLM	2	--	2025-09-02
Hugging Face spreadsheet tool: AI Sheets	2	--	2025-09-01
A Novel Pretrained Tokenizer-Free LLM Architecture	2	--	2025-08-29
MiniCPM-V 4.5: GPT-4o Level MLLM for Image and Video Understanding on Your …	2	--	2025-08-26
NASA and IBM release open source model on Hugging Face to predict …	2	--	2025-08-20
Tokenizers	2	--	2025-08-17
FormulaOne: A reasoning benchmark that all models score 0% on	2	--	2025-08-14
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model	2	--	2025-08-06
Qwen3-30B-A3B-Thinking-2507 has been released	2	--	2025-07-31
Intern-S1: A 241B parameter open-source MoE multimodal model	2	--	2025-07-28
Creating custom kernels for the AMD MI300	2	--	2025-07-25
Fast LoRA Inference for Flux with Diffusers and PEFT	2	--	2025-07-24
Nvidia parakeet-tdt-0.6B-v2	2	--	2025-07-22
How to Run a Hugging Face Model in Jax (Part 1)	2	--	2025-07-20
Show HN: Chimera-QxD-BMM-Qwen2-l22_28-alphaqd-1.5B-f16	2	--	2025-07-19
Show HN: An Agentic AI dataset for deepfake detection	1	--	2025-01-15
FP8 DeepSeek R1 Distilled LLMs for SGLang and VLLM	1	--	2025-01-29

Plushcap, by Matt Makai. 2021-2026.
