|
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]
|
978 |
-- |
2025-12-01 |
|
Deepseek R1-0528
|
451 |
-- |
2025-05-28 |
|
Open-R1: an open reproduction of DeepSeek-R1
|
394 |
-- |
2025-01-28 |
|
Smollm3: Smol, multilingual, long-context reasoner LLM
|
388 |
-- |
2025-07-08 |
|
Nanonets-OCR-s – OCR model that transforms documents into structured markdown
|
361 |
-- |
2025-06-16 |
|
Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS
|
319 |
-- |
2025-09-02 |
|
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
|
263 |
-- |
2025-12-01 |
|
The Smol Training Playbook: The Secrets to Building World-Class LLMs
|
262 |
-- |
2025-10-30 |
|
Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser
|
227 |
-- |
2025-02-07 |
|
Qwen3-4B-Thinking-2507
|
166 |
-- |
2025-08-06 |
|
Qwen3-235B-A22B-Thinking-2507
|
152 |
-- |
2025-07-25 |
|
Show HN: Penny-1.7B Irish Penny Journal style transfer
|
149 |
-- |
2025-06-02 |
|
Qwen-Image-Layered: transparency and layer aware open diffusion model
|
130 |
-- |
2025-12-19 |
|
Qwen3 30B-A3B
|
87 |
-- |
2025-07-30 |
|
Voxtral-Mini-3B-2507 – Open source speech understanding model
|
64 |
-- |
2025-07-15 |
|
Open-sourcing 5,000hrs of self-driving dataset
|
63 |
-- |
2025-03-11 |
|
Qwen Image
|
54 |
-- |
2025-08-04 |
|
Train faster static embedding models with sentence transformers
|
52 |
-- |
2025-01-15 |
|
Show HN: ChatToSTL – AI text-to-CAD for 3D printing
|
52 |
-- |
2025-06-12 |
|
Janus-Pro: Autoregressive framework unifying multimodal understanding&generation
|
49 |
-- |
2025-01-27 |
|
DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks
|
39 |
-- |
2025-01-20 |
|
Fully autonomous AI agents should not be developed
|
38 |
-- |
2025-02-07 |
|
Qwen3-235B-A22B-Instruct-2507
|
36 |
-- |
2025-07-21 |
|
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
|
33 |
-- |
2025-02-19 |
|
Qwen3-Coder-30B-A3B-Instruct
|
32 |
-- |
2025-07-31 |
|
Reachy Mini – The Open-Source Robot for Today's and Tomorrow's AI Builders
|
30 |
-- |
2025-07-09 |
|
grok-2 on Hugging Face
|
27 |
-- |
2025-08-23 |
|
DeepSeek-v3.1
|
26 |
-- |
2025-08-21 |
|
DeepSeek-v3.1-Base
|
25 |
-- |
2025-08-19 |
|
Mistral Small 3.2 (24B-Instruct-2506)
|
23 |
-- |
2025-06-20 |
|
DeepSeek-v3.1
|
23 |
-- |
2025-08-19 |
|
Kyutai 1.6B Streaming TTS
|
21 |
-- |
2025-07-03 |
|
Qwen3 235B beats Claude on some code benchmarks
|
21 |
-- |
2025-07-21 |
|
Selene Mini: Open-sourced SOTA small language-model-as-a-judge
|
20 |
-- |
2025-01-29 |
|
The smallest VLM ever: 250M parameters
|
19 |
-- |
2025-01-23 |
|
Deepseek V3-0324
|
18 |
-- |
2025-03-24 |
|
DeepSeek R1
|
17 |
-- |
2025-01-20 |
|
Vector Search with DuckDB
|
17 |
-- |
2025-02-26 |
|
DiffuCoder-7B-CpGRPO: A code generation LLM developed by Apple
|
17 |
-- |
2025-07-04 |
|
Qwen3 0.6B now on HuggingFace (quantized)
|
16 |
-- |
2025-04-28 |
|
TeapotLLM- an open-source <1B model for hallucination-resistant Q&A on a CPU
|
14 |
-- |
2025-04-16 |
|
DeepSeek-Prover-V2-671B
|
14 |
-- |
2025-04-30 |
|
DeepSeek-R1-0528 performance improvements
|
14 |
-- |
2025-05-29 |
|
Co-Doodle with Gemini
|
13 |
-- |
2025-03-19 |
|
Open-source DeepResearch – Freeing our search agents
|
12 |
-- |
2025-02-04 |
|
FUTO open-sources 1M row keyboard swipe dataset
|
12 |
-- |
2025-04-04 |
|
smolagents: A simple library to build AI agents
|
11 |
-- |
2025-01-02 |
|
DeepSeek-TNG-R1T2-Chimera
|
11 |
-- |
2025-07-02 |
|
Phi-4 weights have been released under MIT license
|
10 |
-- |
2025-01-08 |
|
Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition
|
10 |
-- |
2025-04-23 |
|
Open Source 1.7tb Dataset of What AI Crawlers Are Doing
|
10 |
-- |
2025-07-03 |
|
Parquet Content-Defined Chunking
|
10 |
-- |
2025-09-09 |
|
Wan2.2-S2V-14B – audio-driven cinematic video generation model
|
10 |
-- |
2025-08-26 |
|
Show HN: A Transformer model that preserves logical equivalence
|
9 |
-- |
2025-03-02 |
|
Sesame CSM-1B: Open-Source Conversational Speech Model
|
8 |
-- |
2025-03-14 |
|
DeepSeek-Prover-V2-671B
|
8 |
-- |
2025-04-30 |
|
Model Context Protocol (MCP) Course
|
8 |
-- |
2025-05-21 |
|
Tencent's Hunyuan Instruct 7B/4B/1.8B/0.5B new models have been released
|
8 |
-- |
2025-08-04 |
|
MistralAI released a new Magistral Small 2509
|
8 |
-- |
2025-09-17 |
|
Hugging Face datasets and models for cybersecurity/software vulnerabilities
|
7 |
-- |
2025-03-09 |
|
ByteDance/Dolphin on HuggingFace
|
7 |
-- |
2025-05-19 |
|
Holo1.5: Foundational Models for Computer Use Agents
|
7 |
-- |
2025-09-15 |
|
LFM2 WebGPU
|
7 |
-- |
2025-08-06 |
|
OpenAI/GPT-OSS-120B · Hugging Face
|
7 |
-- |
2025-08-05 |
|
Kokoro-TTS
|
6 |
-- |
2025-01-13 |
|
Microsoft Phi 4 with R1 Reasoning
|
6 |
-- |
2025-02-04 |
|
DeepSeek-R1 without CCP censorship
|
6 |
-- |
2025-02-20 |
|
More Efficient Chain-of-Thought Reasoning Through Certainty Probing
|
6 |
-- |
2025-02-18 |
|
SigLIP 2: A better multilingual vision language encoder
|
6 |
-- |
2025-02-22 |
|
Qwen2.5-Omni Technical Report
|
6 |
-- |
2025-03-30 |
|
Better than DeepSeek R1? MiniMax-M1:open-weight hybrid-attention reasoning model
|
6 |
-- |
2025-06-16 |
|
Show HN: Agent Leaderboard 2.0 – Domain Specific edition
|
6 |
-- |
2025-07-17 |
|
Apple releases FastVLM and MobileCLIP2 on HF, real-time video captioning
|
6 |
-- |
2025-08-30 |
|
Show HN: We built a better reranker and open sourced it
|
6 |
-- |
2025-08-27 |
|
Nvidia STT Parakeet v3
|
6 |
-- |
2025-08-15 |
|
First 70B model released with all training epochs and data
|
6 |
-- |
2025-09-12 |
|
Qwen3-Next series represents our next-generation foundation models
|
6 |
-- |
2025-09-12 |
|
Qwen Image Edit - SOTA Open Weight Image Editing Model
|
6 |
-- |
2025-08-18 |
|
Cybersecurity Instruction Tuned Model
|
6 |
-- |
2025-08-05 |
|
Open R1: Update #2
|
5 |
-- |
2025-02-11 |
|
Deepseek VL2 Small
|
5 |
-- |
2025-02-08 |
|
Gemma 3 QAT (Quantization-Aware Training) 3x less memory
|
5 |
-- |
2025-04-03 |
|
DocumentAI with 256M Parameters
|
5 |
-- |
2025-03-20 |
|
An open source common knowledge and context based Hallucination Detection Model
|
5 |
-- |
2025-04-29 |
|
Mixture of Tunable Experts-DeepSeek R1 Behavior Modification at Inference Time
|
5 |
-- |
2025-05-01 |
|
CircleGuardBench Leaderboard
|
5 |
-- |
2025-05-07 |
|
Show HN: Raman-01 – A Pocket Physics Solver LLM
|
5 |
-- |
2025-05-05 |
|
An MCP-powered agent in 50 lines of code
|
5 |
-- |
2025-05-15 |
|
SWE-rebench: Over 21,000 Open Tasks for SWE LLMs
|
5 |
-- |
2025-05-29 |
|
The Common Pile v0.1
|
5 |
-- |
2025-06-06 |
|
You could have designed state of the art positional encoding
|
5 |
-- |
2025-05-20 |
|
LLM Embeddings Explained: A Visual and Intuitive Guide
|
5 |
-- |
2025-05-14 |
|
Show HN: KaniTTS – Open-source high-fidelity TTS with just 450M params
|
5 |
-- |
2025-09-19 |
|
GLM 4.5
|
5 |
-- |
2025-07-28 |
|
Gaia2 and Are: Empowering the Community to Evaluate Agents
|
5 |
-- |
2025-09-22 |
|
VibeVoice: A Frontier Open-Source Text-to-Speech Model
|
5 |
-- |
2025-08-26 |
|
Qwen2.5-Coder-3B Fine-Tuned for Triton Kernel Gen
|
5 |
-- |
2025-08-03 |
|
Qwen 2.5 Max
|
4 |
-- |
2025-01-28 |
|
Hugging Face open sources a web-browsing agent that uses VLMs
|
4 |
-- |
2025-01-24 |
|
Deepseek R1 Zero
|
4 |
-- |
2025-01-20 |
|
LLaSE-G1 A FOSS speech enhancement model
|
4 |
-- |
2025-03-08 |
|
Qwen/QwQ-32B released on Hugging Face
|
4 |
-- |
2025-03-06 |
|
Wan2.1-T2V-14B
|
4 |
-- |
2025-02-25 |
|
The Curse of Depth in Large Language Models
|
4 |
-- |
2025-02-13 |
|
Migrating Hugging Face off Git LFS and to a new storage system …
|
4 |
-- |
2025-03-18 |
|
MoCha: Towards Movie-Grade Talking Character Synthesis
|
4 |
-- |
2025-04-01 |
|
Qwen2.5-Omni-7B
|
4 |
-- |
2025-03-26 |
|
Open R1's OlympicCoder beats Deepseek R1, models and underlying dataset released
|
4 |
-- |
2025-03-25 |
|
Devin's First Open Source Model Beats O3
|
4 |
-- |
2025-05-06 |
|
Ltxv-13B – high-quality videos in real-time
|
4 |
-- |
2025-05-07 |
|
Show HN: HalluMix – A Benchmark for Real-World LLM Hallucination Detection
|
4 |
-- |
2025-05-06 |
|
Higgs – Rapidly Compress LLMs Without Significant Loss of Quality
|
4 |
-- |
2025-04-12 |
|
New virtual try on model family that seems to be SOTA
|
4 |
-- |
2025-06-28 |
|
Gemma 3n available in the open-source ecosystem
|
4 |
-- |
2025-06-26 |
|
Automated Discovery of High-Performance GPU Kernels with OpenEvolve
|
4 |
-- |
2025-06-28 |
|
Jan-Nano-128k: Empowering deeper research through extended context understanding
|
4 |
-- |
2025-06-25 |
|
Kimi-Dev-72B
|
4 |
-- |
2025-07-13 |
|
Kimi K2: 1T total parameter open-source LLM by Moonshot AI
|
4 |
-- |
2025-07-11 |
|
Mistral AI releases Devstral-Small-2507
|
4 |
-- |
2025-07-10 |
|
A 337M RSS feed dataset
|
4 |
-- |
2025-08-26 |
|
Trackio: A new experiment tracking library from Hugging Face
|
4 |
-- |
2025-07-29 |
|
Show HN: Single-agent long-horizon reasoning within one LLM run
|
4 |
-- |
2025-07-23 |
|
Tricks from OpenAI GPT-OSS you can use with transformers
|
4 |
-- |
2025-09-11 |
|
Kimi-K2-Instruct-0905
|
4 |
-- |
2025-09-05 |
|
OmniNeural – First NPU-Aware Multimodal Model
|
4 |
-- |
2025-08-24 |
|
Gemma 3-270M
|
4 |
-- |
2025-08-14 |
|
Pruned expert GPT-OSS 6.6B
|
4 |
-- |
2025-08-13 |
|
UIGEN-X-32B-0727 Reasoning Only UI Generation Model
|
4 |
-- |
2025-07-28 |
|
Timeline of AI model releases in 2024
|
3 |
-- |
2025-01-01 |
|
Fine-Tune Deepseek-R1 with a Synthetic Reasoning Dataset
|
3 |
-- |
2025-02-11 |
|
Hugging Face AI Agents Course
|
3 |
-- |
2025-02-10 |
|
HuggingFace open reproduction of R1 data and training pipeline
|
3 |
-- |
2025-01-27 |
|
DeepSeek-R1 on iPhone? (DeepSeek-R1-Distill-Qwen-1.5B-GGUF)
|
3 |
-- |
2025-01-21 |
|
GEN3C: 3D-Informed World-Consistent Video
|
3 |
-- |
2025-03-06 |
|
Microsoft Releases Phi-4-multimodal [pdf]
|
3 |
-- |
2025-02-26 |
|
WanX open weight sota 14B video model release
|
3 |
-- |
2025-02-25 |
|
Step-Audio-Chat: a 132B end-to-end speech-to-speech model
|
3 |
-- |
2025-02-17 |
|
Show HN: First large scale evaluation of 4o Image Generation from OpenAI
|
3 |
-- |
2025-03-27 |
|
EuroBERT: A High-Performance Multilingual Encoder Model
|
3 |
-- |
2025-03-10 |
|
Training LLMs with GRPO and Interpreter Feedback Using WebAssembly
|
3 |
-- |
2025-04-06 |
|
AgentRxiv: Towards Collaborative Autonomous Research
|
3 |
-- |
2025-03-25 |
|
DeepSeek V3-0324 Posted to HuggingFace
|
3 |
-- |
2025-03-24 |
|
Nvidia Isaac GR00T N1 is the first open foundation model for humanoid
|
3 |
-- |
2025-03-21 |
|
VACE: All-in-One Video Creation and Editing from Alibaba
|
3 |
-- |
2025-03-12 |
|
Drape1: Open-Source Scalable adapter for clothing generation
|
3 |
-- |
2025-05-01 |
|
GLM-4-32B-0414: New MIT-licensed SOTA LLM from Zhipu AI
|
3 |
-- |
2025-04-15 |
|
Xiaomi MiMo
|
3 |
-- |
2025-04-30 |
|
Qwen3 235B (MoE with 128 experts)
|
3 |
-- |
2025-04-28 |
|
Dia 1.6B – Nari Text-to-Speech Synthesis
|
3 |
-- |
2025-04-24 |
|
Microsoft/MAI-DS-R1, DeepSeek R1 Post-Trained by Microsoft
|
3 |
-- |
2025-04-18 |
|
Yambda-5B – Industrial-scale music recommendation dataset
|
3 |
-- |
2025-06-04 |
|
Show HN: we released an open source, best-in-class medical reasoning model
|
3 |
-- |
2025-05-13 |
|
Understanding MCP Evals: Why Evals Matter for MCP
|
3 |
-- |
2025-06-06 |
|
Show HN: Ego-Dex Gradio App
|
3 |
-- |
2025-06-03 |
|
Hugging Face Courses
|
3 |
-- |
2025-05-27 |
|
Show HN: Tinker with Meta's "tokenizer-free" patcher
|
3 |
-- |
2025-05-21 |
|
Radiology explainer demo
|
3 |
-- |
2025-05-20 |
|
Memelang – a hybrid relational-graph query language
|
3 |
-- |
2025-05-17 |
|
Hugging Face Collaborates with Proxima Fusion on ML for Stellarator Optimization
|
3 |
-- |
2025-07-02 |
|
Largest in-person AV conversational dataset ever released
|
3 |
-- |
2025-06-27 |
|
Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models
|
3 |
-- |
2025-07-10 |
|
Show HN: 1.5B LLM routing model that aligns to preferences, not leaderboards
|
3 |
-- |
2025-07-17 |
|
Mistral Releases Voxtral: Open Source Speech Understanding Models (3B and 24B)
|
3 |
-- |
2025-07-15 |
|
CommaCarSegments: 3148 hours of raw CAN bus data from 230 different car …
|
3 |
-- |
2025-07-10 |
|
AnyCoder creates a demo for Qwen Image Edit Plus in 10mins
|
3 |
-- |
2025-09-22 |
|
I made WEBGEN-OSS-20B, a model that generates clean websites from your prompts
|
3 |
-- |
2025-09-13 |
|
Reasoning Traces from QA Pairs
|
3 |
-- |
2025-09-09 |
|
Welcome EmbeddingGemma, Google's new efficient embedding model
|
3 |
-- |
2025-09-04 |
|
Output Schema for CodeAct AI Agents: From Trial-and-Error to Predictive Planning
|
3 |
-- |
2025-08-31 |
|
WildChat-4.8M: 4.8M Real User–ChatGPT Conversations (Open Dataset)
|
3 |
-- |
2025-08-11 |
|
Break the quadratic wall of Transformer attention: WERSA, paper+code open source
|
3 |
-- |
2025-08-02 |
|
Qwen-Image-Edit-2509
|
3 |
-- |
2025-09-22 |
|
AI Spreadsheet Benchmark [pdf]
|
3 |
-- |
2025-09-22 |
|
FinePDFs Dataset
|
3 |
-- |
2025-09-15 |
|
TildeOpen-30B: European LLM Focused on Underrepresented Languages
|
3 |
-- |
2025-09-04 |
|
First vision language model built off Open AI GPT-OSS
|
3 |
-- |
2025-08-26 |
|
Seed-OSS: open-source LLM models by ByteDance
|
3 |
-- |
2025-08-22 |
|
From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA …
|
3 |
-- |
2025-08-20 |
|
Jan-v1: Advanced Agentic Language Model
|
3 |
-- |
2025-08-12 |
|
NextCoder by Microsoft — LLM performing on par with GPT-4o on complex …
|
3 |
-- |
2025-08-08 |
|
OpenReasoning-Nemotron by Nvidia: state-of-the-art distilled reasoning models
|
3 |
-- |
2025-08-08 |
|
Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training
|
3 |
-- |
2025-08-08 |
|
HuggingFace on Sheets
|
2 |
-- |
2025-03-24 |
|
Vdr-2B-multi-v1 a multilingual embedding model for visual document retrieval
|
2 |
-- |
2025-01-10 |
|
Show HN: We collected detailed annotations for text-to-image generation
|
2 |
-- |
2025-01-10 |
|
Hugging Face Smolagents
|
2 |
-- |
2025-01-05 |
|
Hugging Face advocates for Code Agents: agents that write tool calls as …
|
2 |
-- |
2025-01-02 |
|
ModernBERT: Encoder-only Transformer Model Strictly Improving on past work
|
2 |
-- |
2025-01-01 |
|
Flex.1-Alpha – A new modded Flux model that can properly handle being …
|
2 |
-- |
2025-01-19 |
|
OpenAI o3 just scored 99.8% on CodeForces using brute-force
|
2 |
-- |
2025-02-12 |
|
FinePersonas
|
2 |
-- |
2025-02-10 |
|
#9: Does AI Remember? The Role of Memory in Agentic Workflows
|
2 |
-- |
2025-02-03 |
|
Mistral-Small-24B-Base-2501
|
2 |
-- |
2025-01-30 |
|
Generate Images, Chat with PDF in WebGPU via DeepSeek Janus Pro 1B
|
2 |
-- |
2025-01-28 |
|
The state of open video generation models
|
2 |
-- |
2025-01-28 |
|
Bespoke-Stratos-17k: Open Reasoning Dataset by Distilling DeepSeek-R1
|
2 |
-- |
2025-01-27 |
|
DeepSeek-R1 WebGPU
|
2 |
-- |
2025-01-22 |
|
FastRTC: The Real-Time Communication Library for Python
|
2 |
-- |
2025-02-25 |
|
Show HN: Roast Any Website with AI
|
2 |
-- |
2025-02-25 |
|
SWE-Lancer: Can LLMs Earn $1M from Real-World Freelance Software Engineering?
|
2 |
-- |
2025-02-18 |
|
Desklib AI Detector Ranks No 1 on Raid Benchmark for AI Detection
|
2 |
-- |
2025-02-17 |
|
Forget What You Know about LLMs Evaluations – LLMs Are Like a …
|
2 |
-- |
2025-02-13 |
|
JFK Assassination Records Dataset on Hugging Face
|
2 |
-- |
2025-04-09 |
|
Show HN: My progress towards building a robotics training dataset
|
2 |
-- |
2025-03-18 |
|
HOGWILD! Inference – parallel LLM chain-of-thought with shared attention
|
2 |
-- |
2025-04-09 |
|
Llama-4 Model-Based Agentic AI System HuggingFace Released
|
2 |
-- |
2025-04-06 |
|
Llama 3.2 from-scratch implementation focused on code readability
|
2 |
-- |
2025-04-01 |
|
deepsite
|
2 |
-- |
2025-03-31 |
|
SuperBPE: Space Travel for Language Models
|
2 |
-- |
2025-03-29 |
|
Gemma3 on Hugging Face
|
2 |
-- |
2025-03-26 |
|
Open-source LLM beats OpenAI o1 and DeepSeek-R1 for PyTorch-to-Triton codegen
|
2 |
-- |
2025-03-19 |
|
Cohere: Command A (111B Open Weights Model)
|
2 |
-- |
2025-03-14 |
|
Open Dataset: Vehicle Accidents
|
2 |
-- |
2025-03-13 |
|
Show HN: TTS Arena V2
|
2 |
-- |
2025-05-02 |
|
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
|
2 |
-- |
2025-05-01 |
|
MamayLM: An Efficient Ukrainian LLM
|
2 |
-- |
2025-04-23 |
|
Show HN: AEE – An Open-Source Engine That Evaluates Truth and Bias …
|
2 |
-- |
2025-04-13 |
|
Magi-1: Autoregressive Video Generation at Scale
|
2 |
-- |
2025-05-06 |
|
The 4 Things the Qwen-3's Chat Template Teaches Us
|
2 |
-- |
2025-05-02 |
|
Show HN: A synthetic text dataset to train tiny language models on
|
2 |
-- |
2025-05-01 |
|
Phi-4-Reasoning
|
2 |
-- |
2025-05-01 |
|
FantasyTalking: Realistic Talking Portrait Generation
|
2 |
-- |
2025-04-30 |
|
Neural Network Visualizer
|
2 |
-- |
2025-04-29 |
|
The Bitter Lesson Learned from 2k Multilingual Benchmarks
|
2 |
-- |
2025-04-23 |
|
ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think
|
2 |
-- |
2025-04-19 |
|
Microsoft BitNet 1.58bit LLM 2B4T released
|
2 |
-- |
2025-04-16 |
|
SOTA Model in 8B Size?
|
2 |
-- |
2025-05-29 |
|
TiRex Leads Gift Eval
|
2 |
-- |
2025-06-02 |
|
How do AI political biases differ between English and French?
|
2 |
-- |
2025-05-21 |
|
KernelLLM – Meta's new 8B SotA model
|
2 |
-- |
2025-05-19 |
|
Wan: Open and Advanced Large-Scale Video Generative Models
|
2 |
-- |
2025-05-14 |
|
Embedding Benchmark for Retrieval
|
2 |
-- |
2025-06-11 |
|
MiniCPM4 – a series of open multimodal models for edge inference
|
2 |
-- |
2025-06-10 |
|
The Qwen3 Embedding Model
|
2 |
-- |
2025-06-06 |
|
Tiny Agents in Python: an MCP-powered agent in ~70 lines of code
|
2 |
-- |
2025-05-23 |
|
Show HN: 2.4x faster baai/bge-M3
|
2 |
-- |
2025-05-18 |
|
Vision Language Models (Better, Faster, Stronger)
|
2 |
-- |
2025-05-13 |
|
Building and better understanding vision-language models (2024)
|
2 |
-- |
2025-05-10 |
|
FLUX Kontext Dev Ultra Fast Live
|
2 |
-- |
2025-06-26 |
|
Veena – open-source TTS for Indian Languages
|
2 |
-- |
2025-06-25 |
|
Metalorian: Generate Heavy Metal-Binding Peptides with Diffusion Sampling
|
2 |
-- |
2025-07-12 |
|
Kimi-K2-Base
|
2 |
-- |
2025-07-11 |
|
Building the Hugging Face MCP Server
|
2 |
-- |
2025-07-10 |
|
A Survey on Latent Reasoning
|
2 |
-- |
2025-07-10 |
|
Skywork-R1V3-38B open-source multimodal reasoning model
|
2 |
-- |
2025-07-08 |
|
HuggingChat is shutting down (for now)
|
2 |
-- |
2025-07-04 |
|
Qwen3Guard: Real-Time Safety for Your Token Stream
|
2 |
-- |
2025-09-24 |
|
K2-Think: A Parameter-Efficient Reasoning System
|
2 |
-- |
2025-09-13 |
|
Environments Hub: Your Language Model needs better (open) environments to learn
|
2 |
-- |
2025-09-05 |
|
Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training
|
2 |
-- |
2025-08-18 |
|
Voxtral WebGPU
|
2 |
-- |
2025-07-25 |
|
Show HN: kulyk-uk-en and kulyk-en-uk
|
2 |
-- |
2025-07-22 |
|
Show HN: KaniTTS – Ultra Fast and Expressive TTS Model
|
2 |
-- |
2025-09-22 |
|
N-Atlas V1
|
2 |
-- |
2025-09-21 |
|
Granite docling 258M: a small multimodal model for efficient document conversion
|
2 |
-- |
2025-09-17 |
|
Statistical Methods in Generative AI
|
2 |
-- |
2025-09-16 |
|
EmbeddingGemma is a 300M parameter, open embedding model from Google
|
2 |
-- |
2025-09-05 |
|
Swiss AI Initiative
|
2 |
-- |
2025-09-02 |
|
Apertus LLM
|
2 |
-- |
2025-09-02 |
|
Hugging Face spreadsheet tool: AI Sheets
|
2 |
-- |
2025-09-01 |
|
A Novel Pretrained Tokenizer-Free LLM Architecture
|
2 |
-- |
2025-08-29 |
|
MiniCPM-V 4.5: GPT-4o Level MLLM for Image and Video Understanding on Your …
|
2 |
-- |
2025-08-26 |
|
NASA and IBM release open source model on Hugging Face to predict …
|
2 |
-- |
2025-08-20 |
|
Tokenizers
|
2 |
-- |
2025-08-17 |
|
FormulaOne: A reasoning benchmark that all models score 0% on
|
2 |
-- |
2025-08-14 |
|
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
|
2 |
-- |
2025-08-06 |
|
Qwen3-30B-A3B-Thinking-2507 has been released
|
2 |
-- |
2025-07-31 |
|
Intern-S1: A 241B parameter open-source MoE multimodal model
|
2 |
-- |
2025-07-28 |
|
Creating custom kernels for the AMD MI300
|
2 |
-- |
2025-07-25 |
|
Fast LoRA Inference for Flux with Diffusers and PEFT
|
2 |
-- |
2025-07-24 |
|
Nvidia parakeet-tdt-0.6B-v2
|
2 |
-- |
2025-07-22 |
|
How to Run a Hugging Face Model in Jax (Part 1)
|
2 |
-- |
2025-07-20 |
|
Show HN: Chimera-QxD-BMM-Qwen2-l22_28-alphaqd-1.5B-f16
|
2 |
-- |
2025-07-19 |
|
Show HN: An Agentic AI dataset for deepfake detection
|
1 |
-- |
2025-01-15 |
|
FP8 DeepSeek R1 Distilled LLMs for SGLang and VLLM
|
1 |
-- |
2025-01-29 |