|
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]
|
978 |
-- |
2025-12-01 |
|
Uncensor any LLM with abliteration
|
586 |
-- |
2024-06-13 |
|
Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete
|
530 |
-- |
2026-01-21 |
|
DeepSeek R1-0528
|
451 |
-- |
2025-05-28 |
|
Llama-3.3-70B-Instruct
|
425 |
-- |
2024-12-06 |
|
Try Stable Diffusion's Img2Img Mode
|
415 |
-- |
2022-08-29 |
|
Open-R1: an open reproduction of DeepSeek-R1
|
394 |
-- |
2025-01-28 |
|
SmolLM3: Smol, multilingual, long-context reasoner LLM
|
388 |
-- |
2025-07-08 |
|
GLM-4.7-Flash
|
371 |
-- |
2026-01-19 |
|
Nanonets-OCR-s – OCR model that transforms documents into structured markdown
|
361 |
-- |
2025-06-16 |
|
A Replacement for BERT
|
348 |
-- |
2024-12-19 |
|
MonadGPT – What would have happened if ChatGPT was invented in the …
|
323 |
-- |
2023-11-24 |
|
Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS
|
319 |
-- |
2025-09-02 |
|
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
|
263 |
-- |
2025-12-01 |
|
The Smol Training Playbook: The Secrets to Building World-Class LLMs
|
262 |
-- |
2025-10-30 |
|
LLM in a Flash: Efficient LLM Inference with Limited Memory
|
252 |
-- |
2023-12-20 |
|
Microsoft Phi-2 model changes licence to MIT
|
240 |
-- |
2024-01-06 |
|
Falcon 180B
|
238 |
-- |
2023-09-06 |
|
OpenLLaMA 13B Released
|
229 |
-- |
2023-06-18 |
|
Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser
|
227 |
-- |
2025-02-07 |
|
Hugging Face Releases Agents
|
214 |
-- |
2023-05-10 |
|
Space secrets leak disclosure
|
197 |
-- |
2024-06-01 |
|
BigCode Project Releases StarCoder: A 15B Code LLM
|
185 |
-- |
2023-05-04 |
|
Best 7B LLM on leaderboards made by an amateur following a medium …
|
181 |
-- |
2024-01-05 |
|
Stability.ai sent a take down request to Runway ML's SD v1.5 citing …
|
179 |
-- |
2022-10-20 |
|
We raised $100M for open and collaborative machine learning
|
175 |
-- |
2022-05-09 |
|
Llama 3 8B is almost as good as Wizard 2 8x22B
|
168 |
-- |
2024-04-19 |
|
SantaCoder: A new 1.1B code model for generation and infilling
|
168 |
-- |
2022-12-22 |
|
Nvidia releases NVLM 1.0 72B open weight model
|
167 |
-- |
2024-10-02 |
|
Qwen3-4B-Thinking-2507
|
166 |
-- |
2025-08-06 |
|
StackLLaMA: A hands-on guide to train LLaMA with RLHF
|
165 |
-- |
2023-04-06 |
|
Explaining the SDXL Latent Space
|
163 |
-- |
2024-02-05 |
|
BLOOM: The largest open multilingual language model
|
160 |
-- |
2022-07-12 |
|
Show HN: Text-to-video model from scratch (2 brothers, 2 years, 2B params)
|
156 |
-- |
2026-01-22 |
|
Hugging Face and Google partner for AI collaboration
|
152 |
-- |
2024-01-25 |
|
Qwen3-235B-A22B-Thinking-2507
|
152 |
-- |
2025-07-25 |
|
Show HN: Penny-1.7B Irish Penny Journal style transfer
|
149 |
-- |
2025-06-02 |
|
Wordalle – Guess the prompt used to generate a set of images …
|
137 |
-- |
2022-07-01 |
|
Mistral-8x7B-Chat
|
131 |
-- |
2023-12-10 |
|
A CC-By Open-Source TTS Model with Voice Cloning
|
131 |
-- |
2024-11-04 |
|
Qwen-Image-Layered: transparency and layer aware open diffusion model
|
130 |
-- |
2025-12-19 |
|
FineWeb: Decanting the web for the finest text data at scale
|
127 |
-- |
2024-06-02 |
|
Yi-34B-Chat
|
115 |
-- |
2023-11-24 |
|
GPT-3.5 and Wolfram Alpha via LangChain
|
107 |
-- |
2023-01-18 |
|
The Falcon has landed in the Hugging Face ecosystem
|
105 |
-- |
2023-06-05 |
|
HuggingChat: Chat with Open Source Models
|
103 |
-- |
2024-02-21 |
|
Hugging Face and AWS partner to make AI more accessible
|
102 |
-- |
2023-02-21 |
|
HuggingFace Training Cluster as a Service
|
101 |
-- |
2023-09-05 |
|
More than 80 AI models from Qualcomm
|
95 |
-- |
2024-02-28 |
|
Segmind Stable Diffusion – A smaller version of Stable Diffusion XL
|
95 |
-- |
2023-10-25 |
|
LLaMA-Pro-8B
|
94 |
-- |
2024-01-06 |
|
HuggingChat
|
93 |
-- |
2023-04-25 |
|
Yarn-Mistral-7B-128k
|
88 |
-- |
2023-11-11 |
|
Qwen3 30B-A3B
|
87 |
-- |
2025-07-30 |
|
Apple/OpenELM: Efficient Open-Source Family Language Models
|
82 |
-- |
2024-04-24 |
|
Sparse LLM Inference on CPU: 75% fewer parameters
|
78 |
-- |
2023-10-19 |
|
Pokemon GAN
|
77 |
-- |
2022-02-14 |
|
YouTube-Commons: Audio transcripts of 2,063,066 YouTube videos, CC-By license
|
75 |
-- |
2024-04-18 |
|
Switch Transformers C – 2048 experts (1.6T params for 3.1 TB) (2022)
|
73 |
-- |
2023-11-20 |
|
Multimodal Neurons in Pretrained Text-Only Transformers
|
66 |
-- |
2023-08-04 |
|
Show HN: Simply Reading Analog Gauges – GPT4, CogVLM Can't
|
66 |
-- |
2024-01-22 |
|
Voxtral-Mini-3B-2507 – Open source speech understanding model
|
64 |
-- |
2025-07-15 |
|
Open-sourcing 5,000hrs of self-driving dataset
|
63 |
-- |
2025-03-11 |
|
HuggingChat – ChatGPT alternative with open source models
|
61 |
-- |
2023-12-15 |
|
MSFT's WizardLM2 models have been taken down
|
58 |
-- |
2024-04-16 |
|
OpenLLaMA 7B Training Completed to 1T Tokens
|
58 |
-- |
2023-06-07 |
|
Phi-2
|
57 |
-- |
2023-12-13 |
|
Dolphin-2_6-Phi-2
|
56 |
-- |
2023-12-24 |
|
Alibaba releases 72B LLM with 32k context length
|
55 |
-- |
2023-11-30 |
|
LiteLlama-460M-1T has 460M parameters trained with 1T tokens
|
54 |
-- |
2024-01-07 |
|
Qwen Image
|
54 |
-- |
2025-08-04 |
|
Fine-Tuning LLMs to 1.58bit
|
52 |
-- |
2024-09-18 |
|
Train faster static embedding models with sentence transformers
|
52 |
-- |
2025-01-15 |
|
Show HN: ChatToSTL – AI text-to-CAD for 3D printing
|
52 |
-- |
2025-06-12 |
|
LLaMA 3 70B Llamafiles
|
51 |
-- |
2024-04-19 |
|
Janus-Pro: Autoregressive framework unifying multimodal understanding&generation
|
49 |
-- |
2025-01-27 |
|
DeepSeek v3 beats Claude Sonnet 3.5 and is way cheaper
|
48 |
-- |
2024-12-26 |
|
Improving Parquet Dedupe on Hugging Face Hub
|
47 |
-- |
2024-10-08 |
|
OpenLLaMA 13B released, trained on 1T tokens
|
47 |
-- |
2023-06-19 |
|
DALL·E Mini
|
46 |
-- |
2022-04-11 |
|
Open-LLM performances are plateauing
|
46 |
-- |
2024-06-29 |
|
The AI Research Residency Program
|
46 |
-- |
2022-03-23 |
|
4-Bit Quantization and QLoRA
|
41 |
-- |
2023-05-25 |
|
BLOOMChat, a 176B parameter, Multi-lingual, fine tuned chat
|
40 |
-- |
2023-05-19 |
|
What's Going on with the Open LLM Leaderboard?
|
40 |
-- |
2023-06-23 |
|
Kai-Fu Lee's Yi-34B uses exactly Llama's architecture except for 2 renamed tensors
|
39 |
-- |
2023-11-14 |
|
DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks
|
39 |
-- |
2025-01-20 |
|
Fully autonomous AI agents should not be developed
|
38 |
-- |
2025-02-07 |
|
Zephyr 7B – Mistral Finetune that responds like ChatGPT
|
37 |
-- |
2023-10-15 |
|
Whisper JAX: Transcribe 1 hour of audio in under 15 seconds
|
36 |
-- |
2023-04-22 |
|
Qwen3-235B-A22B-Instruct-2507
|
36 |
-- |
2025-07-21 |
|
MistralLite by Amazon Web Services
|
34 |
-- |
2023-11-01 |
|
Mixtral-8x22B on HuggingFace
|
33 |
-- |
2024-04-10 |
|
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
|
33 |
-- |
2025-02-19 |
|
Qwen3-Coder-30B-A3B-Instruct
|
32 |
-- |
2025-07-31 |
|
General OCR Theory: Towards OCR-2.0 via a Unified End-to-End Model
|
31 |
-- |
2024-09-11 |
|
Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat
|
30 |
-- |
2024-04-12 |
|
OpenFLUX.1
|
30 |
-- |
2024-10-04 |
|
Reachy Mini – The Open-Source Robot for Today's and Tomorrow's AI Builders
|
30 |
-- |
2025-07-09 |
|
Mistral 7B v0.2
|
29 |
-- |
2024-03-31 |
|
Mixture of Experts Explained
|
29 |
-- |
2023-12-11 |
|
TinyLlama at 2T of 3T
|
29 |
-- |
2023-11-19 |
|
Video2Game: Real-Time, Interactive, Realistic Environment from a Single Video
|
28 |
-- |
2024-04-16 |
|
Real-Time Latent Consistency Model
|
27 |
-- |
2023-10-30 |
|
Language Modeling Is Compression
|
27 |
-- |
2023-09-21 |
|
grok-2 on Hugging Face
|
27 |
-- |
2025-08-23 |
|
Llama-3.2-3B-Instruct-uncensored
|
26 |
-- |
2024-09-27 |
|
Pixel Art XL: Stable Diffusion XL for Pixel Art
|
26 |
-- |
2023-08-03 |
|
UC Berkeley's open-source Vicuna LLM chatbot released new improved model weights
|
26 |
-- |
2023-04-14 |
|
Llama can now see and run on your device – welcome Llama …
|
26 |
-- |
2024-09-25 |
|
DeepSeek-v3.1
|
26 |
-- |
2025-08-21 |
|
Llama 1.3B Trained on 200B Tokens for Commercial Use
|
25 |
-- |
2023-04-28 |
|
New Phi-3.5 Models from Microsoft, including new MoE
|
25 |
-- |
2024-08-20 |
|
LLM: Transformer Is Linear
|
25 |
-- |
2024-05-24 |
|
DeepSeek-v3.1-Base
|
25 |
-- |
2025-08-19 |