HuggingFace on HN

215 posts with 10+ points since 2022

Hacker News Posts
Title Points Comments Date
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf] 978 -- 2025-12-01
Uncensor any LLM with abliteration 586 -- 2024-06-13
Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete 530 -- 2026-01-21
Deepseek R1-0528 451 -- 2025-05-28
Llama-3.3-70B-Instruct 425 -- 2024-12-06
Try Stable Diffusion's Img2Img Mode 415 -- 2022-08-29
Open-R1: an open reproduction of DeepSeek-R1 394 -- 2025-01-28
Smollm3: Smol, multilingual, long-context reasoner LLM 388 -- 2025-07-08
GLM-4.7-Flash 371 -- 2026-01-19
Nanonets-OCR-s – OCR model that transforms documents into structured markdown 361 -- 2025-06-16
A Replacement for BERT 348 -- 2024-12-19
MonadGPT – What would have happened if ChatGPT was invented in the … 323 -- 2023-11-24
Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS 319 -- 2025-09-02
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning 263 -- 2025-12-01
The Smol Training Playbook: The Secrets to Building World-Class LLMs 262 -- 2025-10-30
LLM in a Flash: Efficient LLM Inference with Limited Memory 252 -- 2023-12-20
Microsoft Phi-2 model changes licence to MIT 240 -- 2024-01-06
Falcon 180B 238 -- 2023-09-06
OpenLLaMA 13B Released 229 -- 2023-06-18
Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser 227 -- 2025-02-07
Hugging Face Releases Agents 214 -- 2023-05-10
Space secrets leak disclosure 197 -- 2024-06-01
BigCode Project Releases StarCoder: A 15B Code LLM 185 -- 2023-05-04
Best 7B LLM on leaderboards made by an amateur following a medium … 181 -- 2024-01-05
Stability.ai sent a take down request to Runway ML's SD v1.5 citing … 179 -- 2022-10-20
We raised $100M for open and collaborative machine learning 175 -- 2022-05-09
Llama 3 8B is almost as good as Wizard 2 8x22B 168 -- 2024-04-19
SantaCoder: A new 1.1B code model for generation and infilling 168 -- 2022-12-22
Nvidia releases NVLM 1.0 72B open weight model 167 -- 2024-10-02
Qwen3-4B-Thinking-2507 166 -- 2025-08-06
StackLlama: A hands-on guide to train LlaMa with RLHF 165 -- 2023-04-06
Explaining the SDXL Latent Space 163 -- 2024-02-05
BLOOM: The largest open multilingual language model 160 -- 2022-07-12
Show HN: Text-to-video model from scratch (2 brothers, 2 years, 2B params) 156 -- 2026-01-22
Hugging Face and Google partner for AI collaboration 152 -- 2024-01-25
Qwen3-235B-A22B-Thinking-2507 152 -- 2025-07-25
Show HN: Penny-1.7B Irish Penny Journal style transfer 149 -- 2025-06-02
Wordalle – Guess the prompt used to generate a set of images … 137 -- 2022-07-01
Mistral-8x7B-Chat 131 -- 2023-12-10
A CC-By Open-Source TTS Model with Voice Cloning 131 -- 2024-11-04
Qwen-Image-Layered: transparency and layer aware open diffusion model 130 -- 2025-12-19
FineWeb: Decanting the web for the finest text data at scale 127 -- 2024-06-02
Yi-34B-Chat 115 -- 2023-11-24
GPT-3.5 and Wolfram Alpha via LangChain 107 -- 2023-01-18
The Falcon has landed in the Hugging Face ecosystem 105 -- 2023-06-05
HuggingChat: Chat with Open Source Models 103 -- 2024-02-21
Hugging Face and AWS partner to make AI more accessible 102 -- 2023-02-21
HuggingFace Training Cluster as a Service 101 -- 2023-09-05
More than 80 AI models from Qualcomm 95 -- 2024-02-28
Segmind Stable Diffusion – A smaller version of Stable Diffusion XL 95 -- 2023-10-25
LLaMA-Pro-8B 94 -- 2024-01-06
HuggingChat 93 -- 2023-04-25
Yarn-Mistral-7B-128k 88 -- 2023-11-11
Qwen3 30B-A3B 87 -- 2025-07-30
Apple/OpenELM: Efficient Open-Source Family Language Models 82 -- 2024-04-24
Sparse LLM Inference on CPU: 75% fewer parameters 78 -- 2023-10-19
Pokemon GAN 77 -- 2022-02-14
YouTube-Commons: Audio transcripts of 2,063,066 YouTube videos, CC-By license 75 -- 2024-04-18
Switch Transformers C – 2048 experts (1.6T params for 3.1 TB) (2022) 73 -- 2023-11-20
Multimodal Neurons in Pretrained Text-Only Transformers 66 -- 2023-08-04
Show HN: Simply Reading Analog Gauges – GPT4, CogVLM Can't 66 -- 2024-01-22
Voxtral-Mini-3B-2507 – Open source speech understanding model 64 -- 2025-07-15
Open-sourcing 5,000hrs of self-driving dataset 63 -- 2025-03-11
HuggingChat – ChatGPT alternative with open source models 61 -- 2023-12-15
MSFT's WizardLM2 models have been taken down 58 -- 2024-04-16
OpenLLaMA 7B Training Completed to 1T Tokens 58 -- 2023-06-07
Phi-2 57 -- 2023-12-13
Dolphin-2_6-Phi-2 56 -- 2023-12-24
Alibaba releases 72B LLM with 32k context length 55 -- 2023-11-30
LiteLlama-460M-1T has 460M parameters trained with 1T tokens 54 -- 2024-01-07
Qwen Image 54 -- 2025-08-04
Fine-Tuning LLMs to 1.58bit 52 -- 2024-09-18
Train faster static embedding models with sentence transformers 52 -- 2025-01-15
Show HN: ChatToSTL – AI text-to-CAD for 3D printing 52 -- 2025-06-12
LLaMA 3 70B Llamafiles 51 -- 2024-04-19
Janus-Pro: Autoregressive framework unifying multimodal understanding&generation 49 -- 2025-01-27
DeepSeek v3 beats Claude sonnet 3.5 and way cheaper 48 -- 2024-12-26
Improving Parquet Dedupe on Hugging Face Hub 47 -- 2024-10-08
Open LLAMA 13B released, trained on 1T tokens 47 -- 2023-06-19
DALL·E Mini 46 -- 2022-04-11
Open-LLM performances are plateauing 46 -- 2024-06-29
The AI Research Residency Program 46 -- 2022-03-23
4-Bit Quantization and QLoRA 41 -- 2023-05-25
BLOOMChat, a 176B parameter, Multi-lingual, fine tuned chat 40 -- 2023-05-19
What's Going on with the Open LLM Leaderboard? 40 -- 2023-06-23
Kai-Fu Li's Yi-34B uses exactly Llama's architecture except for 2 tensor renamed 39 -- 2023-11-14
DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks 39 -- 2025-01-20
Fully autonomous AI agents should not be developed 38 -- 2025-02-07
Zephyr 7B – Mistral Finetune that responds like ChatGPT 37 -- 2023-10-15
Whisper Jax: Transcribe a 1 hour of audio in under 15 seconds 36 -- 2023-04-22
Qwen3-235B-A22B-Instruct-2507 36 -- 2025-07-21
MistralLite by Amazon Web Services 34 -- 2023-11-01
Mixtral-8x22B on HuggingFace 33 -- 2024-04-10
The Ultra-Scale Playbook: Training LLMs on GPU Clusters 33 -- 2025-02-19
Qwen3-Coder-30B-A3B-Instruct 32 -- 2025-07-31
General OCR Theory: Towards OCR-2.0 via a Unified End-to-End Model 31 -- 2024-09-11
Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat 30 -- 2024-04-12
OpenFLUX.1 30 -- 2024-10-04
Reachy Mini – The Open-Source Robot for Today's and Tomorrow's AI Builders 30 -- 2025-07-09
Mistral 7B v0.2 29 -- 2024-03-31
Mixture of Experts Explained 29 -- 2023-12-11
TinyLlama at 2T of 3T 29 -- 2023-11-19
Video2Game: Real-Time, Interactive, Realistic Environment from a Single Video 28 -- 2024-04-16
Real-Time Latent Consistency Model 27 -- 2023-10-30
Language Modeling Is Compression 27 -- 2023-09-21
grok-2 on Hugging Face 27 -- 2025-08-23
Llama-3.2-3B-Instruct-uncensored 26 -- 2024-09-27
Pixel Art XL: Stable Diffusion XL for Pixel Art 26 -- 2023-08-03
UC Berkeley's open-source Vicuna LLM chatbot released new improved model weights 26 -- 2023-04-14
Llama can now see and run on your device – welcome Llama … 26 -- 2024-09-25
DeepSeek-v3.1 26 -- 2025-08-21
Llama 1.3B Trained on 200B Tokens for Commercial Use 25 -- 2023-04-28
New Phi-3.5 Models from Microsoft, including new MoE 25 -- 2024-08-20
LLM: Transformer Is Linear 25 -- 2024-05-24
DeepSeek-v3.1-Base 25 -- 2025-08-19
NousResearch/Nous-Hermes-2-Yi-34B 24 -- 2023-12-26
Accelerating Stable Diffusion XL Inference with Jax on Cloud TPU v5e 23 -- 2023-10-03
HuggingFace - Tencent launches Hunyuan Large which outperforms Llama 3.1 405B 23 -- 2024-11-05
Mistral Small 3.2 (24B-Instruct-2506) 23 -- 2025-06-20
DeepSeek-v3.1 23 -- 2025-08-19
Lineage Explorer for open source models – Hugging Face Space 22 -- 2024-01-18
Llama 22B: 13B V2 with 33B attention heads frankensteined on 22 -- 2023-08-18
Show HN: Fineweb-Edu-Fortified dataset: Fineweb-Edu deduped, embeddings included 22 -- 2024-08-14
Mistral-7B-OpenOrca. First 7B model to beat all other models <30B 21 -- 2023-10-02
Würstchen: Fast Diffusion for Image Generation 21 -- 2023-09-13
Llama 3.2 21 -- 2024-09-25
Kyutai 1.6B Streaming TTS 21 -- 2025-07-03
Qwen3 235B beats Claude on some code benchmarks 21 -- 2025-07-21
Code Generation with HuggingFace 20 -- 2022-06-07
Selene Mini: Open-sourced SOTA small language-model-as-a-judge 20 -- 2025-01-29
Ernie-ViLG better anime quality than Stable Diffusion 19 -- 2022-09-01
Fine-tune and deploy open LLMs as containers using AIKit - Part 1 19 -- 2024-06-06
makeMoE: Implement a Sparse Mixture of Experts LLM from Scratch 19 -- 2024-01-23
AMD and Hugging Face: Large Language Models Out-of-the-Box Acceleration with AMD GPU 19 -- 2023-12-13
The smallest VLM ever: 250M parameters 19 -- 2025-01-23
This Pokémon Does Not Exist: Using AI models to create fake cards … 18 -- 2022-03-22
HuggingFace to Replace Git LFS with Xet 18 -- 2024-08-23
GPT-NeoX 18 -- 2022-12-14
Fake Insects: a game where you have to identify AI-generated insects 18 -- 2024-08-17
Mixtral-8x22B-Instruct-v0.1 18 -- 2024-04-17
Stable Diffusion Multiplayer 18 -- 2022-10-30
Encrypted Large Language Models with Homomorphic Encryption 18 -- 2023-08-03
Hermes-2-Pro-Llama-3-8B 18 -- 2024-05-01
Orca 2: Teaching Small Language Models How to Reason 18 -- 2023-11-21
Deepseek V3-0324 18 -- 2025-03-24
Show HN: MiniSearch, a minimalist search engine with integrated browser-based AI 17 -- 2023-10-15
StableLM-2-12B 17 -- 2024-04-08
Gemini vs. GPT-4V: A Preliminary Comparison Through Qualitative Cases 17 -- 2023-12-28
Una-Cybertron-7B 17 -- 2023-12-08
GPT Baker lets you build your own open-source GPTs 17 -- 2023-11-23
Deploy Livebook (Elixir) Notebooks as Apps to Hugging Face Spaces 17 -- 2023-06-15
ChatRWKV 17 -- 2023-03-23
DeepSeek R1 17 -- 2025-01-20
Vector Search with DuckDB 17 -- 2025-02-26
DiffuCoder-7B-CpGRPO: A code generation LLM developed by Apple 17 -- 2025-07-04
NuExtract: A LLM for Structured Extraction 16 -- 2024-06-29
An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct 16 -- 2024-06-09
Phi-3 Weights Released 16 -- 2024-04-23
New medical LLM beats Med-PaLM-2, GPT-4 on MMLU benchmarks 16 -- 2024-07-31
Miqu 70B – possible leak of the mistral-medium LLM 16 -- 2024-01-29
New Stable Diffusion model trained on high quality Art 16 -- 2022-12-11
Qwen3 0.6B now on HuggingFace (quantized) 16 -- 2025-04-28
Ollama can run any GGUF Model on Hugging Face Hub now 15 -- 2024-10-16
Llama-3-70B-Instruct-Gradient-1048k 14 -- 2024-05-04
New finance LLM passed the CFA Level III exam 14 -- 2024-07-31
Airoboros-13B: 98% against GPT-3.5 14 -- 2023-05-22
Run Mistral 7B model using less than 4GB of memory on your … 14 -- 2024-07-23
Stable Diffusion 3 Medium Released 14 -- 2024-06-12
Pre-computed vector embeddings available on HuggingFace 14 -- 2024-01-22
TeapotLLM- an open-source <1B model for hallucination-resistant Q&A on a CPU 14 -- 2025-04-16
DeepSeek-Prover-V2-671B 14 -- 2025-04-30
DeepSeek-R1-0528 performance improvements 14 -- 2025-05-29
Create a GPT3 powered Q&A Chatbot for *any* GitHub repo by posting … 13 -- 2023-02-05
Yi-9B-200K 13 -- 2024-03-17
An Introduction to Vision-Language Modeling 13 -- 2024-05-28
Co-Doodle with Gemini 13 -- 2025-03-19
Attention Sinks in LLMs for endless fluency 12 -- 2023-10-09
FineWeb: 15T tokens of the finest data the web has to offer 12 -- 2024-04-21
Idefics: Open Access 60B multimodal model 12 -- 2023-08-22
Google AI just released Flan-T5 models 12 -- 2022-10-24
Language model can listen while speaking 12 -- 2024-08-07
ML for 3D Course on Hugging Face 12 -- 2024-05-16
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs 12 -- 2024-04-09
Command-R: open weights 35B params / 128k tokens context length model by … 12 -- 2024-03-11
StarCoder2 and The Stack v2: new code LLMs and dataset 12 -- 2024-02-28
Jamba-v0.1: An Apache 2.0 licensed 52B Mamba Transformer hybrid LLM base model 12 -- 2024-03-28
Stable difusion on multiplayer: Internet at it best 12 -- 2022-10-30
Open-source DeepResearch – Freeing our search agents 12 -- 2025-02-04
FUTO open-sources 1M row keyboard swipe dataset 12 -- 2025-04-04
HuggingFace Is Down 11 -- 2024-02-28
30B uncensored OSS model with no guardrails 11 -- 2023-11-07
The Stack: 3 TB of permissively licensed source code in 30 programming … 11 -- 2022-10-31
Experiments with Bitnet 1.5 (Ngmi) 11 -- 2024-03-23
Hierarchical Masked 3D Diffusion Model for Video Outpainting 11 -- 2023-09-06
FalconMamba 7B: The first attention-free and general-purpose pure Mamba model 11 -- 2024-08-13
NPC-Playground, a 3D playground to interact with LLM-powered NPCs 11 -- 2024-06-05
Open LLM Leaderboard 11 -- 2024-01-02
Shallow Feed-Forward Neural Networks as Alternative to Attention in Transformers 11 -- 2023-11-21
smolagents: A simple library to build AI agents 11 -- 2025-01-02
DeepSeek-TNG-R1T2-Chimera 11 -- 2025-07-02
CryptGPT: A Simple Approach to Privacy-Preserving LLMs Using Vigenere Cipher 10 -- 2024-06-15
Whisperfile 10 -- 2024-08-19
Llava Model for Video 10 -- 2024-05-16
Show HN: Encrypted Credit Card Approval Using Homomorphic Encryption 10 -- 2024-01-31
Vector embeddings model for medical literature 10 -- 2024-01-08
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting 10 -- 2023-09-11
Origin of LLMs: An Evolutionary Tree and Graph for 15K Large Language … 10 -- 2023-07-20
Show HN: Image Filtering App Using Homomorphic Encryption 10 -- 2023-02-23
CMFNet: AI Image Deblurring 10 -- 2022-02-27
Show HN: Downloadable AI Musical Instruments 10 -- 2024-12-10
Phi-4 weights have been released under MIT license 10 -- 2025-01-08
Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 10 -- 2025-04-23
Open Source 1.7tb Dataset of What AI Crawlers Are Doing 10 -- 2025-07-03
Parquet Content-Defined Chunking 10 -- 2025-09-09
Wan2.2-S2V-14B – audio-driven cinematic video generation model 10 -- 2025-08-26