|
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]
|
978 |
-- |
2025-12-01 |
|
Uncensor any LLM with abliteration
|
586 |
-- |
2024-06-13 |
|
Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete
|
530 |
-- |
2026-01-21 |
|
Deepseek R1-0528
|
451 |
-- |
2025-05-28 |
|
Llama-3.3-70B-Instruct
|
425 |
-- |
2024-12-06 |
|
Try Stable Diffusion's Img2Img Mode
|
415 |
-- |
2022-08-29 |
|
Open-R1: an open reproduction of DeepSeek-R1
|
394 |
-- |
2025-01-28 |
|
Smollm3: Smol, multilingual, long-context reasoner LLM
|
388 |
-- |
2025-07-08 |
|
GLM-4.7-Flash
|
371 |
-- |
2026-01-19 |
|
Nanonets-OCR-s – OCR model that transforms documents into structured markdown
|
361 |
-- |
2025-06-16 |
|
A Replacement for BERT
|
348 |
-- |
2024-12-19 |
|
MonadGPT – What would have happened if ChatGPT was invented in the …
|
323 |
-- |
2023-11-24 |
|
Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS
|
319 |
-- |
2025-09-02 |
|
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
|
263 |
-- |
2025-12-01 |
|
The Smol Training Playbook: The Secrets to Building World-Class LLMs
|
262 |
-- |
2025-10-30 |
|
LLM in a Flash: Efficient LLM Inference with Limited Memory
|
252 |
-- |
2023-12-20 |
|
Microsoft Phi-2 model changes licence to MIT
|
240 |
-- |
2024-01-06 |
|
Falcon 180B
|
238 |
-- |
2023-09-06 |
|
OpenLLaMA 13B Released
|
229 |
-- |
2023-06-18 |
|
Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser
|
227 |
-- |
2025-02-07 |
|
Hugging Face Releases Agents
|
214 |
-- |
2023-05-10 |
|
Space secrets leak disclosure
|
197 |
-- |
2024-06-01 |
|
BigCode Project Releases StarCoder: A 15B Code LLM
|
185 |
-- |
2023-05-04 |
|
Best 7B LLM on leaderboards made by an amateur following a medium …
|
181 |
-- |
2024-01-05 |
|
Stability.ai sent a take down request to Runway ML's SD v1.5 citing …
|
179 |
-- |
2022-10-20 |
|
We raised $100M for open and collaborative machine learning
|
175 |
-- |
2022-05-09 |
|
Llama 3 8B is almost as good as Wizard 2 8x22B
|
168 |
-- |
2024-04-19 |
|
SantaCoder: A new 1.1B code model for generation and infilling
|
168 |
-- |
2022-12-22 |
|
Nvidia releases NVLM 1.0 72B open weight model
|
167 |
-- |
2024-10-02 |
|
Qwen3-4B-Thinking-2507
|
166 |
-- |
2025-08-06 |
|
StackLlama: A hands-on guide to train LlaMa with RLHF
|
165 |
-- |
2023-04-06 |
|
Explaining the SDXL Latent Space
|
163 |
-- |
2024-02-05 |
|
BLOOM: The largest open multilingual language model
|
160 |
-- |
2022-07-12 |
|
Show HN: Text-to-video model from scratch (2 brothers, 2 years, 2B params)
|
156 |
-- |
2026-01-22 |
|
Hugging Face and Google partner for AI collaboration
|
152 |
-- |
2024-01-25 |
|
Qwen3-235B-A22B-Thinking-2507
|
152 |
-- |
2025-07-25 |
|
Show HN: Penny-1.7B Irish Penny Journal style transfer
|
149 |
-- |
2025-06-02 |
|
Wordalle – Guess the prompt used to generate a set of images …
|
137 |
-- |
2022-07-01 |
|
Mistral-8x7B-Chat
|
131 |
-- |
2023-12-10 |
|
A CC-By Open-Source TTS Model with Voice Cloning
|
131 |
-- |
2024-11-04 |
|
Qwen-Image-Layered: transparency and layer aware open diffusion model
|
130 |
-- |
2025-12-19 |
|
FineWeb: Decanting the web for the finest text data at scale
|
127 |
-- |
2024-06-02 |
|
Yi-34B-Chat
|
115 |
-- |
2023-11-24 |
|
GPT-3.5 and Wolfram Alpha via LangChain
|
107 |
-- |
2023-01-18 |
|
The Falcon has landed in the Hugging Face ecosystem
|
105 |
-- |
2023-06-05 |
|
HuggingChat: Chat with Open Source Models
|
103 |
-- |
2024-02-21 |
|
Hugging Face and AWS partner to make AI more accessible
|
102 |
-- |
2023-02-21 |
|
HuggingFace Training Cluster as a Service
|
101 |
-- |
2023-09-05 |
|
More than 80 AI models from Qualcomm
|
95 |
-- |
2024-02-28 |
|
Segmind Stable Diffusion – A smaller version of Stable Diffusion XL
|
95 |
-- |
2023-10-25 |
|
LLaMA-Pro-8B
|
94 |
-- |
2024-01-06 |
|
HuggingChat
|
93 |
-- |
2023-04-25 |
|
Yarn-Mistral-7B-128k
|
88 |
-- |
2023-11-11 |
|
Qwen3 30B-A3B
|
87 |
-- |
2025-07-30 |
|
Apple/OpenELM: Efficient Open-Source Family Language Models
|
82 |
-- |
2024-04-24 |
|
Sparse LLM Inference on CPU: 75% fewer parameters
|
78 |
-- |
2023-10-19 |
|
Pokemon GAN
|
77 |
-- |
2022-02-14 |
|
YouTube-Commons: Audio transcripts of 2,063,066 YouTube videos, CC-By license
|
75 |
-- |
2024-04-18 |
|
Switch Transformers C – 2048 experts (1.6T params for 3.1 TB) (2022)
|
73 |
-- |
2023-11-20 |
|
Multimodal Neurons in Pretrained Text-Only Transformers
|
66 |
-- |
2023-08-04 |
|
Show HN: Simply Reading Analog Gauges – GPT4, CogVLM Can't
|
66 |
-- |
2024-01-22 |
|
Voxtral-Mini-3B-2507 – Open source speech understanding model
|
64 |
-- |
2025-07-15 |
|
Open-sourcing 5,000hrs of self-driving dataset
|
63 |
-- |
2025-03-11 |
|
HuggingChat – ChatGPT alternative with open source models
|
61 |
-- |
2023-12-15 |
|
MSFT's WizardLM2 models have been taken down
|
58 |
-- |
2024-04-16 |
|
OpenLLaMA 7B Training Completed to 1T Tokens
|
58 |
-- |
2023-06-07 |
|
Phi-2
|
57 |
-- |
2023-12-13 |
|
Dolphin-2_6-Phi-2
|
56 |
-- |
2023-12-24 |
|
Alibaba releases 72B LLM with 32k context length
|
55 |
-- |
2023-11-30 |
|
LiteLlama-460M-1T has 460M parameters trained with 1T tokens
|
54 |
-- |
2024-01-07 |
|
Qwen Image
|
54 |
-- |
2025-08-04 |
|
Fine-Tuning LLMs to 1.58bit
|
52 |
-- |
2024-09-18 |
|
Train faster static embedding models with sentence transformers
|
52 |
-- |
2025-01-15 |
|
Show HN: ChatToSTL – AI text-to-CAD for 3D printing
|
52 |
-- |
2025-06-12 |
|
LLaMA 3 70B Llamafiles
|
51 |
-- |
2024-04-19 |
|
Janus-Pro: Autoregressive framework unifying multimodal understanding&generation
|
49 |
-- |
2025-01-27 |
|
DeepSeek v3 beats Claude sonnet 3.5 and way cheaper
|
48 |
-- |
2024-12-26 |
|
Improving Parquet Dedupe on Hugging Face Hub
|
47 |
-- |
2024-10-08 |
|
Open LLAMA 13B released, trained on 1T tokens
|
47 |
-- |
2023-06-19 |
|
DALL·E Mini
|
46 |
-- |
2022-04-11 |
|
Open-LLM performances are plateauing
|
46 |
-- |
2024-06-29 |
|
The AI Research Residency Program
|
46 |
-- |
2022-03-23 |
|
4-Bit Quantization and QLoRA
|
41 |
-- |
2023-05-25 |
|
BLOOMChat, a 176B parameter, Multi-lingual, fine tuned chat
|
40 |
-- |
2023-05-19 |
|
What's Going on with the Open LLM Leaderboard?
|
40 |
-- |
2023-06-23 |
|
Kai-Fu Lee's Yi-34B uses exactly Llama's architecture except for 2 renamed tensors
|
39 |
-- |
2023-11-14 |
|
DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks
|
39 |
-- |
2025-01-20 |
|
Fully autonomous AI agents should not be developed
|
38 |
-- |
2025-02-07 |
|
Zephyr 7B – Mistral Finetune that responds like ChatGPT
|
37 |
-- |
2023-10-15 |
|
Whisper Jax: Transcribe 1 hour of audio in under 15 seconds
|
36 |
-- |
2023-04-22 |
|
Qwen3-235B-A22B-Instruct-2507
|
36 |
-- |
2025-07-21 |
|
MistralLite by Amazon Web Services
|
34 |
-- |
2023-11-01 |
|
Mixtral-8x22B on HuggingFace
|
33 |
-- |
2024-04-10 |
|
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
|
33 |
-- |
2025-02-19 |
|
Qwen3-Coder-30B-A3B-Instruct
|
32 |
-- |
2025-07-31 |
|
General OCR Theory: Towards OCR-2.0 via a Unified End-to-End Model
|
31 |
-- |
2024-09-11 |
|
Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat
|
30 |
-- |
2024-04-12 |
|
OpenFLUX.1
|
30 |
-- |
2024-10-04 |
|
Reachy Mini – The Open-Source Robot for Today's and Tomorrow's AI Builders
|
30 |
-- |
2025-07-09 |
|
Mistral 7B v0.2
|
29 |
-- |
2024-03-31 |
|
Mixture of Experts Explained
|
29 |
-- |
2023-12-11 |
|
TinyLlama at 2T of 3T
|
29 |
-- |
2023-11-19 |
|
Video2Game: Real-Time, Interactive, Realistic Environment from a Single Video
|
28 |
-- |
2024-04-16 |
|
Real-Time Latent Consistency Model
|
27 |
-- |
2023-10-30 |
|
Language Modeling Is Compression
|
27 |
-- |
2023-09-21 |
|
grok-2 on Hugging Face
|
27 |
-- |
2025-08-23 |
|
Llama-3.2-3B-Instruct-uncensored
|
26 |
-- |
2024-09-27 |
|
Pixel Art XL: Stable Diffusion XL for Pixel Art
|
26 |
-- |
2023-08-03 |
|
UC Berkeley's open-source Vicuna LLM chatbot released new improved model weights
|
26 |
-- |
2023-04-14 |
|
Llama can now see and run on your device – welcome Llama …
|
26 |
-- |
2024-09-25 |
|
DeepSeek-v3.1
|
26 |
-- |
2025-08-21 |
|
Llama 1.3B Trained on 200B Tokens for Commercial Use
|
25 |
-- |
2023-04-28 |
|
New Phi-3.5 Models from Microsoft, including new MoE
|
25 |
-- |
2024-08-20 |
|
LLM: Transformer Is Linear
|
25 |
-- |
2024-05-24 |
|
DeepSeek-v3.1-Base
|
25 |
-- |
2025-08-19 |
|
NousResearch/Nous-Hermes-2-Yi-34B
|
24 |
-- |
2023-12-26 |
|
Accelerating Stable Diffusion XL Inference with Jax on Cloud TPU v5e
|
23 |
-- |
2023-10-03 |
|
HuggingFace - Tencent launches Hunyuan Large which outperforms Llama 3.1 405B
|
23 |
-- |
2024-11-05 |
|
Mistral Small 3.2 (24B-Instruct-2506)
|
23 |
-- |
2025-06-20 |
|
DeepSeek-v3.1
|
23 |
-- |
2025-08-19 |
|
Lineage Explorer for open source models – Hugging Face Space
|
22 |
-- |
2024-01-18 |
|
Llama 22B: 13B V2 with 33B attention heads frankensteined on
|
22 |
-- |
2023-08-18 |
|
Show HN: Fineweb-Edu-Fortified dataset: Fineweb-Edu deduped, embeddings included
|
22 |
-- |
2024-08-14 |
|
Mistral-7B-OpenOrca. First 7B model to beat all other models <30B
|
21 |
-- |
2023-10-02 |
|
Würstchen: Fast Diffusion for Image Generation
|
21 |
-- |
2023-09-13 |
|
Llama 3.2
|
21 |
-- |
2024-09-25 |
|
Kyutai 1.6B Streaming TTS
|
21 |
-- |
2025-07-03 |
|
Qwen3 235B beats Claude on some code benchmarks
|
21 |
-- |
2025-07-21 |
|
Code Generation with HuggingFace
|
20 |
-- |
2022-06-07 |
|
Selene Mini: Open-sourced SOTA small language-model-as-a-judge
|
20 |
-- |
2025-01-29 |
|
Ernie-ViLG better anime quality than Stable Diffusion
|
19 |
-- |
2022-09-01 |
|
Fine-tune and deploy open LLMs as containers using AIKit - Part 1
|
19 |
-- |
2024-06-06 |
|
makeMoE: Implement a Sparse Mixture of Experts LLM from Scratch
|
19 |
-- |
2024-01-23 |
|
AMD and Hugging Face: Large Language Models Out-of-the-Box Acceleration with AMD GPU
|
19 |
-- |
2023-12-13 |
|
The smallest VLM ever: 250M parameters
|
19 |
-- |
2025-01-23 |
|
This Pokémon Does Not Exist: Using AI models to create fake cards …
|
18 |
-- |
2022-03-22 |
|
HuggingFace to Replace Git LFS with Xet
|
18 |
-- |
2024-08-23 |
|
GPT-NeoX
|
18 |
-- |
2022-12-14 |
|
Fake Insects: a game where you have to identify AI-generated insects
|
18 |
-- |
2024-08-17 |
|
Mixtral-8x22B-Instruct-v0.1
|
18 |
-- |
2024-04-17 |
|
Stable Diffusion Multiplayer
|
18 |
-- |
2022-10-30 |
|
Encrypted Large Language Models with Homomorphic Encryption
|
18 |
-- |
2023-08-03 |
|
Hermes-2-Pro-Llama-3-8B
|
18 |
-- |
2024-05-01 |
|
Orca 2: Teaching Small Language Models How to Reason
|
18 |
-- |
2023-11-21 |
|
Deepseek V3-0324
|
18 |
-- |
2025-03-24 |
|
Show HN: MiniSearch, a minimalist search engine with integrated browser-based AI
|
17 |
-- |
2023-10-15 |
|
StableLM-2-12B
|
17 |
-- |
2024-04-08 |
|
Gemini vs. GPT-4V: A Preliminary Comparison Through Qualitative Cases
|
17 |
-- |
2023-12-28 |
|
Una-Cybertron-7B
|
17 |
-- |
2023-12-08 |
|
GPT Baker lets you build your own open-source GPTs
|
17 |
-- |
2023-11-23 |
|
Deploy Livebook (Elixir) Notebooks as Apps to Hugging Face Spaces
|
17 |
-- |
2023-06-15 |
|
ChatRWKV
|
17 |
-- |
2023-03-23 |
|
DeepSeek R1
|
17 |
-- |
2025-01-20 |
|
Vector Search with DuckDB
|
17 |
-- |
2025-02-26 |
|
DiffuCoder-7B-CpGRPO: A code generation LLM developed by Apple
|
17 |
-- |
2025-07-04 |
|
NuExtract: A LLM for Structured Extraction
|
16 |
-- |
2024-06-29 |
|
An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct
|
16 |
-- |
2024-06-09 |
|
Phi-3 Weights Released
|
16 |
-- |
2024-04-23 |
|
New medical LLM beats Med-PaLM-2, GPT-4 on MMLU benchmarks
|
16 |
-- |
2024-07-31 |
|
Miqu 70B – possible leak of the mistral-medium LLM
|
16 |
-- |
2024-01-29 |
|
New Stable Diffusion model trained on high quality Art
|
16 |
-- |
2022-12-11 |
|
Qwen3 0.6B now on HuggingFace (quantized)
|
16 |
-- |
2025-04-28 |
|
Ollama can run any GGUF Model on Hugging Face Hub now
|
15 |
-- |
2024-10-16 |
|
Llama-3-70B-Instruct-Gradient-1048k
|
14 |
-- |
2024-05-04 |
|
New finance LLM passed the CFA Level III exam
|
14 |
-- |
2024-07-31 |
|
Airoboros-13B: 98% against GPT-3.5
|
14 |
-- |
2023-05-22 |
|
Run Mistral 7B model using less than 4GB of memory on your …
|
14 |
-- |
2024-07-23 |
|
Stable Diffusion 3 Medium Released
|
14 |
-- |
2024-06-12 |
|
Pre-computed vector embeddings available on HuggingFace
|
14 |
-- |
2024-01-22 |
|
TeapotLLM- an open-source <1B model for hallucination-resistant Q&A on a CPU
|
14 |
-- |
2025-04-16 |
|
DeepSeek-Prover-V2-671B
|
14 |
-- |
2025-04-30 |
|
DeepSeek-R1-0528 performance improvements
|
14 |
-- |
2025-05-29 |
|
Create a GPT3 powered Q&A Chatbot for *any* GitHub repo by posting …
|
13 |
-- |
2023-02-05 |
|
Yi-9B-200K
|
13 |
-- |
2024-03-17 |
|
An Introduction to Vision-Language Modeling
|
13 |
-- |
2024-05-28 |
|
Co-Doodle with Gemini
|
13 |
-- |
2025-03-19 |
|
Attention Sinks in LLMs for endless fluency
|
12 |
-- |
2023-10-09 |
|
FineWeb: 15T tokens of the finest data the web has to offer
|
12 |
-- |
2024-04-21 |
|
Idefics: Open Access 60B multimodal model
|
12 |
-- |
2023-08-22 |
|
Google AI just released Flan-T5 models
|
12 |
-- |
2022-10-24 |
|
Language model can listen while speaking
|
12 |
-- |
2024-08-07 |
|
ML for 3D Course on Hugging Face
|
12 |
-- |
2024-05-16 |
|
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
|
12 |
-- |
2024-04-09 |
|
Command-R: open weights 35B params / 128k tokens context length model by …
|
12 |
-- |
2024-03-11 |
|
StarCoder2 and The Stack v2: new code LLMs and dataset
|
12 |
-- |
2024-02-28 |
|
Jamba-v0.1: An Apache 2.0 licensed 52B Mamba Transformer hybrid LLM base model
|
12 |
-- |
2024-03-28 |
|
Stable Diffusion on multiplayer: Internet at its best
|
12 |
-- |
2022-10-30 |
|
Open-source DeepResearch – Freeing our search agents
|
12 |
-- |
2025-02-04 |
|
FUTO open-sources 1M row keyboard swipe dataset
|
12 |
-- |
2025-04-04 |
|
HuggingFace Is Down
|
11 |
-- |
2024-02-28 |
|
30B uncensored OSS model with no guardrails
|
11 |
-- |
2023-11-07 |
|
The Stack: 3 TB of permissively licensed source code in 30 programming …
|
11 |
-- |
2022-10-31 |
|
Experiments with Bitnet 1.5 (Ngmi)
|
11 |
-- |
2024-03-23 |
|
Hierarchical Masked 3D Diffusion Model for Video Outpainting
|
11 |
-- |
2023-09-06 |
|
FalconMamba 7B: The first attention-free and general-purpose pure Mamba model
|
11 |
-- |
2024-08-13 |
|
NPC-Playground, a 3D playground to interact with LLM-powered NPCs
|
11 |
-- |
2024-06-05 |
|
Open LLM Leaderboard
|
11 |
-- |
2024-01-02 |
|
Shallow Feed-Forward Neural Networks as Alternative to Attention in Transformers
|
11 |
-- |
2023-11-21 |
|
smolagents: A simple library to build AI agents
|
11 |
-- |
2025-01-02 |
|
DeepSeek-TNG-R1T2-Chimera
|
11 |
-- |
2025-07-02 |
|
CryptGPT: A Simple Approach to Privacy-Preserving LLMs Using Vigenere Cipher
|
10 |
-- |
2024-06-15 |
|
Whisperfile
|
10 |
-- |
2024-08-19 |
|
Llava Model for Video
|
10 |
-- |
2024-05-16 |
|
Show HN: Encrypted Credit Card Approval Using Homomorphic Encryption
|
10 |
-- |
2024-01-31 |
|
Vector embeddings model for medical literature
|
10 |
-- |
2024-01-08 |
|
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
|
10 |
-- |
2023-09-11 |
|
Origin of LLMs: An Evolutionary Tree and Graph for 15K Large Language …
|
10 |
-- |
2023-07-20 |
|
Show HN: Image Filtering App Using Homomorphic Encryption
|
10 |
-- |
2023-02-23 |
|
CMFNet: AI Image Deblurring
|
10 |
-- |
2022-02-27 |
|
Show HN: Downloadable AI Musical Instruments
|
10 |
-- |
2024-12-10 |
|
Phi-4 weights have been released under MIT license
|
10 |
-- |
2025-01-08 |
|
Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition
|
10 |
-- |
2025-04-23 |
|
Open Source 1.7TB Dataset of What AI Crawlers Are Doing
|
10 |
-- |
2025-07-03 |
|
Parquet Content-Defined Chunking
|
10 |
-- |
2025-09-09 |
|
Wan2.2-S2V-14B – audio-driven cinematic video generation model
|
10 |
-- |
2025-08-26 |
|
Not All Language Model Features Are Linear
|
9 |
-- |
2024-05-25 |
|
Nvidia releases weights for Llama-3.1-Nemotron-70B-Instruct
|
9 |
-- |
2024-10-16 |
|
Stable Diffusion XL Inpainting model released
|
9 |
-- |
2023-09-01 |
|
Opentensor and Cerebras announce BTLM-3B-8K, a leading 3B param. language model
|
9 |
-- |
2023-07-24 |
|
Perspectives for first principles prompt engineering
|
9 |
-- |
2024-08-20 |
|
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
|
9 |
-- |
2024-05-28 |
|
Argilla released Notux 8x7B - DPO fine-tune of Mixtral 8x7B
|
9 |
-- |
2024-01-04 |
|
LLM Arena. Mistral-small best open model. Gemini Pro beaten by 2 open …
|
9 |
-- |
2023-12-17 |
|
Meta-llama (Meta Llama 2)
|
9 |
-- |
2023-07-18 |
|
Summary of the Tokenizers
|
9 |
-- |
2023-02-07 |
|
Show HN: Sentiment Analysis on Encrypted Data with Homomorphic Encryption
|
9 |
-- |
2022-11-21 |
|
RunwayML fine tuned Stable Diffusion 1.5 model
|
9 |
-- |
2022-10-20 |
|
Mistral-Large-Instruct-2411 – advanced dense Large Language Model (LLM) 123B
|
9 |
-- |
2024-11-18 |
|
MIT Researchers Unveil New Method to Improve LLM Inference Performance
|
9 |
-- |
2024-10-04 |
|
Aryn/deformable-detr-DocLayNet – open-source Layout Model
|
9 |
-- |
2024-07-31 |
|
AIMO (AI Math Olympiad) progress prize winning solution
|
9 |
-- |
2024-07-10 |
|
Mistral-7B-v0.3 released on HuggingFace
|
9 |
-- |
2024-05-22 |
|
Microsoft Phi-3 3.8B model with 128k Context
|
9 |
-- |
2024-04-23 |
|
The Stack v2: a 3B files in 600 programming languages dataset
|
9 |
-- |
2024-03-07 |
|
Spaces ZeroGPU: Dynamic GPU Allocation for Spaces
|
9 |
-- |
2024-12-15 |
|
Show HN: A Transformer model that preserves logical equivalence
|
9 |
-- |
2025-03-02 |
|
NousResearch/Nous-Hermes-2-Llama-2-70B
|
8 |
-- |
2024-02-12 |
|
Gradio-Lite: Serverless Gradio Running in the Browser
|
8 |
-- |
2023-10-25 |
|
Show HN: Parley: The RPG where you Negotiate with Bandits
|
8 |
-- |
2023-04-26 |
|
Show HN: We made an encrypted DNA testing app using Homomorphic Encryption
|
8 |
-- |
2024-10-02 |
|
NexusRaven-V2-13B
|
8 |
-- |
2024-01-25 |
|
Generate 1 page comic by text
|
8 |
-- |
2023-09-03 |
|
Drag Your GAN: Interactive Point-Based Manipulation on Generative Image Manifold
|
8 |
-- |
2023-05-23 |
|
Open-source 70B model surpasses GPT-4o and Claude 3.5 on Arena Hard
|
8 |
-- |
2024-10-15 |
|
Llama 3.1 70B compressed by 6.4x using AQLM-PV, now released
|
8 |
-- |
2024-09-17 |
|
Mistral AI Pixtral
|
8 |
-- |
2024-09-11 |
|
Gradio Notebook – Generative AI Notebook Interface for Hugging Face Spaces
|
8 |
-- |
2024-02-14 |
|
Show HN: Open-source model to chat with your documents/data
|
8 |
-- |
2023-08-14 |
|
Yes, Transformers Are Effective for Time Series Forecasting (+ Autoformer)
|
8 |
-- |
2023-06-25 |
|
Hugging Face OpenAssistant
|
8 |
-- |
2023-06-24 |
|
Dataset of 35,316,999 HackerNews Posts and Comments (2006 – 2023)
|
8 |
-- |
2023-04-24 |
|
Show HN: Athelas – Automagically Repair Broken Code
|
8 |
-- |
2023-01-03 |
|
Scaling Test Time Compute with Open Models
|
8 |
-- |
2024-12-16 |
|
Sesame CSM-1B: Open-Source Conversational Speech Model
|
8 |
-- |
2025-03-14 |
|
DeepSeek-Prover-V2-671B
|
8 |
-- |
2025-04-30 |
|
Model Context Protocol (MCP) Course
|
8 |
-- |
2025-05-21 |
|
Tencent's Hunyuan Instruct 7B/4B/1.8B/0.5B new models have been released
|
8 |
-- |
2025-08-04 |
|
MistralAI released a new Magistral Small 2509
|
8 |
-- |
2025-09-17 |
|
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
|
7 |
-- |
2024-04-23 |
|
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
|
7 |
-- |
2023-05-16 |
|
Am I in the Stack?
|
7 |
-- |
2024-03-20 |
|
Common Corpus: the largest public domain dataset for training LLMs
|
7 |
-- |
2024-03-20 |
|
Introducing “Clerkie”: A LangChain Q&A bot for AI developers
|
7 |
-- |
2023-01-18 |
|
Show HN: Step up your Midjourney AI images with this prompt autocomplete
|
7 |
-- |
2022-09-10 |
|
Hugging Face launches Agents 2.0
|
7 |
-- |
2024-05-13 |
|
OpenHermesPreferences: Dataset of ~1M AI preferences from teknium/OpenHermes-2.5
|
7 |
-- |
2024-02-26 |
|
Microsoft's Orca 7B may violate OpenAI's Terms of Use
|
7 |
-- |
2023-12-05 |
|
Stable Beluga 2 – Llama2 70B finetuned on an Orca style Dataset …
|
7 |
-- |
2023-07-28 |
|
Databricks’ dolly-v2-12B, an instruction-following large language model
|
7 |
-- |
2023-04-12 |
|
Cerebras releases its own open source GPT models (Apache 2.0 License)
|
7 |
-- |
2023-03-28 |
|
Mini-Dust3r: A miniature version of dust3r running in a HuggingFace Space
|
7 |
-- |
2024-05-16 |
|
1B+ words corpus of original texts and experimental post-OCR correction output
|
7 |
-- |
2024-04-26 |
|
Show HN: Chess-LLM, using constrained-generation to force LLMs to battle it out
|
7 |
-- |
2024-03-14 |
|
Grandmaster-Level Chess Without Search
|
7 |
-- |
2024-02-08 |
|
Create a Web Interface for Your LLM in Python
|
7 |
-- |
2024-01-23 |
|
Show HN: Interactively explore your Hugging Face dataset with one line of …
|
7 |
-- |
2023-10-25 |
|
Show HN: DocQuery, an OSS tool for analyzing documents with LLMs
|
7 |
-- |
2022-09-01 |
|
Hugging Face datasets and models for cybersecurity/software vulnerabilities
|
7 |
-- |
2025-03-09 |
|
ByteDance/Dolphin on HuggingFace
|
7 |
-- |
2025-05-19 |
|
Holo1.5: Foundational Models for Computer Use Agents
|
7 |
-- |
2025-09-15 |
|
LFM2 WebGPU
|
7 |
-- |
2025-08-06 |
|
OpenAI/GPT-OSS-120B · Hugging Face
|
7 |
-- |
2025-08-05 |
|
CodeFusion: A Pre-Trained Diffusion Model for Code Generation
|
6 |
-- |
2023-10-30 |
|
OpenChat 3.5: 7B model with comparable perf to ChatGPT
|
6 |
-- |
2023-11-02 |
|
New leaderboard drop: Judge Arena
|
6 |
-- |
2024-11-19 |
|
Phased Consistency Model
|
6 |
-- |
2024-05-29 |
|
Generate Illusions with Stable Diffusion
|
6 |
-- |
2023-09-16 |
|
Mann-E, an open source Equivalent of Midjourney reached its version 4.1.3
|
6 |
-- |
2023-03-04 |
|
A Llama 70B finetune that has reflection baked into its weights
|
6 |
-- |
2024-09-05 |
|
Show HN: Understand politics by visualising manifesto embeddings
|
6 |
-- |
2024-07-07 |
|
Mistral releases the v0.3 of its 7B LLM
|
6 |
-- |
2024-05-22 |
|
Idefics2: A Powerful 8B Vision-Language Model for the Community
|
6 |
-- |
2024-05-14 |
|
Show HN: Open-source LLM for data labeling
|
6 |
-- |
2024-05-08 |
|
Dolphin-2.9-Llama3-8B
|
6 |
-- |
2024-04-21 |
|
Introduction to 3D Gaussian Splatting
|
6 |
-- |
2024-04-02 |
|
Qwen is a large language model series by Alibaba Cloud
|
6 |
-- |
2023-09-27 |
|
Show HN: TCO Calculator to compare on-prem LLM deployment vs. OpenAI and …
|
6 |
-- |
2023-08-21 |
|
Llama-2-70B-instruct-v2
|
6 |
-- |
2023-08-03 |
|
Falcon 40B-Instruct GGML
|
6 |
-- |
2023-06-15 |
|
RWKV – An RNN with the Advantages of a Transformer
|
6 |
-- |
2023-05-15 |
|
Assisted Generation: a new direction toward low-latency text generation
|
6 |
-- |
2023-05-11 |
|
Databricks Publishes a Version of Dolly LLM to Hugging Face
|
6 |
-- |
2023-03-30 |
|
Hugging Face introduces Pull Requests and Discussions
|
6 |
-- |
2022-05-25 |
|
Kokoro-TTS
|
6 |
-- |
2025-01-13 |
|
Microsoft Phi 4 with R1 Reasoning
|
6 |
-- |
2025-02-04 |
|
DeepSeek-R1 without CCP censorship
|
6 |
-- |
2025-02-20 |
|
More Efficient Chain-of-Thought Reasoning Through Certainty Probing
|
6 |
-- |
2025-02-18 |
|
SigLIP 2: A better multilingual vision language encoder
|
6 |
-- |
2025-02-22 |
|
Qwen2.5-Omni Technical Report
|
6 |
-- |
2025-03-30 |
|
Better than DeepSeek R1? MiniMax-M1: open-weight hybrid-attention reasoning model
|
6 |
-- |
2025-06-16 |
|
Show HN: Agent Leaderboard 2.0 – Domain Specific edition
|
6 |
-- |
2025-07-17 |
|
Apple releases FastVLM and MobileCLIP2 on HF, real-time video captioning
|
6 |
-- |
2025-08-30 |
|
Show HN: We built a better reranker and open sourced it
|
6 |
-- |
2025-08-27 |
|
Nvidia STT Parakeet v3
|
6 |
-- |
2025-08-15 |
|
First 70B model released with all training epochs and data
|
6 |
-- |
2025-09-12 |
|
Qwen3-Next series represents our next-generation foundation models
|
6 |
-- |
2025-09-12 |
|
Qwen Image Edit - SOTA Open Weight Image Editing Model
|
6 |
-- |
2025-08-18 |
|
Cybersecurity Instruction Tuned Model
|
6 |
-- |
2025-08-05 |
|
TinyLlama a 1.1B Llama model trained on 3T tokens reaches 1.0 release
|
5 |
-- |
2023-12-31 |
|
Gemma-2 2B beats GPT3.5 on Chatbot Arena
|
5 |
-- |
2024-07-31 |
|
FineWeb-Edu: new 1.3T tokens web dataset
|
5 |
-- |
2024-06-02 |
|
Wall Street Journal Hedcut Stable Diffusion Model
|
5 |
-- |
2024-01-23 |
|
New Mixtral HQQ Quantized 4-bit/2-bit configuration
|
5 |
-- |
2023-12-18 |
|
Personal co-pilot with a fine-tuning and a VSCode extension
|
5 |
-- |
2023-10-31 |
|
Segment Anything Model (Sam) in the Browser with Rust and WASM
|
5 |
-- |
2023-09-16 |
|
SD-XL 1.0 Model Card
|
5 |
-- |
2023-07-26 |
|
AI Policy: Open ML Considerations in the EU AI Act
|
5 |
-- |
2023-07-26 |
|
Modified Version of Apache 2.0 License with Royalty Payments
|
5 |
-- |
2023-05-26 |
|
Creating a Coding Assistant with StarCoder
|
5 |
-- |
2023-05-10 |
|
CLIP Interrogator
|
5 |
-- |
2022-10-22 |
|
Blip: Image Captioning and Visual Question Answering AI
|
5 |
-- |
2022-02-26 |
|
Hertz-dev is an open-source model for full-duplex conversational audio
|
5 |
-- |
2024-11-16 |
|
New Dataset: RedPajama Dynamic Topic Modeling, 100K Docs W Topic Hierarchies
|
5 |
-- |
2024-11-11 |
|
Hugging Face launches HUGS: managed containers for on-premise model deployment
|
5 |
-- |
2024-10-23 |
|
Janus-1.3B: Unifying Multimodal Understanding and Generation
|
5 |
-- |
2024-10-18 |
|
Show HN: Arch-Function: 3B parameter LLM that beats GPT-4o on function calling
|
5 |
-- |
2024-10-16 |
|
Model2Vec: Make sentence transformers 500x faster on CPU, 15x smaller
|
5 |
-- |
2024-10-16 |
|
Whisper-Large-v3-Turbo
|
5 |
-- |
2024-10-03 |
|
Show HN: Automatic chaptering – From raw transcripts to structured documents
|
5 |
-- |
2024-09-09 |
|
TabReD: A Benchmark of Tabular Machine Learning In-the-Wild
|
5 |
-- |
2024-07-04 |
|
Microsoft releases weights for Florence-2 vision model
|
5 |
-- |
2024-06-19 |
|
Phi-3-medium-128k-instruct
|
5 |
-- |
2024-05-22 |
|
Ferret-v2: An Improved Baseline for Referring and Grounding with LLMs
|
5 |
-- |
2024-04-13 |
|
Gretel: Synthetic Text to SQL Dataset
|
5 |
-- |
2024-04-04 |
|
Detecting performance and ethical vulnerabilities in popular Hugging Face models
|
5 |
-- |
2024-03-21 |
|
Design2Code: How Far Are We from Automating Front-End Engineering?
|
5 |
-- |
2024-03-10 |
|
Genie: Generative Interactive Environments
|
5 |
-- |
2024-02-26 |
|
TTS Arena: Benchmarking TTS Models in the Wild
|
5 |
-- |
2024-02-25 |
|
Cosmopedia: the largest synthetic dataset of textbooks generated by Mixtral
|
5 |
-- |
2024-02-20 |
|
DeciLM-7B
|
5 |
-- |
2023-12-12 |
|
Nash Learning from Human Feedback
|
5 |
-- |
2023-12-05 |
|
Real-time image generation demo on Gradio
|
5 |
-- |
2023-11-12 |
|
Convert a transformers model to Core ML
|
5 |
-- |
2023-04-06 |
|
Wikipedia Txtai Embeddings Index
|
5 |
-- |
2023-03-21 |
|
Show HN: Get the gist of anyone's Twitter feed
|
5 |
-- |
2023-02-24 |
|
Illustrating RLHF that's critical for ChatGPT
|
5 |
-- |
2022-12-09 |
|
Stable Diffusion Webapp
|
5 |
-- |
2022-09-28 |
|
The World’s Largest Open Multilingual Language Model: Bloom
|
5 |
-- |
2022-08-15 |
|
Wikipedia assistant directly answers your questions
|
5 |
-- |
2022-02-15 |
|
Moonshine – open-source, real-time speech-to-text in the browser
|
5 |
-- |
2024-12-19 |
|
Open R1: Update #2
|
5 |
-- |
2025-02-11 |
|
Deepseek VL2 Small
|
5 |
-- |
2025-02-08 |
|
Gemma 3 QAT (Quantization-Aware Training) 3x less memory
|
5 |
-- |
2025-04-03 |
|
DocumentAI with 256M Parameters
|
5 |
-- |
2025-03-20 |
|
An open source common knowledge and context based Hallucination Detection Model
|
5 |
-- |
2025-04-29 |
|
Mixture of Tunable Experts-DeepSeek R1 Behavior Modification at Inference Time
|
5 |
-- |
2025-05-01 |
|
CircleGuardBench Leaderboard
|
5 |
-- |
2025-05-07 |
|
Show HN: Raman-01 – A Pocket Physics Solver LLM
|
5 |
-- |
2025-05-05 |
|
An MCP-powered agent in 50 lines of code
|
5 |
-- |
2025-05-15 |
|
SWE-rebench: Over 21,000 Open Tasks for SWE LLMs
|
5 |
-- |
2025-05-29 |
|
The Common Pile v0.1
|
5 |
-- |
2025-06-06 |
|
You could have designed state of the art positional encoding
|
5 |
-- |
2025-05-20 |
|
LLM Embeddings Explained: A Visual and Intuitive Guide
|
5 |
-- |
2025-05-14 |
|
Show HN: KaniTTS – Open-source high-fidelity TTS with just 450M params
|
5 |
-- |
2025-09-19 |
|
GLM 4.5
|
5 |
-- |
2025-07-28 |
|
Gaia2 and ARE: Empowering the Community to Evaluate Agents
|
5 |
-- |
2025-09-22 |
|
VibeVoice: A Frontier Open-Source Text-to-Speech Model
|
5 |
-- |
2025-08-26 |
|
Qwen2.5-Coder-3B Fine-Tuned for Triton Kernel Gen
|
5 |
-- |
2025-08-03 |
|
Google's Bard surpassing GPT-4, SECOND SPOT on the leaderboard
|
4 |
-- |
2024-01-26 |
|
Octopus V4: a graph of language models
|
4 |
-- |
2024-05-02 |
|
Llama-3 8B Instruct 262k
|
4 |
-- |
2024-04-26 |
|
CodeGemma – an official Google release for code LLMs
|
4 |
-- |
2024-04-09 |
|
Solar 10.7B: Elevating AI, Effortlessly
|
4 |
-- |
2023-12-27 |
|
WhiteRabbitNeo model series can be used for offensive/defensive cybersecurity
|
4 |
-- |
2023-12-20 |
|
Eric Hartford releases uncensored dolphin-2.5-mixtral-8x7B
|
4 |
-- |
2023-12-14 |
|
XTTS: New Generative model for Voice (weights released on HF)
|
4 |
-- |
2023-09-15 |
|
Prompt Injection Detection Model
|
4 |
-- |
2023-06-14 |
|
GPT-2 Output Detector
|
4 |
-- |
2022-12-05 |
|
Apple Open-Sources LLM DCLM-7B
|
4 |
-- |
2024-07-19 |
|
Open LLM Leaderboard v2
|
4 |
-- |
2024-06-29 |
|
Florence 2, Microsoft OCR Model
|
4 |
-- |
2024-06-20 |
|
Apple OpenELM Instruct Models
|
4 |
-- |
2024-04-24 |
|
Phi-3 Released
|
4 |
-- |
2024-04-23 |
|
GemMoE: An 8x8 Mixture Of Experts based on Gemma
|
4 |
-- |
2024-03-13 |
|
Pearl-3x7B, an xtraordinary Mixture of Experts (MoE) for data science
|
4 |
-- |
2024-02-07 |
|
Introduction to State Space Models (SSM)
|
4 |
-- |
2024-01-24 |
|
Distributed Inference and Fine-Tuning of Large Language Models over the Internet
|
4 |
-- |
2023-12-17 |
|
Distil-Whisper: Distil-Small.en
|
4 |
-- |
2023-12-14 |
|
2-bit and 4-bit versions of Mixtral
|
4 |
-- |
2023-12-11 |
|
Nous-Capybara-34B-200k
|
4 |
-- |
2023-11-14 |
|
An open-source and privacy-by-design Conversational AI in-browser
|
4 |
-- |
2023-09-22 |
|
Large Language Models for Compiler Optimization
|
4 |
-- |
2023-09-14 |
|
Gaussian viewer streaming splats in web browser
|
4 |
-- |
2023-09-12 |
|
Puma: Secure Inference of LLaMA-7B in Five Minutes
|
4 |
-- |
2023-07-25 |
|
FreeWilly2: New LLM from Stability AI
|
4 |
-- |
2023-07-24 |
|
40B LLM wants to charge 10% royalty on revenue?
|
4 |
-- |
2023-05-26 |
|
Falcon-40B
|
4 |
-- |
2023-05-26 |
|
Fully Open Source LLM Chat App – Chat about the Transformers Docs
|
4 |
-- |
2023-03-14 |
|
Karlo, the first open source DALL-E 2 replication is here
|
4 |
-- |
2022-12-21 |
|
Show HN: Thought Leadership as a Service
|
4 |
-- |
2022-06-09 |
|
HtmlRAG: HTML Is Better Than Plain Text for RAG Systems
|
4 |
-- |
2024-11-06 |
|
Structured generation with Outlines, now in Rust
|
4 |
-- |
2024-10-22 |
|
Llama 3.2 in the Browser with WebGPU
|
4 |
-- |
2024-09-30 |
|
Multimodal TextImage Augmentation for Document Images
|
4 |
-- |
2024-09-14 |
|
'Reflection 70B' AI model could be the answer to pesky LLM hallucinations
|
4 |
-- |
2024-09-06 |
|
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
|
4 |
-- |
2024-08-14 |
|
FHE can be leveraged for LLMs such as ChatGPT in a privacy-preserving …
|
4 |
-- |
2024-08-13 |
|
Introduction to Ggml
|
4 |
-- |
2024-08-13 |
|
Google releases Gemma 2 2B, ShieldGemma and Gemma Scope
|
4 |
-- |
2024-08-01 |
|
Gemma 2 2B Release
|
4 |
-- |
2024-08-01 |
|
Extracting Concepts from LLMs: Anthropic's recent discoveries
|
4 |
-- |
2024-06-08 |
|
EasyAnimate: End-to-end solution for high-resolution and long video generation
|
4 |
-- |
2024-06-04 |
|
Grokked Transformers Are Implicit Reasoners
|
4 |
-- |
2024-05-27 |
|
Paligemma: A versatile and lightweight vision-language model (VLM)
|
4 |
-- |
2024-05-14 |
|
4M Context – Llama-3-8B-Instruct
|
4 |
-- |
2024-05-09 |
|
ReFT: Representation Finetuning for Language Models
|
4 |
-- |
2024-04-05 |
|
Embedding Quantization: 25-45x retrieval speedup, 32x or 4x less memory usage
|
4 |
-- |
2024-03-22 |
|
Show HN: Chatbot Guardrails Arena
|
4 |
-- |
2024-03-21 |
|
Quanto: A PyTorch Quantization Toolkit
|
4 |
-- |
2024-03-18 |
|
On-device background removal with Transformers.js
|
4 |
-- |
2024-02-07 |
|
SegMoE: Segmind Mixture of Diffusion Experts
|
4 |
-- |
2024-02-05 |
|
NPHardEval leaderboard a benchmark for assessing the reasoning abilities of LLMs
|
4 |
-- |
2024-02-03 |
|
HuggingChat Assistants: Open source models with custom instructions
|
4 |
-- |
2024-02-02 |
|
TinyLlama Reaches 3T Checkpoint
|
4 |
-- |
2023-12-28 |
|
Obsidian-3B
|
4 |
-- |
2023-11-25 |
|
Yarn-Llama-2-70B-32k
|
4 |
-- |
2023-11-20 |
|
SDXL in 4 steps with Latent Consistency LoRAs
|
4 |
-- |
2023-11-09 |
|
Zephyr 7B
|
4 |
-- |
2023-10-27 |
|
Apple/coreml-stable-diffusion-XL-base-iOS
|
4 |
-- |
2023-09-30 |
|
DeepSpeed-Chat: Easy RLHF Training of ChatGPT-Like Models at All Scales
|
4 |
-- |
2023-08-04 |
|
Deploy LLMs with Hugging Face Inference Endpoints
|
4 |
-- |
2023-07-04 |
|
Instruct-Codegen: open-source instruction following codegen model
|
4 |
-- |
2023-05-27 |
|
MPT-7B-StoryWriter-65k+: LLM for super long contexts (Apache 2.0)
|
4 |
-- |
2023-05-05 |
|
BioGPT for Biomedical Scientific Discovery
|
4 |
-- |
2023-02-07 |
|
Using LoRA for Efficient Stable Diffusion Fine-Tuning
|
4 |
-- |
2023-01-26 |
|
From GPT2 to Stable Diffusion: Hugging Face Arrives to the Elixir Community
|
4 |
-- |
2022-12-09 |
|
Stable Diffusion pre-loaded with 250 community textual inversion concepts
|
4 |
-- |
2022-09-14 |
|
Overview of how Stable Diffusion works
|
4 |
-- |
2022-08-27 |
|
Editing Videos by Editing Text
|
4 |
-- |
2022-05-23 |
|
Latent Diffusion, open source alternative to DALL·E 2
|
4 |
-- |
2022-04-13 |
|
From Files to Chunks: Improving HF Storage Efficiency
|
4 |
-- |
2024-11-20 |
|
Show HN: Video Composition Tool Powered by Qwen2.5-Coder and FFmpeg
|
4 |
-- |
2024-11-24 |
|
Show HN: LatComp – Compress your image into a small and reversible …
|
4 |
-- |
2024-11-30 |
|
DeepSeek-V3-Base
|
4 |
-- |
2024-12-25 |
|
Qwen 2.5 Max
|
4 |
-- |
2025-01-28 |
|
Hugging Face open sources a web-browsing agent that uses VLMs
|
4 |
-- |
2025-01-24 |
|
Deepseek R1 Zero
|
4 |
-- |
2025-01-20 |
|
LLaSE-G1 A FOSS speech enhancement model
|
4 |
-- |
2025-03-08 |
|
Qwen/QwQ-32B released on Hugging Face
|
4 |
-- |
2025-03-06 |
|
Wan2.1-T2V-14B
|
4 |
-- |
2025-02-25 |
|
The Curse of Depth in Large Language Models
|
4 |
-- |
2025-02-13 |
|
Migrating Hugging Face off Git LFS and to a new storage system …
|
4 |
-- |
2025-03-18 |
|
MoCha: Towards Movie-Grade Talking Character Synthesis
|
4 |
-- |
2025-04-01 |
|
Qwen2.5-Omni-7B
|
4 |
-- |
2025-03-26 |
|
Open R1's OlympicCoder beats Deepseek R1, models and underlying dataset released
|
4 |
-- |
2025-03-25 |
|
Devin's First Open Source Model Beats O3
|
4 |
-- |
2025-05-06 |
|
Ltxv-13B – high-quality videos in real-time
|
4 |
-- |
2025-05-07 |
|
Show HN: HalluMix – A Benchmark for Real-World LLM Hallucination Detection
|
4 |
-- |
2025-05-06 |
|
Higgs – Rapidly Compress LLMs Without Significant Loss of Quality
|
4 |
-- |
2025-04-12 |
|
New virtual try on model family that seems to be SOTA
|
4 |
-- |
2025-06-28 |
|
Gemma 3n available in the open-source ecosystem
|
4 |
-- |
2025-06-26 |
|
Automated Discovery of High-Performance GPU Kernels with OpenEvolve
|
4 |
-- |
2025-06-28 |
|
Jan-Nano-128k: Empowering deeper research through extended context understanding
|
4 |
-- |
2025-06-25 |
|
Kimi-Dev-72B
|
4 |
-- |
2025-07-13 |
|
Kimi K2: 1T total parameter open-source LLM by Moonshot AI
|
4 |
-- |
2025-07-11 |
|
Mistral AI releases Devstral-Small-2507
|
4 |
-- |
2025-07-10 |
|
A 337M RSS feed dataset
|
4 |
-- |
2025-08-26 |
|
Trackio: A new experiment tracking library from Hugging Face
|
4 |
-- |
2025-07-29 |
|
Show HN: Single-agent long-horizon reasoning within one LLM run
|
4 |
-- |
2025-07-23 |
|
Tricks from OpenAI GPT-OSS you can use with transformers
|
4 |
-- |
2025-09-11 |
|
Kimi-K2-Instruct-0905
|
4 |
-- |
2025-09-05 |
|
OmniNeural – First NPU-Aware Multimodal Model
|
4 |
-- |
2025-08-24 |
|
Gemma 3-270M
|
4 |
-- |
2025-08-14 |
|
Pruned expert GPT-OSS 6.6B
|
4 |
-- |
2025-08-13 |
|
UIGEN-X-32B-0727 Reasoning Only UI Generation Model
|
4 |
-- |
2025-07-28 |
|
MiniLM-L6-v2 maps paragraphs to 384 dimension vector for clustering or search
|
3 |
-- |
2023-03-21 |
|
Show HN: Turn Any Article into a Conversation-Like Podcast
|
3 |
-- |
2024-05-22 |
|
Phi-1.5 (1.3B Outperforms Llama 2 7B)
|
3 |
-- |
2023-09-12 |
|
GPT-2B-001
|
3 |
-- |
2023-04-20 |
|
Open NotebookLM – Generate Podcasts from PDFs Using Open-Source AI
|
3 |
-- |
2024-10-15 |
|
AI has a problem with objectifying women
|
3 |
-- |
2024-05-28 |
|
Linus Torvalds Chat Bot
|
3 |
-- |
2024-02-02 |
|
ChatQA: Building GPT-4 Level Conversational QA Models
|
3 |
-- |
2024-01-19 |
|
10.7B Solar: Elevating Performance with Upstage Depth Up Scaling
|
3 |
-- |
2023-12-18 |
|
Voice Chat with Mistral 7B
|
3 |
-- |
2023-10-16 |
|
Hugging Face partner with AMD to accelerate state-of-the-art models
|
3 |
-- |
2023-06-14 |
|
Frames: Factuality, Retrieval, and Reasoning MEasurement Set
|
3 |
-- |
2024-10-01 |
|
Show HN: We just dropped a 8B alternative of OpenAI GPT-o1 and …
|
3 |
-- |
2024-09-20 |
|
Chronos-T5 (Tiny) – pretrained time series forecasting models
|
3 |
-- |
2024-08-14 |
|
HF for Legal, an open-source community on Hugging Face
|
3 |
-- |
2024-07-01 |
|
LegalKit, French labeled datasets built for legal ML training
|
3 |
-- |
2024-06-27 |
|
Nvidia releases ChatQA-1.5 in violation of Llama 3 license
|
3 |
-- |
2024-05-02 |
|
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
|
3 |
-- |
2024-04-26 |
|
Everyone seems to have forgotten about Gemma
|
3 |
-- |
2024-04-25 |
|
Introducing the Open Chain of Thought Leaderboard
|
3 |
-- |
2024-04-23 |
|
Google Gemma 1.1 2B and 7B instruct
|
3 |
-- |
2024-04-06 |
|
Starcoder-2
|
3 |
-- |
2024-02-28 |
|
DevPearl-2x7B, an xtraordinary Mixture of Experts (MoE) for development
|
3 |
-- |
2024-02-09 |
|
Nous-Hermes-2-SOLAR-10.7B
|
3 |
-- |
2024-01-02 |
|
Solar 10.7B
|
3 |
-- |
2023-12-27 |
|
Transformer.js: Machine Learning for the Web
|
3 |
-- |
2023-12-09 |
|
PixArt-α: Fast Training of Diffusion Transformer for Text-to-Image Synthesis
|
3 |
-- |
2023-12-04 |
|
Laiyer AI Released Its Open Source Prompt Injection Model
|
3 |
-- |
2023-11-29 |
|
LZMD: Lempel-Ziv Montecarlo Diffusion file format
|
3 |
-- |
2023-11-29 |
|
Faster MusicGen Generation with Streaming
|
3 |
-- |
2023-10-06 |
|
Llama 2 on Amazon SageMaker a Benchmark
|
3 |
-- |
2023-09-26 |
|
LoRA Roulette
|
3 |
-- |
2023-09-22 |
|
Open-source AI Discord bots with HuggingFace
|
3 |
-- |
2023-08-17 |
|
StableBeluga-7B
|
3 |
-- |
2023-07-29 |
|
MPT-30B – Apache 2.0 licensed LLM
|
3 |
-- |
2023-07-22 |
|
Show HN: I created a first-of-its-kind open corpus of Australian law
|
3 |
-- |
2023-06-26 |
|
Show HN: DocsGPT-7B – purpose optimised and finetuned model for documentation QA
|
3 |
-- |
2023-06-16 |
|
Alpaca Dataset Translated into Polish
|
3 |
-- |
2023-04-12 |
|
Bert 101 State of the Art NLP Model Explained
|
3 |
-- |
2022-03-02 |
|
SemScore: Evaluating LLMs with Semantic Similarity
|
3 |
-- |
2024-11-06 |
|
Meta released MobileLLM – 125M, 350M, 600M, 1B model checkpoints
|
3 |
-- |
2024-10-31 |
|
Hugging Face Now Automatically Detects Leaked Secrets
|
3 |
-- |
2024-09-05 |
|
Selective fine-tuning of Language Models with Spectrum
|
3 |
-- |
2024-09-03 |
|
Idefics3: Open multimodal model based on Llama-3.1-8B
|
3 |
-- |
2024-08-09 |
|
New Google Gemma 2 2B model
|
3 |
-- |
2024-07-31 |
|
Fine-Tune Llama 3.1 Ultra-Efficiently with Unsloth
|
3 |
-- |
2024-07-29 |
|
DiLoCo: Distributed Low-Communication Training of Language Models
|
3 |
-- |
2024-07-26 |
|
The largest math dataset of Olympiad problems for training LLMs
|
3 |
-- |
2024-07-21 |
|
SmolLM – Fast and Remarkably Powerful
|
3 |
-- |
2024-07-16 |
|
Whisper WebGPU: Real-time in-browser speech recognition
|
3 |
-- |
2024-06-08 |
|
UGI Leaderboard – Uncensored General Intelligence
|
3 |
-- |
2024-06-07 |
|
Transformers Are SSMs: Generalized Models and Efficient Algorithms Through
|
3 |
-- |
2024-06-04 |
|
Recovering 4D World from Monocular Video
|
3 |
-- |
2024-05-29 |
|
LiteVAE: Lightweight and Efficient Variational Autoencoders for Diffusion Models
|
3 |
-- |
2024-05-26 |
|
Advancing Theorem Proving in LLMs Through Large-Scale Synthetic Data
|
3 |
-- |
2024-05-26 |
|
Phi-3 in-browser inference using WebGPU
|
3 |
-- |
2024-05-08 |
|
Show HN: GPT Fine-Tune Formatter
|
3 |
-- |
2024-05-07 |
|
InstantMesh: Efficient 3D Mesh Generation from a Single Image
|
3 |
-- |
2024-04-15 |
|
Mixture of Finetuned and GPT4 Model
|
3 |
-- |
2024-04-07 |
|
H2O-Danube2-1.8B-Chat
|
3 |
-- |
2024-04-07 |
|
Yi-9B
|
3 |
-- |
2024-04-05 |
|
Dolphin-2.8-mistral-7B-v02
|
3 |
-- |
2024-04-03 |
|
Common Corpus – Start of the largest public domain dataset for training …
|
3 |
-- |
2024-03-20 |
|
MoAI: Mixture of All Intelligence for Large Language and Vision Models
|
3 |
-- |
2024-03-14 |
|
OpenChat-3.5-0106-Gemma
|
3 |
-- |
2024-03-10 |
|
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
|
3 |
-- |
2024-02-23 |
|
Microsoft's LongRoPE: Extending LLM Context Window Beyond 2M Tokens
|
3 |
-- |
2024-02-22 |
|
Stable Diffusion XL Lightning
|
3 |
-- |
2024-02-21 |
|
Enterprise Scenarios leaderboard evals the perf. of LLMs on enterprise use cases
|
3 |
-- |
2024-02-03 |
|
Show HN: A lineage explorer for open source models and datasets
|
3 |
-- |
2024-01-23 |
|
Aim – An Apple Collection
|
3 |
-- |
2024-01-19 |
|
LLaVA-3B
|
3 |
-- |
2024-01-01 |
|
Dolphin-2.6-Mistral-7B
|
3 |
-- |
2023-12-29 |
|
MonadGPT
|
3 |
-- |
2023-12-28 |
|
MiniMA-2-3B
|
3 |
-- |
2023-12-27 |
|
WaveCoder: Widespread Versatile Enhanced Instruction Tuning with Refine Data Gen
|
3 |
-- |
2023-12-26 |
|
StarVector: Generating Scalable Vector Graphics Code from Images
|
3 |
-- |
2023-12-20 |
|
AITube - YouTube but everything is AI generated
|
3 |
-- |
2023-12-15 |
|
Refact-1.6B
|
3 |
-- |
2023-12-08 |
|
Llama-2-7B-chat-mlx for Apple’s new MLX framework
|
3 |
-- |
2023-12-06 |
|
NeuralHermes-2.5-Mistral-7B
|
3 |
-- |
2023-11-29 |
|
Tulu-2-Dpo-70B
|
3 |
-- |
2023-11-21 |
|
Show HN: New Launch OrionStar-Yi-34B-Chat beats Llama2-70B and GPT-3.5-turbo
|
3 |
-- |
2023-11-20 |
|
Nvidia nemotron-3-8B-base-4k
|
3 |
-- |
2023-11-16 |
|
Optimizing LLMs in Production
|
3 |
-- |
2023-11-15 |
|
HuggingFace Daily Papers
|
3 |
-- |
2023-11-14 |
|
Make your llama generation time fly with AWS Inferentia2
|
3 |
-- |
2023-11-11 |
|
Show HN: Face-Stylization – Create face styling with just 8 images
|
3 |
-- |
2023-11-09 |
|
Document Question Answering
|
3 |
-- |
2023-10-30 |
|
Apple's LLMs and other GenAI models on HuggingFace
|
3 |
-- |
2023-10-19 |
|
Using HuggingFace to Train a GPT-2 Model for Music Generation
|
3 |
-- |
2023-10-09 |
|
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
|
3 |
-- |
2023-09-19 |
|
Generative Image Dynamics
|
3 |
-- |
2023-09-15 |
|
OpenHermes-13B based on Llama-2
|
3 |
-- |
2023-09-07 |
|
Llama2.c LLM: ported to Rust and running in the browser
|
3 |
-- |
2023-09-07 |
|
Accelerating Vision-Language Models: BridgeTower on Habana Gaudi2
|
3 |
-- |
2023-09-01 |
|
Fine-tuned CodeLlama beats GPT-4 on HumanEval
|
3 |
-- |
2023-08-27 |
|
LoRA the Explorer
|
3 |
-- |
2023-08-17 |
|
Fine-tune Llama 2 with DPO
|
3 |
-- |
2023-08-08 |
|
Show HN: Goat-7B LLM, a new SOTA among the open-source 7B models
|
3 |
-- |
2023-07-25 |
|
How is ChatGPT's behavior changing over time?
|
3 |
-- |
2023-07-19 |
|
Show HN: New control net model for AI art QRcode
|
3 |
-- |
2023-06-27 |
|
Show HN: Bert-Based Classification Model for Google Local Listings
|
3 |
-- |
2023-06-26 |
|
Mosaic ML: MPT-30B-Chat
|
3 |
-- |
2023-06-25 |
|
Video Composer: Create videos using GPT-4 and FFmpeg
|
3 |
-- |
2023-06-15 |
|
MusicGen from Meta on Hugging Face
|
3 |
-- |
2023-06-09 |
|
OpenLLaMA 7B Released
|
3 |
-- |
2023-06-07 |
|
WizardLM-30B
|
3 |
-- |
2023-06-06 |
|
Can AI Code?
|
3 |
-- |
2023-06-05 |
|
Constrained Text Generation with Transformers
|
3 |
-- |
2023-05-22 |
|
StarCoder: A State-of-the-Art LLM for Code
|
3 |
-- |
2023-05-05 |
|
Swift Diffusers: Fast Stable Diffusion for Mac
|
3 |
-- |
2023-04-02 |
|
Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
|
3 |
-- |
2023-03-12 |
|
Parameter-Efficient Fine-Tuning Billion-Scale Models on Low-Resource Hardware
|
3 |
-- |
2023-02-10 |
|
Finetuned Stable Diffusion: open, free, beautiful results near to Midjourney
|
3 |
-- |
2022-12-28 |
|
Hugging Face Machine Learning Demos Are Now on ArXiv
|
3 |
-- |
2022-11-17 |
|
Pony Diffusion
|
3 |
-- |
2022-10-01 |
|
Show HN: Audio Intelligence Dashboard
|
3 |
-- |
2022-09-26 |
|
Fast Bloom Inference with DeepSpeed and Accelerate
|
3 |
-- |
2022-09-15 |
|
YOLOv6: Real-Time Object Detection Demo
|
3 |
-- |
2022-07-15 |
|
An Introduction to Deep Reinforcement Learning
|
3 |
-- |
2022-05-13 |
|
Transform natural language queries to vector search SQL
|
3 |
-- |
2022-04-19 |
|
Single Image to 3D in the Browser
|
3 |
-- |
2022-04-15 |
|
JPEG Artifacts Removal
|
3 |
-- |
2022-04-12 |
|
Multimodal Augmentation of Generative Models Through Adapter-Based Finetuning
|
3 |
-- |
2022-03-20 |
|
AI Line Drawing Generation
|
3 |
-- |
2022-03-11 |
|
OCR Model Beats Captcha
|
3 |
-- |
2022-02-23 |
|
Fairseq S2: Scalable Speech Synthesis
|
3 |
-- |
2022-01-21 |
|
Dataset Card for 1M Bluesky Posts
|
3 |
-- |
2024-11-27 |
|
New 2B vision language model that consumes the least memory
|
3 |
-- |
2024-11-26 |
|
New synthetic dataset beating MSFT and Mistral's SFT recipe
|
3 |
-- |
2024-11-22 |
|
Show HN: MilkDropLM – generate presets for the MilkDrop music visualizer
|
3 |
-- |
2024-12-06 |
|
Quantum+AI Qiskit Code Assistant Open Source model
|
3 |
-- |
2024-11-27 |
|
informatiker/20-million-bluesky-posts
|
3 |
-- |
2024-11-29 |
|
Automated GitHub Issue Creation Using Structured Generation
|
3 |
-- |
2024-11-29 |
|
QwQ-32B-Preview
|
3 |
-- |
2024-11-27 |
|
Welcome to the Falcon 3 Family of Open Models
|
3 |
-- |
2024-12-17 |
|
Meta releases family of multimodal models that comprehend hour-long video
|
3 |
-- |
2024-12-16 |
|
Finding Moroccan Arabic (Darija) in the Fineweb 2 Dataset
|
3 |
-- |
2024-12-09 |
|
Timeline of AI model releases in 2024
|
3 |
-- |
2025-01-01 |
|
Fine-Tune Deepseek-R1 with a Synthetic Reasoning Dataset
|
3 |
-- |
2025-02-11 |
|
Hugging Face AI Agents Course
|
3 |
-- |
2025-02-10 |
|
HuggingFace open reproduction of R1 data and training pipeline
|
3 |
-- |
2025-01-27 |
|
DeepSeek-R1 on iPhone? (DeepSeek-R1-Distill-Qwen-1.5B-GGUF)
|
3 |
-- |
2025-01-21 |
|
GEN3C: 3D-Informed World-Consistent Video
|
3 |
-- |
2025-03-06 |
|
Microsoft Releases Phi-4-multimodal [pdf]
|
3 |
-- |
2025-02-26 |
|
WanX open weight sota 14B video model release
|
3 |
-- |
2025-02-25 |
|
Step-Audio-Chat: a 132B end-to-end speech-to-speech model
|
3 |
-- |
2025-02-17 |
|
Show HN: First large scale evaluation of 4o Image Generation from OpenAI
|
3 |
-- |
2025-03-27 |
|
EuroBERT: A High-Performance Multilingual Encoder Model
|
3 |
-- |
2025-03-10 |
|
Training LLMs with GRPO and Interpreter Feedback Using WebAssembly
|
3 |
-- |
2025-04-06 |
|
AgentRxiv: Towards Collaborative Autonomous Research
|
3 |
-- |
2025-03-25 |
|
DeepSeek V3-0324 Posted to HuggingFace
|
3 |
-- |
2025-03-24 |
|
Nvidia Isaac GR00T N1 is the first open foundation model for humanoid
|
3 |
-- |
2025-03-21 |
|
VACE: All-in-One Video Creation and Editing from Alibaba
|
3 |
-- |
2025-03-12 |
|
Drape1: Open-Source Scalable adapter for clothing generation
|
3 |
-- |
2025-05-01 |
|
GLM-4-32B-0414: New MIT-licensed SOTA LLM from Zhipu AI
|
3 |
-- |
2025-04-15 |
|
Xiaomi MiMo
|
3 |
-- |
2025-04-30 |
|
Qwen3 235B (MoE with 128 experts)
|
3 |
-- |
2025-04-28 |
|
Dia 1.6B – Nari Text-to-Speech Synthesis
|
3 |
-- |
2025-04-24 |
|
Microsoft/MAI-DS-R1, DeepSeek R1 Post-Trained by Microsoft
|
3 |
-- |
2025-04-18 |
|
Yambda-5B – Industrial-scale music recommendation dataset
|
3 |
-- |
2025-06-04 |
|
Show HN: we released an open source, best-in-class medical reasoning model
|
3 |
-- |
2025-05-13 |
|
Understanding MCP Evals: Why Evals Matter for MCP
|
3 |
-- |
2025-06-06 |
|
Show HN: Ego-Dex Gradio App
|
3 |
-- |
2025-06-03 |
|
Hugging Face Courses
|
3 |
-- |
2025-05-27 |
|
Show HN: Tinker with Meta's "tokenizer-free" patcher
|
3 |
-- |
2025-05-21 |
|
Radiology explainer demo
|
3 |
-- |
2025-05-20 |
|
Memelang – a hybrid relational-graph query language
|
3 |
-- |
2025-05-17 |
|
Hugging Face Collaborates with Proxima Fusion on ML for Stellarator Optimization
|
3 |
-- |
2025-07-02 |
|
Largest in-person AV conversational dataset ever released
|
3 |
-- |
2025-06-27 |
|
Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models
|
3 |
-- |
2025-07-10 |
|
Show HN: 1.5B LLM routing model that aligns to preferences, not leaderboards
|
3 |
-- |
2025-07-17 |
|
Mistral Releases Voxtral: Open Source Speech Understanding Models (3B and 24B)
|
3 |
-- |
2025-07-15 |
|
CommaCarSegments: 3148 hours of raw CAN bus data from 230 different car …
|
3 |
-- |
2025-07-10 |
|
AnyCoder creates a demo for Qwen Image Edit Plus in 10mins
|
3 |
-- |
2025-09-22 |
|
I made WEBGEN-OSS-20B, a model that generates clean websites from your prompts
|
3 |
-- |
2025-09-13 |
|
Reasoning Traces from QA Pairs
|
3 |
-- |
2025-09-09 |
|
Welcome EmbeddingGemma, Google's new efficient embedding model
|
3 |
-- |
2025-09-04 |
|
Output Schema for CodeAct AI Agents: From Trial-and-Error to Predictive Planning
|
3 |
-- |
2025-08-31 |
|
WildChat-4.8M: 4.8M Real User–ChatGPT Conversations (Open Dataset)
|
3 |
-- |
2025-08-11 |
|
Break the quadratic wall of Transformer attention: WERSA, paper+code open source
|
3 |
-- |
2025-08-02 |
|
Qwen-Image-Edit-2509
|
3 |
-- |
2025-09-22 |
|
AI Spreadsheet Benchmark [pdf]
|
3 |
-- |
2025-09-22 |
|
FinePDFs Dataset
|
3 |
-- |
2025-09-15 |
|
TildeOpen-30B: European LLM Focused on Underrepresented Languages
|
3 |
-- |
2025-09-04 |
|
First vision language model built off Open AI GPT-OSS
|
3 |
-- |
2025-08-26 |
|
Seed-OSS: open-source LLM models by ByteDance
|
3 |
-- |
2025-08-22 |
|
From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA …
|
3 |
-- |
2025-08-20 |
|
Jan-v1: Advanced Agentic Language Model
|
3 |
-- |
2025-08-12 |
|
NextCoder by Microsoft — LLM performing on par with GPT-4o on complex …
|
3 |
-- |
2025-08-08 |
|
OpenReasoning-Nemotron by Nvidia: state-of-the-art distilled reasoning models
|
3 |
-- |
2025-08-08 |
|
Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training
|
3 |
-- |
2025-08-08 |
|
Llama 3 8B Instruct quantized with GPTQ to fit in 10gb vRAM
|
2 |
-- |
2024-04-19 |
|
Try Qwen2.5-Coder-32B on HuggingChat
|
2 |
-- |
2024-11-12 |
|
An orthogonalized AI to introduce an unengaged melancholic style
|
2 |
-- |
2024-06-13 |
|
Pearl-7B-slerp, an xtraordinary 7B model for maths
|
2 |
-- |
2024-02-05 |
|
Duckdb-nsql: 7B parameter text-to-SQL model by MotherDuck and Numbers Station
|
2 |
-- |
2024-01-28 |
|
7B model from Snorkel tops Alpaca Eval 2.0 leaderboard
|
2 |
-- |
2024-01-24 |
|
Run Deepseek Coder LLM locally
|
2 |
-- |
2023-12-03 |
|
Releasing Swift Transformers: Run On-Device LLMs in Apple Devices
|
2 |
-- |
2023-08-08 |
|
Stable Diffusion Bias Explorer
|
2 |
-- |
2022-11-09 |
|
LongVU – New Video LLM from Meta
|
2 |
-- |
2024-10-24 |
|
Hacker News Comments Dataset
|
2 |
-- |
2024-10-11 |
|
HuggingFace Accelerate 1.0.0
|
2 |
-- |
2024-10-07 |
|
Mistral-Small-Instruct-2409
|
2 |
-- |
2024-09-17 |
|
HuggingChat: Chat with Llama 3.1 (70B and 405B)
|
2 |
-- |
2024-07-23 |
|
Ocean Biodiversity Information System on Hugging Face
|
2 |
-- |
2024-07-21 |
|
CommonCanvas image generation from CC-licensed images – models, dataset released
|
2 |
-- |
2024-06-07 |
|
Show HN: PodGen generate podcasts on any topic
|
2 |
-- |
2024-06-01 |
|
Meteor: Mamba-Based Traversal of Rationale for Large Language and Vision Models
|
2 |
-- |
2024-05-28 |
|
The Waifu Research Department
|
2 |
-- |
2024-05-16 |
|
Yi-1.5 LLM Models Released
|
2 |
-- |
2024-05-12 |
|
Fietje: An open and efficient LLM for Dutch
|
2 |
-- |
2024-05-02 |
|
Simple Multimodal LLM from Scratch
|
2 |
-- |
2024-04-23 |
|
Stability Releases Code Instruct 3B
|
2 |
-- |
2024-04-02 |
|
Mistral 7B v0.2
|
2 |
-- |
2024-04-01 |
|
PolarsBot, a New HuggingChat Assistant
|
2 |
-- |
2024-03-25 |
|
Easy and low cost model training on HF "DGX cloud"
|
2 |
-- |
2024-03-19 |
|
Pearl-7B-0211 LLM now exceeds 75 in the average score of the HF's …
|
2 |
-- |
2024-02-19 |
|
LLMs can learn useful guidelines from their own mistakes
|
2 |
-- |
2024-02-12 |
|
Pearl-7B-0210-dare now sits next to the best 7Bs on HF Leaderboard
|
2 |
-- |
2024-02-11 |
|
Aanaphi-2 3B
|
2 |
-- |
2024-02-09 |
|
Playground for Hugging Face Models
|
2 |
-- |
2024-02-05 |
|
Hallucinations Leaderboard
|
2 |
-- |
2024-01-29 |
|
Fine-tune Wav2Vec2-BERT for low resource speech recognition
|
2 |
-- |
2024-01-23 |
|
InstantID Demo: Zero-Shot Identity-Preserving Generation in Seconds
|
2 |
-- |
2024-01-22 |
|
Yayi2-30B-Llama
|
2 |
-- |
2024-01-01 |
|
Mixtral_7Bx2_MoE
|
2 |
-- |
2023-12-24 |
|
Universal AnglE Sentence Embedding: New SOTA on MTEB Leaderboard
|
2 |
-- |
2023-12-05 |
|
Non-engineers guide: Train a LLaMA 2 chatbot
|
2 |
-- |
2023-12-02 |
|
AutoTrain: (not just) LLM finetuning without code and infra
|
2 |
-- |
2023-11-23 |
|
How do you think LLM inference on CPUs?
|
2 |
-- |
2023-11-03 |
|
State-of-the-Art Ember embedding model for retrieval augmented generation
|
2 |
-- |
2023-10-20 |
|
Large Language Models as Analogical Reasoners
|
2 |
-- |
2023-10-05 |
|
QR Code Monster
|
2 |
-- |
2023-10-02 |
|
CausalLM is not optimal for in-context learning
|
2 |
-- |
2023-08-15 |
|
Count tokens used by GPT-4 and Llama for large texts (> 50k …
|
2 |
-- |
2023-08-05 |
|
Apply ControlNet to a Video
|
2 |
-- |
2023-08-01 |
|
Making real-time ML-powered web games with Transformers.js
|
2 |
-- |
2023-07-05 |
|
LLaMA: Large Language Model Meta AI
|
2 |
-- |
2023-03-17 |
|
Small Stable Diffusion
|
2 |
-- |
2023-01-19 |
|
Dreambooth training UI for training a model for less than US$0.80
|
2 |
-- |
2022-12-01 |
|
Stable Diffusion: Generating One Image a Second
|
2 |
-- |
2022-10-15 |
|
VToonify Web Demo for Portrait Video Style Transfer
|
2 |
-- |
2022-10-04 |
|
Pixtral-Large-Instruct-2411
|
2 |
-- |
2024-11-18 |
|
FLUX.1-Dev LoRA Outfit Generator by TryOn Labs
|
2 |
-- |
2024-11-06 |
|
Contextual Document Embeddings
|
2 |
-- |
2024-11-01 |
|
Code a Simple RAG from Scratch – Hugging Face Community Article
|
2 |
-- |
2024-10-30 |
|
OmniParser for Pure Vision Based GUI Agent
|
2 |
-- |
2024-10-25 |
|
Hugs – Scale Your AI with Open Models
|
2 |
-- |
2024-10-23 |
|
Wpaigpt-SQL-01: text-to-SQL model designed for WordPress and WordPress plugins
|
2 |
-- |
2024-10-23 |
|
Pickle Scanning
|
2 |
-- |
2024-10-23 |
|
New Video Generation Model: Allegro
|
2 |
-- |
2024-10-22 |
|
TxT360
|
2 |
-- |
2024-10-18 |
|
Dataset About Where 30k+ Startups Trend
|
2 |
-- |
2024-10-18 |
|
Nvidia Nemotron
|
2 |
-- |
2024-10-17 |
|
Fixing Gradient Accumulation
|
2 |
-- |
2024-10-16 |
|
Animate-X: Universal Character Image Animation with Enhanced Motion
|
2 |
-- |
2024-10-15 |
|
SOTA Open Source Text to Video Model
|
2 |
-- |
2024-10-14 |
|
Exploring the Daily Papers Page on Hugging Face
|
2 |
-- |
2024-09-24 |
|
Multilingual MMLU Dataset from OpenAI (OpenAI/Mmmlu)
|
2 |
-- |
2024-09-23 |
|
Recreating o1 at Home with Role-Play LLMs
|
2 |
-- |
2024-09-21 |
|
FineVideo: Annotated YouTube Dataset by HuggingFace
|
2 |
-- |
2024-09-12 |
|
Remove Background by Text
|
2 |
-- |
2024-09-12 |
|
Labeled Image generation using Meta Llama 3.5
|
2 |
-- |
2024-08-31 |
|
Scaling robotics datasets with video encoding
|
2 |
-- |
2024-08-30 |
|
New FashionCLIP and SigLIP Classification Demo
|
2 |
-- |
2024-08-28 |
|
Mozilla/TriLM-Llamafile · Hugging Face
|
2 |
-- |
2024-08-26 |
|
Play: How random can a human brain truly be?
|
2 |
-- |
2024-08-24 |
|
FLUX.1 [Schnell] – a Hugging Face Space by black-forest-labs
|
2 |
-- |
2024-08-21 |
|
Flux Dev 1 model that creates half_illustration images
|
2 |
-- |
2024-08-21 |
|
LLMs as Image Generators with Canonical Codec Representations
|
2 |
-- |
2024-08-19 |
|
Instant in-browser demo of SmolLM
|
2 |
-- |
2024-08-18 |
|
Marqo-FashionCLIP: New Embedding Model for Fashion
|
2 |
-- |
2024-08-14 |
|
A Large-Scale Multimodal Dataset with Multigranular Annotations for Medicine
|
2 |
-- |
2024-08-07 |
|
Generate and Export Segmentation Masks Using Meta's SAMv2
|
2 |
-- |
2024-07-31 |
|
HuggingChat: Chat with Llama 3.1 405B
|
2 |
-- |
2024-07-25 |
|
Meta-Llama-3.1-405B
|
2 |
-- |
2024-07-23 |
|
Apple's DCLM model shares data & training code with weights
|
2 |
-- |
2024-07-20 |
|
Predicting Multiplication with GPT-2
|
2 |
-- |
2024-07-20 |
|
Qwen2 Technical Report
|
2 |
-- |
2024-07-16 |
|
Gemma-2-27B-it llamafile
|
2 |
-- |
2024-07-03 |
|
OpenRAIL: Towards open and responsible AI licensing frameworks (2022)
|
2 |
-- |
2024-07-03 |
|
New LLM Agent writing actions in Python code tops the GAIA agent …
|
2 |
-- |
2024-07-01 |
|
Stable Diffusion 3 Medium Online Demo, Free
|
2 |
-- |
2024-06-12 |
|
To Believe or Not to Believe Your LLM
|
2 |
-- |
2024-06-11 |
|
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-Modal LLMs
|
2 |
-- |
2024-06-04 |
|
Map-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
|
2 |
-- |
2024-05-31 |
|
Training and Finetuning Embedding Models with Sentence Transformers v3
|
2 |
-- |
2024-05-30 |
|
ChatTTS – open-source TTS model designed specifically for dialogue scenario
|
2 |
-- |
2024-05-29 |
|
Matryoshka Multimodal Models
|
2 |
-- |
2024-05-28 |
|
Aya 23: Open Weight Releases to Further Multilingual Progress
|
2 |
-- |
2024-05-28 |
|
HuggingFace Hub Incident Post Mortem
|
2 |
-- |
2024-05-24 |
|
Cohere Updates Weights for Aya
|
2 |
-- |
2024-05-23 |
|
Hugging Face on AMD Instinct MI300 GPU
|
2 |
-- |
2024-05-23 |
|
Show HN: Generate a Quiz from Any Url
|
2 |
-- |
2024-05-17 |
|
Show HN: EmuBert – the first open encoder model for Australian law
|
2 |
-- |
2024-05-14 |
|
New Yi 1.5 models under Apache 2.0
|
2 |
-- |
2024-05-12 |
|
Building Cost-Efficient Enterprise RAG Applications
|
2 |
-- |
2024-05-10 |
|
Google codegemma-1.1-7B-it
|
2 |
-- |
2024-05-03 |
|
Introduction to Matryoshka Embedding Models
|
2 |
-- |
2024-05-03 |
|
Iterative Reasoning Preference Optimization
|
2 |
-- |
2024-05-02 |
|
GPT-2
|
2 |
-- |
2024-05-01 |
|
Fine-tune Llama 3 with ORPO
|
2 |
-- |
2024-04-23 |
|
In-browser text-to-music generation using musicgen-small
|
2 |
-- |
2024-04-20 |
|
Compression Represents Intelligence Linearly
|
2 |
-- |
2024-04-16 |
|
Bringing serverless GPU inference to Hugging Face users
|
2 |
-- |
2024-04-16 |
|
From Words to Numbers: Your LLM Is a Capable Regressor
|
2 |
-- |
2024-04-12 |
|
Zephyr-orpo-141B-A35B: Mixtral 8x22B fine-tune by HuggingFace
|
2 |
-- |
2024-04-11 |
|
TinyTimeMixer: Open-source time series LLM by IBM
|
2 |
-- |
2024-04-09 |
|
Visual Autoregressive Modeling: Scalable Image Generation W NextScale Prediction
|
2 |
-- |
2024-04-05 |
|
Command R+
|
2 |
-- |
2024-04-04 |
|
Demo of Moondream2 vision language model running in browser
|
2 |
-- |
2024-04-03 |
|
Mini-Jamba
|
2 |
-- |
2024-04-01 |
|
Transformer-Lite: High-Efficiency Deployment of LLMs on Mobile Phone GPUs
|
2 |
-- |
2024-04-01 |
|
The Era of 1-Bit LLMs: All Large Language Models Are in 1.58 …
|
2 |
-- |
2024-03-25 |
|
Cosmopedia: How to create large-scale synthetic data for pre-training
|
2 |
-- |
2024-03-21 |
|
Playground-v2.5-1024px-Aesthetic
|
2 |
-- |
2024-03-16 |
|
Gemini 1.5: Unlocking multimodal understanding across tokens of context
|
2 |
-- |
2024-03-15 |
|
Better RAG 1: Advanced Basics
|
2 |
-- |
2024-03-15 |
|
Cerebrum 7B – Mistral fine-tune created specifically for reasoning tasks
|
2 |
-- |
2024-03-13 |
|
LLM Red-Teaming Resistance Leaderboard
|
2 |
-- |
2024-03-01 |
|
Show HN: Visualize how you split your document into chunks for RAG …
|
2 |
-- |
2024-02-27 |
|
From OpenAI to Open LLMs with Messages API on Hugging Face
|
2 |
-- |
2024-02-23 |
|
C4: colossal cleaned version of Common Crawl's web crawl corpus
|
2 |
-- |
2024-02-21 |
|
Constitutional AI with Open LLMs
|
2 |
-- |
2024-02-01 |
|
Show HN: 2x Faster Stable Diffusion Models on Hugging Face with Pruna …
|
2 |
-- |
2024-01-31 |
|
AMUSEd: Efficient Text-to-Image Generation
|
2 |
-- |
2024-01-29 |
|
Minillama – 4.1 MB LLM for testing
|
2 |
-- |
2024-01-20 |
|
StableLM 2 Zephyr 1.6B
|
2 |
-- |
2024-01-20 |
|
Local vector embeddings index for analyzing ArXiv papers
|
2 |
-- |
2024-01-17 |
|
Stable Zero123 Model Weights get Released. Text to 3D and image to …
|
2 |
-- |
2024-01-15 |
|
Make LLM Fine-Tuning 2x Faster with Unsloth and HuggingFace TRL
|
2 |
-- |
2024-01-10 |
|
OpenChat-3.5 Update 0106: ChatGPT-level performances accessible locally
|
2 |
-- |
2024-01-10 |
|
Revolutionizing AI with Audio Classification via Computer Vision
|
2 |
-- |
2024-01-02 |
|
Chatglm3-6B-32k
|
2 |
-- |
2023-12-29 |
|
DreaMoving: A Human Video Generation Framework Based on Diffusion Models
|
2 |
-- |
2023-12-28 |
|
Dream-Talk: Realistic Audio-Driven Single Image Talking Face Generation
|
2 |
-- |
2023-12-24 |
|
Time Is Encoded in the Weights of Finetuned Language Models
|
2 |
-- |
2023-12-22 |
|
2023, Year of Open LLMs
|
2 |
-- |
2023-12-19 |
|
Hugging Face releases Optimum-Nvidia to accelerate LLM inference
|
2 |
-- |
2023-12-07 |
|
Open LLM Leaderboard: DROP deep dive
|
2 |
-- |
2023-12-02 |
|
Starling-RM-7B-Alpha
|
2 |
-- |
2023-11-27 |
|
Intel: neural-chat-7B-v3-1
|
2 |
-- |
2023-11-16 |
|
Whisper Large v3
|
2 |
-- |
2023-11-09 |
|
MonadGPT – OS ChatGPT-like for the 17th century
|
2 |
-- |
2023-11-09 |
|
OpenHermes-2.5-Mistral-7B
|
2 |
-- |
2023-11-08 |
|
Yi-34B, 76.3 on MMLU, Apache 2.0
|
2 |
-- |
2023-11-04 |
|
Templates for Chat Models
|
2 |
-- |
2023-10-17 |
|
HF Shopify Image Background Replacement
|
2 |
-- |
2023-10-12 |
|
OpenWebMath, a dataset containing every math docs found on the internet
|
2 |
-- |
2023-10-11 |
|
Paper Page – NExT-GPT: Any-to-Any Multimodal LLM
|
2 |
-- |
2023-09-12 |
|
Using Machine Learning to Improve Language Metadata on the Hugging Face Hub
|
2 |
-- |
2023-09-12 |
|
Open ASR Leaderboard
|
2 |
-- |
2023-09-07 |
|
Show HN: A LLM pull request review tool [feedback wanted]
|
2 |
-- |
2023-09-07 |
|
Technology Innovation Institute Releases Falcon 180B LLM
|
2 |
-- |
2023-09-06 |
|
Hugging Face Tutorial for Unity RL Agents
|
2 |
-- |
2023-08-31 |
|
Dolma: The Largest Open Dataset For Training Language Models
|
2 |
-- |
2023-08-24 |
|
WizardMath: Empowering Math Reasoning for LLM via Reinforced Evol-Instruct
|
2 |
-- |
2023-08-15 |
|
Hugging Face Launches Tools for Running LLMs on Apple Devices
|
2 |
-- |
2023-08-09 |
|
Open sourcing OpenAI’s function calling
|
2 |
-- |
2023-08-08 |
|
Autotrain – Create powerful AI models without code
|
2 |
-- |
2023-07-30 |
|
Understanding Embeddings
|
2 |
-- |
2023-07-28 |
|
Scaling TransNormer to 175B Parameters
|
2 |
-- |
2023-07-28 |
|
Llama 2 is here – get it on Hugging Face
|
2 |
-- |
2023-07-19 |
|
Building an AI WebTV
|
2 |
-- |
2023-07-18 |
|
Open-Source Text Generation and LLM Ecosystem at Hugging Face
|
2 |
-- |
2023-07-17 |
|
OpenOrca-Preview1
|
2 |
-- |
2023-07-12 |
|
Large Language Models can complete complex non linguistic patterns in context
|
2 |
-- |
2023-07-11 |
|
Whisper Web: Speech recognition in the web browser
|
2 |
-- |
2023-07-10 |
|
Chat with Falcon-7B-instruct demo
|
2 |
-- |
2023-07-08 |
|
OpenChat: Less is More for Open-source Models
|
2 |
-- |
2023-07-06 |
|
Can foundation models label data like humans?
|
2 |
-- |
2023-07-05 |
|
Are Text-to-image models biased?
|
2 |
-- |
2023-07-03 |
|
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
|
2 |
-- |
2023-07-01 |
|
Can foundation models label data like humans?
|
2 |
-- |
2023-06-30 |
|
A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion
|
2 |
-- |
2023-06-30 |
|
Hugging Face – Transformers Agents 4.30 with local agents
|
2 |
-- |
2023-06-28 |
|
DragGan – Interactive Point-Based Manipulation on the Generative Image Manifold
|
2 |
-- |
2023-06-26 |
|
QR Code Conditioned ControlNet Models for Stable Diffusion 1.5 and 2.1
|
2 |
-- |
2023-06-16 |
|
Cluster and Visualise 100K Wines by Tasting Notes with T-SNE
|
2 |
-- |
2023-06-11 |
|
Hugging Face and IBM partner on watsonx.ai, next-gen enterprise studio for AI
|
2 |
-- |
2023-05-28 |
|
HuggingFace Demo: DragGAN
|
2 |
-- |
2023-05-26 |
|
Audit shows that safetensors is safe and ready to become the default
|
2 |
-- |
2023-05-23 |
|
A Dive into Text-to-Video Models
|
2 |
-- |
2023-05-15 |
|
HuberChat, a Chatbot trained on HubermanLab podcast (OpenAI key required)
|
2 |
-- |
2023-05-10 |
|
Demo: Code Completion with replit-code-v1-3B
|
2 |
-- |
2023-05-03 |
|
RLHF – Hugging Face Course
|
2 |
-- |
2023-04-27 |
|
Ekimetrics launches a “ChatGPT” dedicated to climate
|
2 |
-- |
2023-04-07 |
|
Alpaca GarbageCollector – Curating high-quality data for open-source LLMs
|
2 |
-- |
2023-04-04 |
|
Text2Video-Zero
|
2 |
-- |
2023-03-26 |
|
Train your own ControlNet models with diffusers
|
2 |
-- |
2023-03-24 |
|
Open source models for various Machine Learning tasks
|
2 |
-- |
2023-03-08 |
|
Ultra Fast ControlNet with Hugging Face Diffusers
|
2 |
-- |
2023-03-03 |
|
Using Stable Diffusion with Core ML on Apple Silicon
|
2 |
-- |
2023-02-22 |
|
HuggingFace/Transformers-Stats
|
2 |
-- |
2023-02-20 |
|
Playable Demo for MarioGPT: Open-Ended Text2Level Generation Through LLMs
|
2 |
-- |
2023-02-18 |
|
Faster Training and Inference: Habana Gaudi -2 vs. Nvidia A100 80GB
|
2 |
-- |
2023-02-16 |
|
Speech Synthesis, Recognition, and More with SpeechT5
|
2 |
-- |
2023-02-09 |
|
Threat actors using HuggingFace to deliver malware
|
2 |
-- |
2023-02-07 |
|
Generating Human Motion from Textual Descriptions (T2M-GPT)
|
2 |
-- |
2023-01-31 |
|
AI for Game Development: 3D Asset Generation
|
2 |
-- |
2023-01-20 |
|
Show HN: ML Q&A – Get answers to questions about ML frameworks
|
2 |
-- |
2023-01-05 |
|
Probabilistic Time Series Forecasting with Transformers
|
2 |
-- |
2022-12-02 |
|
Fine-Tune Whisper for Multilingual ASR with Transformers
|
2 |
-- |
2022-11-23 |
|
Ask a question, YouTube and OpenAI Whisper will try to answer
|
2 |
-- |
2022-10-28 |
|
Show HN: Ask YouTube – search for specific answers in videos
|
2 |
-- |
2022-10-28 |
|
New Google big language model Flan-T5 available on HuggingFace
|
2 |
-- |
2022-10-22 |
|
The Annotated Diffusion Model
|
2 |
-- |
2022-09-13 |
|
Text2Human: Text-Driven Controllable Human Image Generation
|
2 |
-- |
2022-08-04 |
|
Highly Accurate Dichotomous Image Segmentation
|
2 |
-- |
2022-07-31 |
|
The Technology Behind BLOOM Training
|
2 |
-- |
2022-07-23 |
|
BLOOM Language Model
|
2 |
-- |
2022-07-04 |
|
GPT4-Chan – Conditions for Availability
|
2 |
-- |
2022-06-24 |
|
Hugging Face Hub: discover and share ML models, datasets, and demos
|
2 |
-- |
2022-06-01 |
|
Decision Transformers on Hugging Face
|
2 |
-- |
2022-06-01 |
|
Mask Transfiner for High-Quality Instance Segmentation
|
2 |
-- |
2022-04-17 |
|
MultiMAE: Multi-modal Multi-task Masked Autoencoders
|
2 |
-- |
2022-04-16 |
|
Self-Distilled StyleGAN: Towards Generation from Internet Photos Gradio Demo
|
2 |
-- |
2022-04-05 |
|
CVPR2022 Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
|
2 |
-- |
2022-03-24 |
|
Show HN: HF-BERTopic – Transformer based topic modeling in the browser
|
2 |
-- |
2022-02-02 |
|
Turn a Photo into an Animation
|
2 |
-- |
2022-01-29 |
|
DeepPrivacy: GANs for Face Anonymization
|
2 |
-- |
2022-01-24 |
|
Show HN: HN-KeyBERT: AI KeyPhrase extraction in the browser
|
2 |
-- |
2022-01-24 |
|
Similarity search for current Hacker News front page titles
|
2 |
-- |
2022-01-23 |
|
HuggingFace on Sheets
|
2 |
-- |
2025-03-24 |
|
OpenGPT-X
|
2 |
-- |
2024-11-26 |
|
Show HN: AI Hackathon_ Prize 20K USD '1-Min Creative Innovation with AI'
|
2 |
-- |
2024-11-28 |
|
The Lichess database is now on Hugging Face
|
2 |
-- |
2024-12-06 |
|
LLM Comparison/Test: 25 SOTA LLMs (Including QwQ) Through 59 MMLU-Pro CS Runs
|
2 |
-- |
2024-12-05 |
|
Releasing: A dataset of two million Bluesky posts
|
2 |
-- |
2024-11-27 |
|
Just launched MilkDropLM model using 32B parameters
|
2 |
-- |
2024-12-20 |
|
FineMath: the best public math pre-training dataset
|
2 |
-- |
2024-12-19 |
|
I-JEPA Hugging Face
|
2 |
-- |
2024-12-09 |
|
FineWeb2 dataset: A sparkling update with 1000s of languages
|
2 |
-- |
2024-12-08 |
|
Vdr-2B-multi-v1 a multilingual embedding model for visual document retrieval
|
2 |
-- |
2025-01-10 |
|
Show HN: We collected detailed annotations for text-to-image generation
|
2 |
-- |
2025-01-10 |
|
Hugging Face Smolagents
|
2 |
-- |
2025-01-05 |
|
Hugging Face advocates for Code Agents: agents that write tool calls as …
|
2 |
-- |
2025-01-02 |
|
ModernBERT: Encoder-only Transformer Model Strictly Improving on past work
|
2 |
-- |
2025-01-01 |
|
Polish linguistic and cultural competency benchmark for LLMs
|
2 |
-- |
2024-12-31 |
|
Flex.1-Alpha – A new modded Flux model that can properly handle being …
|
2 |
-- |
2025-01-19 |
|
OpenAI o3 just scored 99.8% on CodeForces using brute-force
|
2 |
-- |
2025-02-12 |
|
FinePersonas
|
2 |
-- |
2025-02-10 |
|
#9: Does AI Remember? The Role of Memory in Agentic Workflows
|
2 |
-- |
2025-02-03 |
|
Mistral-Small-24B-Base-2501
|
2 |
-- |
2025-01-30 |
|
Generate Images, Chat with PDF in WebGPU via DeepSeek Janus Pro 1B
|
2 |
-- |
2025-01-28 |
|
The state of open video generation models
|
2 |
-- |
2025-01-28 |
|
Bespoke-Stratos-17k: Open Reasoning Dataset by Distilling DeepSeek-R1
|
2 |
-- |
2025-01-27 |
|
DeepSeek-R1 WebGPU
|
2 |
-- |
2025-01-22 |
|
FastRTC: The Real-Time Communication Library for Python
|
2 |
-- |
2025-02-25 |
|
Show HN: Roast Any Website with AI
|
2 |
-- |
2025-02-25 |
|
SWE-Lancer: Can LLMs Earn $1M from Real-World Freelance Software Engineering?
|
2 |
-- |
2025-02-18 |
|
Desklib AI Detector Ranks No 1 on Raid Benchmark for AI Detection
|
2 |
-- |
2025-02-17 |
|
Forget What You Know about LLMs Evaluations – LLMs Are Like a …
|
2 |
-- |
2025-02-13 |
|
JFK Assassination Records Dataset on Hugging Face
|
2 |
-- |
2025-04-09 |
|
Show HN: My progress towards building a robotics training dataset
|
2 |
-- |
2025-03-18 |
|
HOGWILD! Inference – parallel LLM chain-of-thought with shared attention
|
2 |
-- |
2025-04-09 |
|
Llama-4 Model-Based Agentic AI System HuggingFace Released
|
2 |
-- |
2025-04-06 |
|
Llama 3.2 from-scratch implementation focused on code readability
|
2 |
-- |
2025-04-01 |
|
deepsite
|
2 |
-- |
2025-03-31 |
|
SuperBPE: Space Travel for Language Models
|
2 |
-- |
2025-03-29 |
|
Gemma3 on Hugging Face
|
2 |
-- |
2025-03-26 |
|
Open-source LLM beats OpenAI o1 and DeepSeek-R1 for PyTorch-to-Triton codegen
|
2 |
-- |
2025-03-19 |
|
Cohere: Command A (111B Open Weights Model)
|
2 |
-- |
2025-03-14 |
|
Open Dataset: Vehicle Accidents
|
2 |
-- |
2025-03-13 |
|
Show HN: TTS Arena V2
|
2 |
-- |
2025-05-02 |
|
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
|
2 |
-- |
2025-05-01 |
|
MamayLM: An Efficient Ukrainian LLM
|
2 |
-- |
2025-04-23 |
|
Show HN: AEE – An Open-Source Engine That Evaluates Truth and Bias …
|
2 |
-- |
2025-04-13 |
|
Magi-1: Autoregressive Video Generation at Scale
|
2 |
-- |
2025-05-06 |
|
The 4 Things the Qwen-3's Chat Template Teaches Us
|
2 |
-- |
2025-05-02 |
|
Show HN: A synthetic text dataset to train tiny language models on
|
2 |
-- |
2025-05-01 |
|
Phi-4-Reasoning
|
2 |
-- |
2025-05-01 |
|
FantasyTalking: Realistic Talking Portrait Generation
|
2 |
-- |
2025-04-30 |
|
Neural Network Visualizer
|
2 |
-- |
2025-04-29 |
|
The Bitter Lesson Learned from 2k Multilingual Benchmarks
|
2 |
-- |
2025-04-23 |
|
ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think
|
2 |
-- |
2025-04-19 |
|
Microsoft BitNet 1.58bit LLM 2B4T released
|
2 |
-- |
2025-04-16 |
|
SOTA Model in 8B Size?
|
2 |
-- |
2025-05-29 |
|
TiRex Leads Gift Eval
|
2 |
-- |
2025-06-02 |
|
How do AI political biases differ between English and French?
|
2 |
-- |
2025-05-21 |
|
KernelLLM – Meta's new 8B SotA model
|
2 |
-- |
2025-05-19 |
|
Wan: Open and Advanced Large-Scale Video Generative Models
|
2 |
-- |
2025-05-14 |
|
Embedding Benchmark for Retrieval
|
2 |
-- |
2025-06-11 |
|
MiniCPM4 – a series of open multimodal models for edge inference
|
2 |
-- |
2025-06-10 |
|
The Qwen3 Embedding Model
|
2 |
-- |
2025-06-06 |
|
Tiny Agents in Python: an MCP-powered agent in ~70 lines of code
|
2 |
-- |
2025-05-23 |
|
Show HN: 2.4x faster baai/bge-M3
|
2 |
-- |
2025-05-18 |
|
Vision Language Models (Better, Faster, Stronger)
|
2 |
-- |
2025-05-13 |
|
Building and better understanding vision-language models (2024)
|
2 |
-- |
2025-05-10 |
|
FLUX Kontext Dev Ultra Fast Live
|
2 |
-- |
2025-06-26 |
|
Veena – open-source TTS for Indian Languages
|
2 |
-- |
2025-06-25 |
|
Metalorian: Generate Heavy Metal-Binding Peptides with Diffusion Sampling
|
2 |
-- |
2025-07-12 |
|
Kimi-K2-Base
|
2 |
-- |
2025-07-11 |
|
Building the Hugging Face MCP Server
|
2 |
-- |
2025-07-10 |
|
A Survey on Latent Reasoning
|
2 |
-- |
2025-07-10 |
|
Skywork-R1V3-38B open-source multimodal reasoning model
|
2 |
-- |
2025-07-08 |
|
HuggingChat is shutting down (for now)
|
2 |
-- |
2025-07-04 |
|
Qwen3Guard: Real-Time Safety for Your Token Stream
|
2 |
-- |
2025-09-24 |
|
K2-Think: A Parameter-Efficient Reasoning System
|
2 |
-- |
2025-09-13 |
|
Environments Hub: Your Language Model needs better (open) environments to learn
|
2 |
-- |
2025-09-05 |
|
Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training
|
2 |
-- |
2025-08-18 |
|
Voxtral WebGPU
|
2 |
-- |
2025-07-25 |
|
Show HN: kulyk-uk-en and kulyk-en-uk
|
2 |
-- |
2025-07-22 |
|
Show HN: KaniTTS – Ultra Fast and Expressive TTS Model
|
2 |
-- |
2025-09-22 |
|
N-Atlas V1
|
2 |
-- |
2025-09-21 |
|
Granite docling 258M: a small multimodal model for efficient document conversion
|
2 |
-- |
2025-09-17 |
|
Statistical Methods in Generative AI
|
2 |
-- |
2025-09-16 |
|
EmbeddingGemma is a 300M parameter, open embedding model from Google
|
2 |
-- |
2025-09-05 |
|
Swiss AI Initiative
|
2 |
-- |
2025-09-02 |
|
Apertus LLM
|
2 |
-- |
2025-09-02 |
|
Hugging Face spreadsheet tool: AI Sheets
|
2 |
-- |
2025-09-01 |
|
A Novel Pretrained Tokenizer-Free LLM Architecture
|
2 |
-- |
2025-08-29 |
|
MiniCPM-V 4.5: GPT-4o Level MLLM for Image and Video Understanding on Your …
|
2 |
-- |
2025-08-26 |
|
NASA and IBM release open source model on Hugging Face to predict …
|
2 |
-- |
2025-08-20 |
|
Tokenizers
|
2 |
-- |
2025-08-17 |
|
FormulaOne: A reasoning benchmark that all models score 0% on
|
2 |
-- |
2025-08-14 |
|
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
|
2 |
-- |
2025-08-06 |
|
Qwen3-30B-A3B-Thinking-2507 has been released
|
2 |
-- |
2025-07-31 |
|
Intern-S1: A 241B parameter open-source MoE multimodal model
|
2 |
-- |
2025-07-28 |
|
Creating custom kernels for the AMD MI300
|
2 |
-- |
2025-07-25 |
|
Fast LoRA Inference for Flux with Diffusers and PEFT
|
2 |
-- |
2025-07-24 |
|
Nvidia parakeet-tdt-0.6B-v2
|
2 |
-- |
2025-07-22 |
|
How to Run a Hugging Face Model in Jax (Part 1)
|
2 |
-- |
2025-07-20 |
|
Show HN: Chimera-QxD-BMM-Qwen2-l22_28-alphaqd-1.5B-f16
|
2 |
-- |
2025-07-19 |
|
Show HN: Embedding model for PDF page retrieval
|
1 |
-- |
2024-08-08 |
|
Nvidia Just Published ChatQA 1.5, a Llama3 QA/RAG Finetune
|
1 |
-- |
2024-05-02 |
|
Show HN: Elon Musk's Tweet Classifier
|
1 |
-- |
2022-04-30 |
|
Get Insulted by AI
|
1 |
-- |
2024-02-25 |
|
Launch of F.ai Fuzer v0.1 on HuggingFace Space using Gradio
|
1 |
-- |
2024-07-29 |
|
With LLMs we can create an open-source Library of Alexandria
|
1 |
-- |
2023-09-28 |
|
Show HN: Find Your Celebrity Lookalike (With AI)
|
1 |
-- |
2023-01-04 |
|
Stable Diffusion trained with “El Risitas” dataset
|
1 |
-- |
2022-10-27 |
|
SmolLM2: The new, best, and open small language model
|
1 |
-- |
2024-11-01 |
|
The Romulus model series has been released on Hugging Face
|
1 |
-- |
2024-09-11 |
|
I added context data to the TruthfulQA dataset
|
1 |
-- |
2024-08-10 |
|
Chinese AI Community: open-source Heatmap
|
1 |
-- |
2024-07-31 |
|
Multi-token prediction models and baselines
|
1 |
-- |
2024-07-04 |
|
Stupid Filter Corpus (2007)
|
1 |
-- |
2024-05-24 |
|
MMLU-Pro: Advanced edition of MMLU & new Leaderboard
|
1 |
-- |
2024-05-15 |
|
Ratchet and Phi 3
|
1 |
-- |
2024-05-01 |
|
Snowflake Arctic Instruct Open LLM
|
1 |
-- |
2024-04-24 |
|
LegalKit Retrieval, binary Search with int8 Rescoring through French legal codes
|
1 |
-- |
2024-04-08 |
|
MANATEE(lm): Market Analysis based on language model architectures
|
1 |
-- |
2024-03-20 |
|
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-Tuning on a …
|
1 |
-- |
2024-03-13 |
|
Serverless Image Similarity with Upstash Vector and HuggingFace Spaces
|
1 |
-- |
2024-02-02 |
|
Dutch Drug-Related Text Classification Model by NOS
|
1 |
-- |
2024-01-25 |
|
Implement Fractional GPUs in Kubernetes to save up to 50% cost
|
1 |
-- |
2024-01-22 |
|
The next person that says textual modalities gets it
|
1 |
-- |
2024-01-10 |
|
LLaMA Pro: Progressive LLaMA with Block Expansion
|
1 |
-- |
2024-01-05 |
|
DiffMorpher – Using Diffusion Models for Image Morphing
|
1 |
-- |
2023-12-24 |
|
Tencent Announces AppAgent
|
1 |
-- |
2023-12-22 |
|
How Do Prompt Injection Scanners Perform? A Benchmark
|
1 |
-- |
2023-12-07 |
|
Show HN: ChatData – an open-source ChatGPT-like chatbot
|
1 |
-- |
2023-11-29 |
|
3D Gaussian Splat Viewer (top item)
|
1 |
-- |
2023-10-23 |
|
Who loves you Hacker News?
|
1 |
-- |
2023-10-12 |
|
Curious about Causality and Generative Models? Check Out This Demo
|
1 |
-- |
2023-07-26 |
|
Have You Tried AWS Inferentia2 for ML Deployments?
|
1 |
-- |
2023-07-16 |
|
Open Source LLM Inference DLC
|
1 |
-- |
2023-06-29 |
|
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
|
1 |
-- |
2023-06-15 |
|
Text Embedding Benchmark (MTEB) Leaderboard
|
1 |
-- |
2023-02-20 |
|
Diffusion Models Live Event with Hugging Face
|
1 |
-- |
2022-11-25 |
|
Train a language model with Megatron-LM and convert it to Transformers
|
1 |
-- |
2022-09-13 |
|
Multilingual GPT model with 1.3B parameters trained on 25 languages
|
1 |
-- |
2022-05-01 |
|
Hugging Face Model Comparator Space Builder
|
1 |
-- |
2022-03-28 |
|
Halo: Open-Source Health Tracking with Wearables
|
1 |
-- |
2024-11-20 |
|
Releasing the largest multilingual open pretraining dataset
|
1 |
-- |
2024-11-14 |
|
Qwen 2.5 Coder: LLM model based on Qwen 2.5 architecture optimised for …
|
1 |
-- |
2024-11-12 |
|
Providing Open Investment Data – 25 years of data
|
1 |
-- |
2024-11-11 |
|
New Sota Text to Image
|
1 |
-- |
2024-10-31 |
|
Stable Diffusion 3.5 Medium
|
1 |
-- |
2024-10-29 |
|
Kolors Virtual Try-On in the Wild
|
1 |
-- |
2024-10-28 |
|
Google Shopping 10M Dataset: One of the Largest for Multimodal Product Retrieval
|
1 |
-- |
2024-10-23 |
|
Stable Diffusion 3.5-large released
|
1 |
-- |
2024-10-22 |
|
Transformers.js v3: WebGPU Support, New Models and Tasks, and More
|
1 |
-- |
2024-10-22 |
|
Allegro – New Open Source Text to Video Generator from Rhymes AI
|
1 |
-- |
2024-10-22 |
|
Distilabel Synthetic Data Generator on Hugging Face
|
1 |
-- |
2024-10-17 |
|
HF's Open LLM Leaderboard releases Comparator to drill down in LLM performance
|
1 |
-- |
2024-10-17 |
|
Show HN: A dataset of all HN submission texts (2006-2024) in Markdown
|
1 |
-- |
2024-10-13 |
|
Scaling AI-Based Data Processing with Hugging Face and Dask
|
1 |
-- |
2024-10-10 |
|
LLMs Know More Than They Show
|
1 |
-- |
2024-10-08 |
|
Document Similarity Search with ColPali
|
1 |
-- |
2024-09-29 |
|
Prithvi WxC: Foundation Model for Weather and Climate
|
1 |
-- |
2024-09-24 |
|
Show HN: Fusion-Guide: A Model for Generating Cot Reasoning and Guidance
|
1 |
-- |
2024-09-24 |
|
HN-Style HuggingFace Daily Papers
|
1 |
-- |
2024-09-22 |
|
Qwen2.5-Coder Technical Report
|
1 |
-- |
2024-09-21 |
|
Introducing Community Tools on HuggingChat
|
1 |
-- |
2024-09-20 |
|
InkubaLM-0.4B: Small language model for low-resource African Languages
|
1 |
-- |
2024-08-29 |
|
Diffusion models are real time game engines
|
1 |
-- |
2024-08-29 |
|
Everchanging Quest: Rogue-like game powered by LLMs
|
1 |
-- |
2024-08-21 |
|
xLSTM Model Trained on Music
|
1 |
-- |
2024-08-16 |
|
Qwen2-VL
|
1 |
-- |
2024-08-14 |
|
Scaling LLM Test-Time Compute More Effective Than Scaling Model Parameters
|
1 |
-- |
2024-08-07 |
|
Depth Compare – A Hugging Face space to compare different depth models
|
1 |
-- |
2024-07-29 |
|
Insilico Medicine on Hugging Face
|
1 |
-- |
2024-07-27 |
|
LAVE: Zero-Shot VQA Evaluation on Docmatix with LLMs
|
1 |
-- |
2024-07-26 |
|
Spreadsheetllm: Encoding Spreadsheets for Large Language Models
|
1 |
-- |
2024-07-24 |
|
Followgraph for Hugging Face
|
1 |
-- |
2024-07-23 |
|
Show HN: Variable-length (up to 47s) stereo audio at 44.1kHz from text …
|
1 |
-- |
2024-07-23 |
|
Scaling Diffusion Transformers to 16B Parameters
|
1 |
-- |
2024-07-19 |
|
DeepSeek v2 Chat (0628) released
|
1 |
-- |
2024-07-18 |
|
The Rise of Agentic Data Generation
|
1 |
-- |
2024-07-15 |
|
Fast SD3 Medium
|
1 |
-- |
2024-07-10 |
|
Agentic RAG: query reformulation and self-query
|
1 |
-- |
2024-07-08 |
|
Meta LLM Compiler
|
1 |
-- |
2024-06-29 |
|
Allegro-TI2V: an open source video generation model
|
1 |
-- |
2024-11-27 |
|
PR Puppet Sora
|
1 |
-- |
2024-11-27 |
|
Lightricks/LTX-Video – first real-time video generation model
|
1 |
-- |
2024-11-23 |
|
PaliGemma 2 – New vision language models by Google
|
1 |
-- |
2024-12-05 |
|
Open Source Developers Guide to the EU AI Act
|
1 |
-- |
2024-12-03 |
|
LM Studio using models from Hugging Face
|
1 |
-- |
2024-12-02 |
|
IC Light – Shade Generation Model
|
1 |
-- |
2024-12-02 |
|
ModernBERT
|
1 |
-- |
2024-12-20 |
|
Show HN: A ML powered text moderation model that outperforms Open AI
|
1 |
-- |
2024-12-14 |
|
Help Us Rank the Best Background Removal Tools
|
1 |
-- |
2024-12-11 |
|
I need your help to create brain-rot dataset
|
1 |
-- |
2024-12-08 |
|
Phi-4 GGUF
|
1 |
-- |
2024-12-14 |
|
HunyuanVideo and Diffusers Made Easy
|
1 |
-- |
2024-12-11 |
|
Show HN: An Agentic AI dataset for deepfake detection
|
1 |
-- |
2025-01-15 |
|
FP8 DeepSeek R1 Distilled LLMs for SGLang and VLLM
|
1 |
-- |
2025-01-29 |