We’re open-sourcing our text-to-image model and the process behind it

Blog post from HuggingFace

Post Details
Company: HuggingFace
Date Published:
Author: Jon Almazán, David Bertoin, and Roman
Word Count: 1,110
Company Posts That Month: 49
Language: -
Hacker News Points: -
Summary

Photoroom has open-sourced its text-to-image model, PRX, under the Apache 2.0 license and made it available through 🤗 Diffusers. The goal is to provide both a robust model and a detailed record of how it was trained. The release includes a 1.3 billion-parameter version, trained on 32 H200 GPUs and designed to produce high-quality images at resolutions up to 1024 pixels. An accompanying blog series details the training pipeline, covering architecture choices, training techniques, and post-training methods, with further posts planned on additional experiments and refinements. Photoroom invites contributions and feedback from the community through its Discord server. The project reflects extensive experimentation with architectures, VAEs, and training optimizations, and includes contributions from a diverse team of researchers and engineers.

Trends Found in this Post
Trend                    Post Mentions   Total Month Mentions   Posts   Companies   MoM
AI Model Fine-tuning     2               558                    140     61          -27%
LLM                      1               5,556                  752     184         +14%
Reinforcement learning   1               293                    55      27          +98%
Vector Search            1               1,303                  288     128         -18%