Using Stable Diffusion and SAM to Modify Image Contents Zero Shot
Blog post from Roboflow
Recent advances in large language models (LLMs) and foundation computer vision models have transformed image and video editing, making it possible to drive tasks like inpainting, outpainting, and generative fill entirely with text prompts. This tutorial shows how to build a text-driven visual editor from open-source models: Grounding DINO for zero-shot object detection, the Segment Anything Model (SAM) for segmentation, and Stable Diffusion for inpainting. Chained together, they let you transform and manipulate images using nothing but text commands, removing the need for manual editing in traditional software.

The tutorial also walks through creative applications of the workflow, including rapid prototyping, image translation, video editing, and object identification and replacement, highlighting the accessibility and precision of text-based image editing.
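To make the workflow concrete, below is a minimal sketch of the detect → segment → inpaint chain. It assumes the Hugging Face transformers and diffusers ports of Grounding DINO, SAM, and Stable Diffusion rather than the tutorial's exact code, and the model checkpoints, thresholds, file names, and prompts are illustrative placeholders.

```python
# Minimal sketch: text-driven object replacement with Grounding DINO + SAM + Stable Diffusion.
# Assumes the Hugging Face `transformers` / `diffusers` ports of these models; checkpoints,
# thresholds, and prompts are illustrative, not the tutorial's exact settings.
import torch
from PIL import Image
from transformers import (
    AutoProcessor, AutoModelForZeroShotObjectDetection,
    SamModel, SamProcessor,
)
from diffusers import StableDiffusionInpaintPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
image = Image.open("input.jpg").convert("RGB")  # hypothetical input image

# 1. Zero-shot detection: Grounding DINO finds the object named in the text prompt.
dino_id = "IDEA-Research/grounding-dino-tiny"
dino_processor = AutoProcessor.from_pretrained(dino_id)
dino = AutoModelForZeroShotObjectDetection.from_pretrained(dino_id).to(device)
# Grounding DINO expects lowercase phrases ending with a period.
inputs = dino_processor(images=image, text="a dog.", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = dino(**inputs)
detections = dino_processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, box_threshold=0.4, text_threshold=0.3,
    target_sizes=[image.size[::-1]],
)[0]
box = detections["boxes"][0].tolist()  # take the first matched box (x0, y0, x1, y1)

# 2. Segmentation: prompt SAM with the detected box to get a pixel-accurate mask.
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
sam = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
sam_inputs = sam_processor(image, input_boxes=[[box]], return_tensors="pt").to(device)
with torch.no_grad():
    sam_outputs = sam(**sam_inputs)
masks = sam_processor.image_processor.post_process_masks(
    sam_outputs.pred_masks.cpu(),
    sam_inputs["original_sizes"].cpu(),
    sam_inputs["reshaped_input_sizes"].cpu(),
)
# SAM returns several candidate masks per box; the first is used here for simplicity.
mask_np = masks[0][0, 0].numpy().astype("uint8") * 255
mask = Image.fromarray(mask_np)

# 3. Inpainting: Stable Diffusion repaints the masked region from a new text prompt.
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting"
).to(device)
result = inpaint(
    prompt="a corgi wearing sunglasses",
    image=image.resize((512, 512)),
    mask_image=mask.resize((512, 512)),
).images[0]
result.save("edited.jpg")
```

Under these assumptions, changing the detection prompt and the inpainting prompt is all it takes to target a different object or a different replacement, which is what makes this text-only editing loop fast to iterate on.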