From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output

Post Details

Company

Hugging Face

Date Published

Feb. 7, 2026

Author

Maziyar Panahi

Word Count

5,766

Company Posts That Month

55

Language

-

Hacker News Points

-

Post removed?

No

Source URL

huggingface.co/blog/MaziyarPanahi/sae-steering-json

Summary

Maziyar Panahi's article walks through six experiments that tried to make a language model emit valid JSON using activation steering, a technique that looked promising because it had changed semantic behaviors such as safety and bias without retraining. For JSON generation, a syntactic task, it failed badly: steering cut the valid-JSON rate from 86.8% to 24.4%. Panahi concludes that activation steering works on semantic tasks because it manipulates continuous features, but breaks down on binary syntactic constraints, which require tracking discrete state. The fix that worked was constrained decoding: a finite state machine (FSM) enforces JSON syntax during token generation, which yielded 100% valid JSON. The takeaway is to choose the technique by task type: activation steering for semantic goals, structural enforcement for syntactic ones.
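To make the constrained-decoding idea concrete, here is a minimal Python sketch. It is not the article's implementation. It assumes a toy character-level vocabulary, a random scorer standing in for a real model's logits, and a hand-written FSM that accepts only flat JSON objects with string keys and string values. The names `TRANSITIONS`, `toy_scores`, `steps_to_done`, and `decode` are hypothetical, and the length-budget check that forces the object to close before `max_tokens` runs out is an addition of this sketch, not something the summary describes.

```python
import json
import random

LETTERS = "abc"
VOCAB = ["{", "}", '"', ":", ",", *LETTERS]

# Hypothetical FSM for a tiny JSON subset: {"key":"value", ...}
# Each state maps an allowed token to the state it leads to.
TRANSITIONS = {
    "start": {"{": "obj_open"},
    "obj_open": {'"': "key", "}": "done"},
    "key": {**{ch: "key" for ch in LETTERS}, '"': "after_key"},
    "after_key": {":": "before_value"},
    "before_value": {'"': "value"},
    "value": {**{ch: "value" for ch in LETTERS}, '"': "after_value"},
    "after_value": {",": "after_comma", "}": "done"},
    "after_comma": {'"': "key"},
    "done": {},
}


def steps_to_done():
    """Fewest tokens needed to reach 'done' from each state."""
    dist = {"done": 0}
    changed = True
    while changed:
        changed = False
        for state, edges in TRANSITIONS.items():
            for nxt in edges.values():
                if nxt in dist and dist[nxt] + 1 < dist.get(state, float("inf")):
                    dist[state] = dist[nxt] + 1
                    changed = True
    return dist


def toy_scores(rng):
    """Stand-in for model logits: one random score per vocabulary token."""
    return {tok: rng.random() for tok in VOCAB}


def decode(seed, max_tokens=40, constrained=True):
    """Greedy decoding; when constrained, tokens the FSM forbids are masked out.

    Assumes max_tokens >= 2 so that at least "{}" fits.
    """
    rng = random.Random(seed)
    dist = steps_to_done()
    state, out = "start", []
    for step in range(max_tokens):
        if constrained and state == "done":
            break
        scores = toy_scores(rng)
        if constrained:
            budget = max_tokens - step - 1  # tokens left after this one
            # Keep only tokens that are legal here AND still leave room to close.
            allowed = [t for t, nxt in TRANSITIONS[state].items() if dist[nxt] <= budget]
            token = max(allowed, key=scores.get)
            state = TRANSITIONS[state][token]
        else:
            token = max(VOCAB, key=scores.get)
        out.append(token)
    return "".join(out)


if __name__ == "__main__":
    print("constrained:  ", decode(seed=1))
    print("unconstrained:", decode(seed=1, constrained=False))
```

With `constrained=True`, every string `decode` returns is accepted by `json.loads`, because each emitted token follows an FSM edge and the budget check guarantees the walk ends in `done`. With `constrained=False`, the loop just takes the top-scoring token each step, and nothing pushes it toward balanced braces or quotes. In a real system the same mask would be applied to the model's logits over its actual tokenizer vocabulary, which is considerably more involved than this character-level toy.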

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
AI Model Fine-tuning	26	1,082	151	57	+103%
LLM	5	5,138	781	181	+34%
AI Guardrails	3	382	142	52	+40%
