| HN Points | HN Title (Links to original post) | Submitted Date |
|---|---|---|
| 1 | SurgeAI Blog: Human Evals vs. Academic Benchmarks | 2025-09-04 |
| 1 | Unsexy AI Failures: The PDF That Broke ChatGPT | 2025-09-03 |
| 22 | SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations | 2025-09-18 |
| 2 | Unsexy AI Failures: Still Confidently Hallucinating Image Text | 2025-09-23 |
| 4 | Unsexy AI Failures: The PDF That Broke ChatGPT | 2025-10-03 |