Surge AI Hacker News

Posts by Month (37 total)

Hacker News Posts

Title	Points	Comments	Date
Three areas where Google Search lags behind competitors: code, cooking, travel	527	--	2022-04-13
Is Google Search Deteriorating? Measuring Google's Search Quality in 2022	470	--	2022-01-11
30% of Google's Emotions Dataset Is Mislabeled	334	--	2022-07-14
Evaluation of TikTok vs. Instagram Reels	222	--	2022-09-02
Building a no-code toxicity classifier by talking to GitHub Copilot	212	--	2022-03-25
Are popular toxicity models simply profanity detectors?	183	--	2022-01-25
Generating Children’s Stories Using GPT-3 and DALL·E	138	--	2022-06-29
We asked 100 humans to draw the DALL·E prompts	138	--	2022-05-13
HellaSwag: 36% of this popular large language model benchmark contains errors	49	--	2022-12-06
I wanted burritos. Facebook Search sent me to a dead restaurant 45m …	25	--	2022-06-16
We Evaluated ChatGPT vs. Google on 500 Search Queries	25	--	2022-12-26
SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations	22	--	2025-09-18
Twitter’s Egregious Content Moderation Failures	15	--	2022-11-10
Move Over, Google: The TikTokification of Next-Gen Search	13	--	2022-10-26
The average number of ads on a Google Search recipe? 8.7	13	--	2022-04-29
DALL·E vs. Imagen, and Evaluating Astral Codex Ten's Bet on AI Progress	13	--	2022-09-30
What if social media optimized for human values? A Facebook case study	12	--	2022-02-11
Explaining Reinforcement Learning with Human Feedback (RLHF)	11	--	2023-01-05
The $250K Inverse Scaling Prize and Human-AI Alignment	11	--	2022-09-28
An Analysis of Omicron Tweets: 30% Are Skeptical of the Medical Establishment	10	--	2022-01-21
How Good is Hugging Face's BLOOM? Human Evaluation of Large Language Models	10	--	2022-07-21
Are the Spammers Winning? Failures in Gmail Spam Detection	10	--	2022-05-24
We measured the percentage of Spammy Twitter users	10	--	2022-05-18
AI Red Teams for Adversarial Training: Making ChatGPT and LLMs More Robust	9	--	2022-12-13
Writing a Super Bowl Worthy Commercial with GPT-3	9	--	2022-02-16
Inter-Annotator Agreement: An Introduction to Krippendorff’s Alpha	9	--	2022-01-06
Optimizing Facebook's Algorithms for Human Values Instead of Clicks	7	--	2022-07-29
Building Better Developer Search: How Neeva Measures Search Quality	5	--	2022-07-07
How We Built It: OpenAI's GSM8K Dataset of 8,500 Math Problems	4	--	2022-06-15
Unsexy AI Failures: The PDF That Broke ChatGPT	4	--	2025-10-03
Humans vs. Gary Marcus: The Complexity of Measuring Machine Intelligence	3	--	2022-06-23
How TikTok Is Evolving the Next Generation of Search	2	--	2022-11-01
Sentiment Analysis Dataset of Social Media Stock Conversations	2	--	2022-06-10
Unsexy AI Failures: Still Confidently Hallucinating Image Text	2	--	2025-09-23
The Obscenity List	1	--	2022-01-18
SurgeAI Blog: Human Evals vs. Academic Benchmarks	1	--	2025-09-04
Unsexy AI Failures: The PDF That Broke ChatGPT	1	--	2025-09-03

Plushcap, by Matt Makai. 2021-2026.