Reliability Lessons From the Edges at SREday NYC
Blog post from GitGuardian
SREday NYC Q2 2026 brought around 100 site reliability engineers and other technically minded attendees to the Datadog office to examine what it takes to keep high-throughput systems reliable, with an emphasis on the often unseen complexity that keeps them running. The event featured 20 expert speakers who addressed the challenges of modern reliability, stressing that understanding the underlying systems matters as much as understanding the visible applications.

Shreyas Iyer argued that comprehensive world models are needed to close the representation gap in evolving production systems. Anisha Manoharan described the pitfalls of relying on average latency in fintech and the need to focus on tail-end performance instead. Ian Miller warned against oversimplifying "shifting left" in software development and advocated a thoughtful redistribution of complexity, backed by proper tools and context. Willem Pienaar highlighted the risk of agent-based systems misinterpreting signals in production environments, underscoring the value of grounding and independent verification.

The overarching sentiment was that AI and rapid development have increased operational anxiety by introducing new failure modes and complexities, which calls for a renewed focus on context, evidence-based reasoning, and continuous validation of system health from the user's perspective.
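Manoharan's point about average latency versus tail latency can be illustrated with a small sketch. The numbers below are invented for illustration and do not come from the talk; the helper names `percentile` and `summarize` are hypothetical, and the nearest-rank method is just one common way to compute a percentile.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the mean alongside median and tail percentiles."""
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }


# Hypothetical workload: 980 requests at 20 ms and 20 requests at 2000 ms.
latencies_ms = [20.0] * 980 + [2000.0] * 20
summary = summarize(latencies_ms)
print(summary)
```

In this made-up workload the mean stays under 60 ms and the median is 20 ms, yet the 99th percentile is 2000 ms: two percent of requests take two full seconds, which an average-only dashboard would not reveal.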