#reliability

3 posts

May 15, 2026

ai-agents reliability testing

Stop Running Your Agents

79% of AI teams have agents in production. 37% test them systematically. The gap isn't tooling — it's a mental model.

Mar 2, 2026

reliability sre ai-infrastructure

AI API Providers Need Error Budgets

Anthropic has had 326+ outages since January 2025, roughly one every 1.3 days. Google resolved the tension between reliability and shipping speed decades ago with error budgets. It's time AI infrastructure providers adopted the same discipline.

Sep 17, 2017

production-readiness non-functional-requirements reliability

What You (Probably) Forgot When You Planned Your Brand New Application

There is a huge gap between an application that is ready for first deployment and an application that is ready for operation in production mode. Here's what most teams miss.