Why Fasten Seatbelts — On the Probability of Rare Events

I wanted to write about how we fixate on obstacles and crash into lone trees. But last week brought a few interesting reminders about unforeseen events that I can’t stop thinking about.

Seatbelts on a skydiving plane

Here’s an interesting fact: when you’re going up in a skydiving plane, you wear a seatbelt. From takeoff until about 450 meters altitude.

It’s incredibly uncomfortable — you’re already strapped into a parachute rig, and now you’re also buckled into the plane. And it seems utterly pointless. The belt is there in case the plane’s engine fails and you have to land. Below 450 meters, the protocol says: stay in the plane.

Every year, you review and practice the procedure for an engine failure scenario. And in the US, there hadn’t been a single engine failure on a skydiving plane in almost 10 years. So why bother?

Until last week. And here’s the thing that made it personal: it happened to the exact plane I’d jumped from the previous two years. I didn’t jump this year only because I was busy.

Statistically, the probability is tiny — roughly one in two million takeoffs. Everyone was fine, and I’m looking forward to reading the post-mortem. But the probability was never zero.

A crowd panic on the 4th of July

While walking to the pier to watch fireworks, my wife decided to tell our daughter how to behave in a crowd if something happens and people start running. It seemed like an overreaction — we’d chosen a pier away from the main crowd. But just in case.

In the main crowd, a few age-challenged individuals decided to start shouting, triggering a panic. The crowd ran. Everyone was fine, but judging by the recordings, it wasn’t pleasant.

The statistics here are worse — roughly 1 incident per 70,000-100,000 events, depending on how you count. So mass events now rank higher on my personal risk assessment than skydiving.

A Shopify outage wearing our face

On Wednesday, a client we’d been onboarding called. Their site stopped working. Nobody on their side had done anything. And we were the last ones who’d touched their system — they’d just given us access to their Shopify, and we’d started the initial data sync.

In theory, our sync shouldn’t affect anything. But first step: stop and roll back. Then investigate.

Turns out, Shopify had a DNS issue that affected some regions and clients, but not all. It started roughly 10 minutes after we got access and began using it. Thanks to Downdetector and one user who commented about a similar problem, I had evidence that it wasn’t us. Without that, explaining to a non-technical client that coincidence isn’t causation would have been… interesting.

According to independent tracking services, Shopify had 8-10 noticeable incidents in 2025 alone. Their own status page said everything was fine. It always says everything is fine.

What these three stories teach

Practice reactions for rare events

It’s not fun. Nobody wants to think about engine failures, crowd stampedes, or coincidental outages. But 10 years without an incident doesn’t mean the probability is zero. It doesn’t make the event overdue either: each takeoff carries roughly the same small risk, whether the last failure was a decade ago or yesterday.

The skydiving community practices emergency procedures annually — even when no one can remember the last real emergency. That’s not paranoia. That’s engineering.
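
To put a number on "tiny but not zero", here is a rough sketch in Python. It assumes every takeoff is independent and carries the same one-in-two-million risk quoted earlier; the takeoff counts in the loop are made up purely for illustration, and the function name is my own.

```python
def prob_at_least_one(p_per_trial: float, trials: int) -> float:
    """Chance of one or more events across `trials` independent trials."""
    return 1.0 - (1.0 - p_per_trial) ** trials


# Per-takeoff figure from the text; treating takeoffs as independent is an assumption.
P_ENGINE_FAILURE = 1 / 2_000_000

# Hypothetical takeoff counts, chosen only to show how the risk accumulates.
for takeoffs in (1, 1_000, 100_000, 2_000_000):
    chance = prob_at_least_one(P_ENGINE_FAILURE, takeoffs)
    print(f"{takeoffs:>9,} takeoffs: {chance:.4%}")
```

Under those assumptions, any single takeoff is almost certainly fine, but across two million takeoffs the chance of at least one failure is about 63%. Somebody eventually gets the bad flight, which is why everybody practices.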

Account for the fool in the system

Humans are terrible at estimating the probability of rare events. Something familiar feels safe, even when the actual probability is distributed unevenly and depends on one random person deciding to do something stupid.

The larger the crowd, the higher the probability of a random or idiotic trigger. This applies to code too: the more engineers committing to a monorepo, the higher the probability of a random broken deploy.
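
The same arithmetic can be turned around to ask how big a group has to get before a trigger becomes likely. This is only a sketch: it assumes each person (or each commit) independently has the same small chance of setting things off, and the 0.1% figure in the usage lines is invented for illustration, not measured.

```python
import math


def size_for_risk(p_per_actor: float, target: float) -> int:
    """Smallest group size n where 1 - (1 - p)^n reaches `target`.

    Assumes 0 < p_per_actor < 1 and 0 < target < 1, and that actors
    behave independently (a simplification).
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_per_actor))


# Hypothetical: each engineer has a 0.1% chance of breaking a given deploy.
P_BREAK = 0.001
print(size_for_risk(P_BREAK, 0.5))   # group size where a break is more likely than not
print(size_for_risk(P_BREAK, 0.9))   # group size where a break is close to certain
```

With that made-up 0.1% figure, a broken deploy becomes more likely than not at 693 contributors. Real crowds and real monorepos are messier than this, but the direction holds: the risk grows with headcount even if nobody individually gets any more careless.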

Don’t draw conclusions before the post-mortem

When the Shopify outage hit 10 minutes after our integration started, every instinct said: “It’s us.” It wasn’t. Correlation is not causation — but under pressure, your brain doesn’t care.

Wait for the post-mortem. Read the facts. Then draw conclusions.

And by the way — reading other people’s incident reports is incredibly valuable. You don’t have to experience every failure yourself. But more on that next time.

At Amazon, we ran game days — simulated failures in production. At Datadog, every team had runbooks for incidents they’d never seen. The discipline of preparing for rare events is what separates reliable systems from lucky ones.