Inside the meltdown: One thing CrowdStrike and Microsoft can’t fix | Tech Reader


I only saw one Blue Screen of Death on Sunday, July 21, across 15 hours of travel via two of the country’s biggest airports, just two days after a botched software update crippled millions of corporate computers running the Windows operating system.

“Maybe things are OK,” I remember thinking as my family took our first steps into New York’s LaGuardia Airport around 9 a.m. Despite headlines to the contrary on day 3 of the Great Windows Outage of 2024, the ticketing and baggage area didn’t look too bad.

I should have known better. I’d taken literally two steps inside the building before getting the first of about 3,000 delay emails from Delta over the course of the day, to go along with even more notifications from the Flighty and Fly Delta apps. This wasn’t going to be an easy run home from New York to Florida, something I’ve done dozens of times over the years.

A notification from the Flighty app on an Apple Watch.
The usually excellent Flighty app simply wasn’t designed to keep up with so many airframe swaps; these notifications came in multiple times an hour. Phil Nickinson / Tech Reader

I’m no stranger to flight delays. (I spent 15 hours in the Sky Club at LAX in late January. Not something I recommend, despite how good it is.) But this one was different. Weather happens. Mechanical issues happen. They suck, but those all come down to safety. This time? A third-party security vendor, CrowdStrike, botched an update to its software running inside Windows. CrowdStrike should have caught it. Microsoft should have caught it. Neither did until it was too late. While the fix was relatively simple (boot into Safe Mode and delete the bad file, or keep restarting the machine until the bad file was replaced), the first-order effects were immense.

It’s the second- and third-order effects where things really went wrong for the airlines. Delta was hit particularly hard: CEO Ed Bastian wrote on Sunday that more than 3,500 flights had been canceled through Saturday, with many more canceled on Sunday. “Please come see me at the podium if you need a hug,” our gate agent said around 4:30 p.m. on Sunday as the board refreshed to read CANCELED.

The scene from Gate A7 at Atlanta Hartsfield-Jackson International Airport late in the evening of July 21, 2024.
For many of us at Atlanta’s Hartsfield-Jackson International Airport, there was nothing to do but wait, and hope that the next flight wouldn’t be canceled. Phil Nickinson / Tech Reader

The line for the rebooking desk in the A concourse at Atlanta, one of seven concourses at the country’s busiest airport, was comically (or tragically) long. I sat with one earbud in, on hold with the airline’s reservations line for two hours before giving up. (My brother, who has much higher frequent flier status, at least managed to get a real person to tell him that there was no way I was getting out before midnight, and that the best thing to do was to stick with the assigned flight for now.)

Once we were finally on board in the early hours of Monday, July 22, a flight attendant gave us an idea of what was really throwing a wrench into things: Delta didn’t know where its crews were. That was confirmed later in the day in another news post from Delta, which said that more than half of its IT systems run on Windows, and that additional sync time was required even after the affected machines were rebooted.

“Delta’s crews are fully staffed and ready to serve our customers,” the post continued. “But one of Delta’s most critical systems — which ensures all flights have a full crew in the right place at the right time — is deeply complex and is requiring the most time and manual support to synchronize.”

An in-flight entertainment screen on a Delta 757-200, waiting to leave Atlanta.
It was past midnight, but those of us who managed to get onto a Boeing 757-200 were plenty excited about it. Phil Nickinson / Tech Reader

We ultimately made it home at nearly 2 a.m. Tired. A little frazzled. But only about eight hours late, all told. We were fortunate. Two days earlier, my brother spent some 30 hours in the Atlanta airport just trying to get home to Pensacola after aborting a trip to the West Coast. No flights. No one-way car rentals. Beyond waiting, the only real option was someone driving five hours each way to rescue him.

Our stories were just two of thousands — and ours were relatively low-stakes. We didn’t have any kids traveling on their own. We weren’t out a ton of money, beyond a couple of meals we didn’t plan on having in an airport. Our bags made it on the same plane.

The immediate fix for the CrowdStrike failure was pretty simple. The longer-term work falls to CrowdStrike and Microsoft, which need policies in place to reduce the chance of this happening again. (It will, of course, happen again.) But as the saying goes, in its PG-13 version, poop flows downhill. None of this was the airlines’ fault. But it quickly became their mess to clean up.

And that’s something a simple reboot can’t fix. Even if you do it more than 8 million times.





