Fallout of global Microsoft crash continues as firms take stock of damage

July 19, 2024
As IT systems managers and security executives pick up the pieces from Friday’s Microsoft outage that shut down critical infrastructure systems globally, the post-mortem has already begun.

Airlines, hospitals, public transportation, banks, 911 centers and other important systems worldwide were taken offline after cybersecurity firm and Microsoft client CrowdStrike pushed out an update to its Falcon Sensor software, a system that offers “real-time threat protection.”

CrowdStrike CEO George Kurtz said the outage was not a security incident or cyberattack, and that the issue had been identified and isolated and a fix deployed. Microsoft said a residual impact would be felt for Microsoft 365 apps and services.

Experts said CrowdStrike’s content update caused Windows hosts to be locked in a perpetual “blue screen of death” loop.

Because the issue was with endpoints, the fix could not be deployed remotely, experts said, so end users had to resolve it manually, system by system: booting each affected computer into safe mode, deleting the problematic file and then rebooting the host normally.
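For administrators who script the file-removal step, a minimal sketch might look like the following. It assumes Python is available on the recovered host, which will often not be the case in safe mode, and the function name and `dry_run` flag are illustrative rather than part of any vendor tool. The directory and file pattern in the usage comment do not come from this article; they reflect CrowdStrike's publicly posted workaround and should be confirmed against official guidance before use.

```python
from pathlib import Path


def remove_matching_files(directory, pattern, dry_run=True):
    """List files in `directory` whose names match `pattern`.

    With dry_run=True (the default) nothing is deleted; the matches are
    only reported. With dry_run=False each matching file is removed.
    Returns the sorted list of matching file names either way.
    """
    matches = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return [p.name for p in matches]


# Hypothetical usage (path and pattern are assumptions; verify with the vendor):
# remove_matching_files(r"C:\Windows\System32\drivers\CrowdStrike",
#                       "C-00000291*.sys", dry_run=True)
```

Running with `dry_run=True` first shows what would be deleted before any file is touched, a cautious default when working inside a system driver directory.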

Although the initial problem has been fixed, the threat to individual systems may not be over. The Cybersecurity and Infrastructure Security Agency (CISA) said Friday it has observed threat actors taking advantage of the incident for phishing and other malicious activity.

CISA urged organizations and individuals to “remain vigilant and only follow instructions from legitimate sources.” CISA also asked organizations to remind their employees to avoid clicking on phishing emails or suspicious links.

Examining What Happened

Agnidipta Sarkar, Vice President CISO Advisory at ColorTokens, said there is widespread chatter on social media about the outage and everyone has an opinion. Some who were on an n-2 patch program are saying they were not affected, he said, but that leaves them vulnerable to zero-day attacks.

“The most important comments I hear are from those who are monitoring the dark web. Allegedly bad actors are preparing to attack as people recover, because they would be most vulnerable then,” Sarkar said. “Organizations should be investing in breach-ready micro-segmentation, if they have not done so yet. The key is to focus on being breach ready right after this situation blows over. That is when everyone will be most vulnerable.”

Callie Guenther, Senior Manager of Cyber Threat Research at Critical Start, said there have been reports of criminals posing as CrowdStrike or Microsoft support in social engineering attacks. Exploiting crises like outages to trick users into divulging sensitive information or installing malware is not a new tactic, she said.

“Attackers quickly exploit publicized outages or vulnerabilities to launch targeted phishing campaigns, often within hours of an incident. They use sophisticated impersonation tactics, leveraging detailed knowledge of the outage to craft convincing emails or calls mimicking official communications from the affected company,” Guenther said.

“Additionally, they employ multi-channel attacks, using email, phone and social media to reach victims, making the scam seem more legitimate.”

The CrowdStrike outage occurred due to a bug in the memory scanning prevention policy, which was not identified during their testing, Guenther added. “While CrowdStrike likely performed standard regression and functionality tests, these were insufficient because they did not simulate the real-world deployment environment where the bug caused the Falcon sensor to consume 100% of a CPU core.

“This led to severe system performance issues, particularly for mission-critical systems that could not be easily rebooted. Thorough testing should have included sandbox testing, incremental rollouts, and extensive user feedback to catch such issues.”
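The incremental rollout Guenther mentions can be sketched in a few lines. This is only an illustration of the general technique, not a description of CrowdStrike's release process; the function name, the stage fractions, and the `deploy` and `is_healthy` callbacks are all hypothetical.

```python
def staged_rollout(hosts, deploy, is_healthy,
                   stage_fractions=(0.01, 0.10, 0.50, 1.0),
                   max_failure_rate=0.0):
    """Deploy to progressively larger cohorts, halting on unhealthy hosts.

    `deploy(host)` applies the update and `is_healthy(host)` checks the
    host afterwards; both are supplied by the caller. The rollout stops as
    soon as a cohort's failure rate exceeds `max_failure_rate`.
    """
    total = len(hosts)
    done = 0
    for fraction in stage_fractions:
        # Each stage covers at least one new host, never more than all hosts.
        target = min(total, max(done + 1, int(total * fraction)))
        cohort = hosts[done:target]
        if not cohort:
            continue
        for host in cohort:
            deploy(host)
        failures = [h for h in cohort if not is_healthy(h)]
        done = target
        if len(failures) / len(cohort) > max_failure_rate:
            return {"halted": True, "deployed": done, "failures": failures}
    return {"halted": False, "deployed": done, "failures": []}
```

With 100 hosts and the default stages, a faulty update that fails its first health check would stop after reaching a single machine instead of the whole fleet.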

'Systemic Risks'

The shift to cloud computing and consolidation of vendor products increases dependencies on fewer providers, amplifying the impact of any single point of failure, Guenther noted. “While this can streamline operations and enhance integration, it also creates systemic risks where outages or security breaches can have widespread effects. This trend underscores the need for robust contingency planning, diverse vendor strategies, and continuous monitoring to mitigate risks.”

Operational resilience needs to be at the forefront of the business agenda, said Evolve CEO Alan Stephenson-Brown, as it’s clear even large corporations aren't immune to IT troubles.

“This outage highlights the importance of having distributed data centers and rerouting connectivity that ensures business can continue functioning when cloud infrastructure is disrupted,” Stephenson-Brown said.

“By prioritizing both contingency planning and preventative measures, IT systems can be protected. I urge business leaders to seriously appraise the systems they have in place to identify potential vulnerabilities before they find themselves the subject of the next IT outages headline.”

Martin Greenfield, CEO of cybersecurity monitoring firm Quod Orbis, said there is a critical weakness in many organizations’ cyber-resilience strategies: an overreliance on single-point solutions like antivirus software.

“While such tools are essential, they should not be the sole pillar of a robust cybersecurity posture. This incident serves as a reminder that even industry-leading solutions can falter, potentially leaving entire sectors vulnerable,” Greenfield said.

But he added that steps to prevention are “often quite straightforward,” and that organizations must adopt a “more holistic approach to their cyber resilience, implementing a multi-layered defence strategy that encompasses not just software solutions, but also robust policies, regular training and proactive threat hunting.

“The widespread impact of this outage also highlights the interconnectedness of global IT systems and the potential for cascading failures,” Greenfield said. “Companies must conduct thorough risk assessments of not just their own systems, but of their entire supply chain and third-party dependencies.”

“This occurrence shows that cyber issues that are not cyberattacks can still be extremely disruptive to the global economy. The facts will continue to come out on precisely what happened here, but we expect Managed Security Service Providers (MSSPs) to be under additional scrutiny,” said Anna Rudawski, cybersecurity partner at A&O Shearman. “Expect regulators to pay close attention to company's ability to recover and show resiliency after outages or disruptions to critical providers. Indeed, many jurisdictions are already baking this into law, such as with DORA and critical infrastructure rules.”

Process Failure

Guy Golan, CEO and Executive Chairman of global cybersecurity company Performanta, called the mistake causing Friday’s outage “an epic failure and a huge eye opener for the cyber world and the business world more broadly.

“This appears to have been a failure of process and QA, releasing something that was incorrect, perhaps driven by intense market pressures in the vendor race to have the best and greatest features, or in response to the evolving threat landscape and increased need for detection,” said Golan, who was formerly part of the Israeli Intelligence Brigade and has more than 20 years of experience in cyber, working with some of the largest organisations in the world.

“The impact of one vendor [used] by some of the world’s biggest organisations can bring the world to its knees, and the repercussions will be unprecedented. It’s going to cost companies billions, it will lead to legal action, and it will affect businesses and users in a way we’ve never seen before.”

Attackers may have more awareness of who is using CrowdStrike as a result of watching this unfold, he noted, which could cause further cyber security complications down the road.

“This isn’t the fault of one vendor – perhaps market pressures have led to such a catastrophe. More outages should be expected unless organisations of all sizes start to understand that the digital world is just as significant in the 21st century as the physical world. It’s about time we elevated cyber issues to the top of the agenda and understood the full effects of market pressures.”


About the Author

John Dobberstein | Managing Editor/SecurityInfoWatch.com

John Dobberstein is managing editor of SecurityInfoWatch.com and oversees all content creation for the website. Dobberstein continues a 34-year decorated journalism career that has included stops at a variety of newspapers and B2B magazines. He most recently served as senior editor for the Endeavor Business Media magazine Utility Products.