A midday wave of 5xx errors swept across Cloudflare’s network on a recent day, briefly knocking out even its status page and prompting early suspicion of an external attack. By early afternoon, however, the company traced the disruption to an internal configuration mistake—no botnet or DDoS was to blame.
Here’s what unfolded. At around 11:05 UTC, a permissions change in a database system unintentionally set off a chain reaction. The change altered how a feature file used by Cloudflare’s bot management system was generated, causing that file to swell to nearly twice its normal size. The problem: Cloudflare services allocate a fixed amount of memory for this file. When the bloated version arrived, it exceeded that reserved memory and triggered crashes.
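To make that failure mode concrete, here is a minimal Rust sketch of the general pattern described above: a loader that budgets space for a fixed number of features, and a caller that treats the oversized-file case as impossible, so bad input becomes a process crash instead of a handled error. The names and the 200-entry limit are illustrative assumptions, not Cloudflare's actual code or values.

```rust
// Illustrative sketch only: a service that budgets space for a fixed number
// of bot-management features and crashes when a generated file exceeds it.
// The 200-entry limit and all names here are hypothetical.

const MAX_FEATURES: usize = 200; // preallocated budget per process

struct FeatureFile {
    features: Vec<String>,
}

// Loading fails if the incoming file is larger than the reserved capacity.
fn load_feature_file(lines: &[String]) -> Result<FeatureFile, String> {
    if lines.len() > MAX_FEATURES {
        return Err(format!(
            "feature file has {} entries, exceeding the limit of {}",
            lines.len(),
            MAX_FEATURES
        ));
    }
    Ok(FeatureFile {
        features: lines.to_vec(),
    })
}

fn main() {
    // The regenerated file is roughly twice its normal size.
    let oversized: Vec<String> = (0..MAX_FEATURES * 2)
        .map(|i| format!("feature_{i}"))
        .collect();

    // Treating the error path as impossible turns bad input into a panic,
    // which downstream clients observe as 5xx responses.
    let file = load_feature_file(&oversized).unwrap();
    println!("loaded {} features", file.features.len());
}
```

Because the crash happens in the serving path, every request handled by an affected process surfaces to clients as a 5xx error.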
Around 11:30 UTC, the impact became visible as a surge of 5xx errors flooded the network. The error rate didn’t remain steady; it fluctuated sharply until about 13:00 UTC. That volatility aligned with how the system distributes updates: the bot management feature file refreshes every five minutes, and not all clusters had moved to the new configuration at the same time. As a result, different parts of the network alternated between receiving a healthy file and the oversized, crash-inducing one, producing sporadic spikes and dips in failures.
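The stop-start shape of the outage is easier to see with a toy timeline. The sketch below uses a made-up good/bad pattern rather than real telemetry: it steps through five-minute refresh cycles in which each regenerated file is either healthy or oversized, mirroring how error rates spiked and then subsided.

```rust
// Illustrative timeline only: the good/bad pattern below is invented to show
// the shape of the failure, not reconstructed from actual measurements.

fn main() {
    let refresh_minutes: usize = 5; // the feature file is regenerated every five minutes
    // Whether a given cycle produces a healthy or an oversized file depends on
    // which part of the partially migrated system generated it.
    let cycle_is_bad = [
        false, true, true, false, true, false, true, true, false, true, false, false,
    ];

    for (tick, bad) in cycle_is_bad.iter().enumerate() {
        let minute = 11 * 60 + 30 + tick * refresh_minutes; // starting around 11:30 UTC
        let status = if *bad {
            "oversized file distributed -> crashes, 5xx spike"
        } else {
            "healthy file distributed -> errors subside"
        };
        println!("{:02}:{:02} UTC  {}", minute / 60, minute % 60, status);
    }
}
```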
The temporary unavailability of the status page and the unusual traffic pattern initially fueled internal speculation that an attack or botnet might be responsible. By 13:37 UTC, the incident response team identified the real culprit: adjustments in the bot management system tied back to the database permissions change. Roughly an hour later, engineers rolled out fixes and stabilized performance, returning error rates to normal levels.
Key takeaways from the incident underscore how far the side effects of a routine-looking configuration change can reach, and the importance of guardrails around memory-constrained components. A single permissions tweak led to an oversized file, which then cascaded into service crashes across multiple clusters. The five-minute update cadence and staggered rollout combined to create a stop-start failure pattern that looked, at first glance, like an external assault.
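One defensive pattern that follows from that takeaway is to validate a refreshed configuration against its size or memory budget before applying it, and to keep serving the last-known-good copy if the check fails. The sketch below is a hypothetical illustration of that guardrail, not Cloudflare's actual remediation; the limit and type names are assumptions.

```rust
// Hypothetical guardrail sketch: check a refreshed configuration against its
// budget and keep the last-known-good copy if the check fails, rather than
// letting the process crash.

const MAX_FEATURES: usize = 200; // illustrative budget

#[derive(Clone)]
struct FeatureFile {
    features: Vec<String>,
}

// Reject any candidate file that would exceed the preallocated budget.
fn validate(candidate: &FeatureFile) -> Result<(), String> {
    if candidate.features.len() > MAX_FEATURES {
        return Err(format!(
            "refusing oversized file: {} features > budget of {}",
            candidate.features.len(),
            MAX_FEATURES
        ));
    }
    Ok(())
}

// Apply a refresh only if it validates; otherwise keep serving the current
// configuration and surface the rejection for operators to investigate.
fn apply_refresh(current: FeatureFile, candidate: FeatureFile) -> FeatureFile {
    match validate(&candidate) {
        Ok(()) => candidate,
        Err(reason) => {
            eprintln!("config refresh rejected: {reason}");
            current
        }
    }
}

fn main() {
    let good = FeatureFile {
        features: (0..150).map(|i| format!("f{i}")).collect(),
    };
    let oversized = FeatureFile {
        features: (0..400).map(|i| format!("f{i}")).collect(),
    };

    // The oversized refresh is rejected; traffic keeps flowing on the old file.
    let active = apply_refresh(good.clone(), oversized);
    println!(
        "still serving {} features from the last-known-good file",
        active.features.len()
    );
}
```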
In short, the outage was self-inflicted, not a cyberattack. After identifying the root cause, Cloudflare restored services and normalized network behavior, closing the incident and highlighting the value of fast detection, clear incident timelines, and rigorous change management.