An unknown error has occurred.

But what if we went looking for it?

An unknown error is not a bug. A bug is a mistake with an address. You find it, you fix it, you move on. An unknown error is the space between what happened and why. Sometimes that space closes. Sometimes it doesn't. Sometimes the system is working exactly as designed, and the error is in what we thought "designed" meant. This is a history of that space. It starts small. It gets bigger.

I "We don't know yet"
1947

The first bug

Every unknown error starts as confusion. The machine stops working. Nobody knows why. What follows is the search. In 1947, the search was simple and the answer was physical. It wouldn't stay that way.

On September 9, 1947, operators of the Harvard Mark II found a moth stuck in Relay #70, Panel F. They taped it into the logbook with the annotation: "First actual case of bug being found."

The word "bug" for a technical glitch predates this by decades. Edison used it. So did engineers in the 1870s. What Grace Hopper's team found wasn't the origin of the term. It was just the first time the metaphor got literal.

The Mark II was an electromechanical computer. Its relays clicked open and shut incessantly as it computed. A moth, doing moth things, flew into one and died. The relay stuck. The machine stopped making sense.

They found it. They fixed it. They moved on.

The moth is now in the Smithsonian, still taped to the logbook page. It's probably the most famous dead insect in history, and it earned that distinction by doing nothing more complicated than flying toward a light.

There's something satisfying about an error this clean. A physical object in a physical machine causing a physical failure. You open the panel, you see the moth, you remove the moth. Debugging at its most literal and its most optimistic: every error has a cause, every cause can be found, and the fix is obvious once you find it.

That optimism would not survive the century.

Status: Resolved

Smithsonian National Museum of American History — the original moth is in their collection, taped to a logbook page from the Harvard Mark II.

1962

The most expensive hyphen

For 293 seconds, nobody knew why the rocket was veering off course. The guidance system was doing something wrong, but the equations looked right. They were right—on paper. The translation from paper to code is where the unknown lived.

Mariner 1 launched on July 22, 1962, heading for Venus. It veered off course almost immediately. Range Safety destroyed it 293 seconds after liftoff. Total loss: $18.5 million, which in 1962 bought you quite a lot of rocket.

The cause has been described various ways over the decades. Arthur C. Clarke called it "the most expensive hyphen in history." NASA's own accounts have shifted between a missing hyphen, a missing overbar, and a period where a comma should have been. The truth, buried in the post-flight analysis, is that a superscript bar (a smoothing function notation) was omitted when handwritten guidance equations were transcribed into code. Without the smoothing, the guidance system reacted to normal variations in velocity as if they were emergencies and started making corrections that made everything worse.

A human copied a formula from paper to code and missed a mark smaller than a pencil tip. The rocket did exactly what the code told it to. The code just wasn't saying what the engineers meant.

Mariner 2 launched five weeks later on the backup vehicle, reached Venus, and became the first successful interplanetary mission. The equation was correct that time.

The lesson here isn't about proofreading, though it gets taught that way. It's about the gap between intent and instruction. The computer had no way to know the formula was wrong. It didn't have the concept of "wrong." It just executed.

Status: Resolved

NASA NSSDC Mariner 1 Mission Page; the "most expensive hyphen in history" phrase is attributed to Arthur C. Clarke.

1985

The machine that killed

For two years, patients died and nobody knew the software was killing them. The machine had been declared safe. The manufacturer insisted it couldn't be at fault. The unknown here wasn't the bug—it was the blind certainty that there was no bug.

Between 1985 and 1987, the Therac-25 radiation therapy machine delivered massive overdoses to six patients. Three of them died. The machine was supposed to deliver carefully calibrated radiation for cancer treatment. Instead, a race condition in its software occasionally fired the full-power electron beam without the beam spreader in place. Patients received radiation doses hundreds of times higher than intended.

The race condition was subtle. It only triggered when a fast-typing operator edited treatment parameters in a specific sequence within an eight-second window. In testing, nobody typed that fast. In a busy clinic, operators did.
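Stripped of the hardware, the bug has a familiar check-then-act shape. The sketch below is a deterministic toy, not the Therac-25's actual PDP-11 code; the class and field names are invented. A slow setup task captures the treatment parameters once, a fast operator edit lands afterward, and the beam fires against a stale hardware configuration.

```python
# Deterministic toy of the Therac-25 failure shape (invented names, not the
# real code): setup reads shared parameters once; a later edit is never
# re-checked, so beam power and spreader position disagree at fire time.

class Machine:
    def __init__(self):
        self.mode = "xray"            # operator-entered treatment mode
        self.configured_mode = None   # mode the hardware was actually set up for

    def setup_hardware(self):
        # Slow task: positions the spreader for the mode as read *right now*.
        self.configured_mode = self.mode

    def edit_mode(self, new_mode):
        # Fast operator edit; in the buggy interleaving, setup does not re-run.
        self.mode = new_mode

    def fire(self):
        # Beam power follows the requested mode; the spreader still sits
        # wherever setup_hardware() last put it.
        return {"beam": self.mode, "spreader": self.configured_mode}

m = Machine()
m.setup_hardware()        # hardware configured for "xray"
m.edit_mode("electron")   # edit lands inside the window; setup never re-runs
result = m.fire()
print(result)             # beam and spreader now disagree
```

A hardware interlock of the kind the Therac-6 and Therac-20 carried would have refused to fire on the mismatch, regardless of what the software believed.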

But the deeper failure wasn't the bug. When patients reported burning sensations and technicians noted anomalies, the manufacturer insisted the machine couldn't be at fault. The software had been "proven safe." Earlier versions (Therac-6, Therac-20) had hardware interlocks that would have prevented the overdose. The Therac-25 removed those interlocks, replacing them with software checks. The assumption was that software was more reliable than hardware. That assumption killed people.

It took multiple deaths, across different hospitals, over two years, before anyone accepted that the software was the problem.

Nancy Leveson and Clark Turner's 1993 analysis of the Therac-25 remains one of the most cited papers in software safety. Not because the bug was complicated. Because everything around the bug was. The overconfidence, the slow investigation, the institutional refusal to consider that code could kill. The error everyone was looking for was technical. The error that mattered was human.

Status: Resolved

Nancy G. Leveson and Clark S. Turner, "An Investigation of the Therac-25 Accidents," IEEE Computer, Vol. 26, No. 7, July 1993.

1988

The worm that ate the internet

Robert Morris understood his code. He wrote every line. He still couldn't predict what it would do. This is the first entry where the unknown isn't a hidden defect—it's emergent behavior. The gap between what was built and what it became.

Robert Tappan Morris was a 22-year-old Cornell graduate student when he released a self-replicating program onto the internet on November 2, 1988. His stated intent was to gauge the size of the network. The worm was supposed to spread quietly, copying itself from machine to machine without causing harm.

It did not spread quietly.

Morris made a design decision that seemed reasonable: to prevent the worm from being easily killed, it would re-infect machines that already had a copy. He set the re-infection rate at one in seven. This was too high. Way too high. Machines accumulated dozens, then hundreds of copies. Each copy consumed memory and CPU. Systems ground to a halt. Roughly 6,000 machines went down, which was about ten percent of the entire internet at the time.
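The arithmetic of that decision is easy to sketch. This is a back-of-the-envelope model invented for illustration, not Morris's code, and the "propagation round" abstraction is mine: once every machine on the network is infected, each copy's spread attempt lands on an already-infected host, and one in seven of those attempts installs a duplicate anyway.

```python
# Toy growth model (not Morris's code): on a saturated network, every spread
# attempt is a re-infection, and 1 in 7 of them installs an extra copy.

def copies_per_machine(rounds, reinfect_prob=1/7):
    copies = 1.0
    for _ in range(rounds):
        # Each running copy makes one attempt per round; a fraction
        # reinfect_prob of those attempts adds another running copy.
        copies *= 1 + reinfect_prob
    return copies

# Compound growth: dozens of copies within ~25 rounds, hundreds within ~50,
# each one eating memory and CPU on the same host.
print(copies_per_machine(25), copies_per_machine(50))
```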

The internet in 1988 was a small, trusting place. Unix machines were configured to trust each other. The worm exploited that trust through three vectors: a hole in sendmail, a buffer overflow in fingerd, and weak password guessing. All known vulnerabilities. Nobody had bothered to fix them because nobody had built anything designed to exploit them at scale.

Morris became the first person convicted under the Computer Fraud and Abuse Act. He got three years of probation and a $10,050 fine. He's now a tenured professor at MIT.

What makes the Morris Worm interesting isn't the damage. It's the gap between intent and outcome. Morris built something simple, understood its mechanics, and still couldn't predict its behavior once released into a complex system. The worm did more than its creator intended. This would become a theme.

Status: Resolved

Eugene H. Spafford, "The Internet Worm Program: An Analysis," Purdue Technical Report CSD-TR-823; Cornell University Commission report, February 1989.

II "We didn't think to check"
1990

The network that ate itself

On January 15, 1990, AT&T's long-distance switching network collapsed. For nine hours, roughly half of all long-distance calls in the United States failed. About 60,000 people lost phone service. No one attacked it. No hardware broke. The network destroyed itself.

Here's what happened. AT&T had recently deployed a software update to its 4ESS switches. The update included a new feature: when a switch recovered from a brief failure, it sent a message to neighboring switches saying "I'm back." Neighboring switches would then update their routing tables. This was sensible.

The problem was in what happened next. When a neighboring switch received the "I'm back" message while it was already processing a call, the message triggered a previously untested code path that caused that switch to reset. When it came back up, it sent its own "I'm back" message. Which caused its neighbors to crash. Which caused them to send "I'm back" messages.

The entire network oscillated between crashing and recovering, each recovery triggering more crashes. A feedback loop, running at machine speed, across the country.
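The loop is simple enough to model in a few lines. This is a stripped-down toy, nothing like the real 4ESS code; the ring topology and the reset cap are invented, with the cap standing in for the load reduction AT&T engineers eventually used to damp the cascade.

```python
# Toy model of the 1990 cascade (invented topology, not 4ESS code): a busy
# switch that receives "I'm back" resets, and every reset broadcasts more
# "I'm back" messages to its neighbors.

from collections import deque

def simulate(switches=10, max_resets=200):
    busy = [True] * switches                 # every switch is handling calls
    resets = 0
    # Switch 0 recovers from a brief fault and announces itself to neighbors.
    queue = deque([(1, "im_back"), (switches - 1, "im_back")])
    while queue and resets < max_resets:
        target, _msg = queue.popleft()
        if busy[target]:
            # The untested code path: message arrives mid-call, switch resets...
            resets += 1
            # ...then recovers and announces itself, feeding the loop.
            queue.append(((target - 1) % switches, "im_back"))
            queue.append(((target + 1) % switches, "im_back"))
    return resets

# The loop never damps out on its own; we always hit the cap.
print(simulate())
```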

Every component in the system was working correctly. The "I'm back" message did what it was supposed to do. The routing update did what it was supposed to do. The reset behavior was by design. The failure was in the interaction between correct components. No single switch was broken. The network was broken.

AT&T engineers eventually fixed it by reducing the load on the network until the cascade damped out. The underlying bug was patched. But the lesson persists: in a complex enough system, correct behavior and catastrophic failure aren't mutually exclusive.

Status: Resolved

AT&T technical post-incident analysis; "The 1990 AT&T Long Distance Network Collapse" retrospective accounts.

1994

The flaw Intel knew about

Thomas Nicely was a math professor at Lynchburg College. In June 1994, he was computing the reciprocals of twin primes and noticed his Pentium was giving wrong answers. Not dramatically wrong. Wrong in the ninth significant digit. But wrong.

Nicely tested multiple Pentium chips. Same error. He contacted Intel in October. Intel confirmed the flaw existed in the processor's floating-point division unit. A lookup table used in the SRT division algorithm had five entries missing out of 1,066. Intel had discovered this internally months earlier.
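A toy can show how a few missing table entries corrupt only certain inputs. Everything below is invented for the sketch: division by reciprocal lookup plus Newton-Raphson refinement, nothing like the real SRT quotient-digit hardware, with two table entries deliberately zeroed to play the role of the five missing ones.

```python
# Toy, NOT the real SRT algorithm: reciprocal via lookup table + Newton
# refinement. Two "missing" (zeroed) entries poison the seed for the few
# inputs that index them; every other input divides correctly.

import math

def make_table(size=64, holes=(17, 43)):
    # table[i] approximates 1/m for mantissa m in [1, 2)
    return [0.0 if i in holes else 1.0 / (1.0 + (i + 0.5) / size)
            for i in range(size)]

def recip(b, table):
    e = math.floor(math.log2(b))
    m = b / 2.0**e                            # normalize to [1, 2)
    x = table[int((m - 1.0) * len(table))]    # table seed
    for _ in range(4):
        x = x * (2.0 - m * x)                 # Newton-Raphson: x -> x(2 - mx)
    return x / 2.0**e

table = make_table()
good = 3.0 * recip(3.0, table)     # input misses the holes: correct
bad_m = 1.0 + (17 + 0.5) / 64      # input that lands exactly on a hole
bad = bad_m * recip(bad_m, table)  # zero seed, and Newton never recovers
print(good, bad)
```

The Pentium's holes were rarer and subtler, producing errors in low significant digits rather than zeros, which is exactly why it took a number theorist hunting ninth-digit discrepancies to notice.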

Intel's position, initially, was that the bug affected so few calculations that most users would never encounter it. They offered replacements only to customers who could demonstrate they needed high-precision math. The implication: you probably aren't doing anything important enough to notice.

This went badly. IBM halted Pentium shipments. The press picked it up. Intel's stock dropped. Consumers were furious not about the math error itself, but about being told it didn't matter to them. Intel eventually offered unconditional replacements and took a $475 million write-off.

There are two unknowns here. The first was the bug: five missing entries in a lookup table, baked into silicon. Fixable, if expensive. The second was Intel's miscalculation of what users would tolerate. They understood the technical severity of the flaw. They did not understand that telling customers "you're not important enough to care" was a worse error than the division bug.

The Pentium FDIV bug was an arithmetic error. The institutional response was a category error.

Status: Resolved

Thomas R. Nicely, "Pentium FDIV Bug," original report, October 1994; Vaughan Pratt, "Anatomy of the Pentium Bug," Stanford University, 1995.

1996

The $370 million type mismatch

On June 4, 1996, the Ariane 5 rocket launched for the first time. Thirty-seven seconds later, it veered off course and self-destructed. The payload, four Cluster satellites meant to study Earth's magnetosphere, was worth $370 million. The debris rained down over French Guiana.

The cause was a type conversion. The inertial reference system tried to stuff a 64-bit floating-point number into a 16-bit signed integer. The number was too large. The conversion failed. The failure was interpreted as flight data. The flight computer, now receiving garbage, did its best to correct a trajectory that didn't need correcting. The rocket tore itself apart.

The inertial reference code came from the Ariane 4, where it had worked for years. On the Ariane 4, the horizontal velocity value never got large enough to overflow a 16-bit integer. The Ariane 5 had a different flight profile. It went faster, sideways, sooner. Nobody checked whether the old code's assumptions still held.
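The core of the failure fits in a few lines. In the original Ada, the out-of-range conversion raised an Operand Error that went unhandled, and a diagnostic bit pattern was then read as flight data; the Python sketch below shows only the range mismatch, and the velocity values are invented round numbers, not the real telemetry.

```python
# Sketch of the Ariane 501 failure mode (Python, not the original Ada; the
# velocities are invented): a 64-bit float stuffed into a 16-bit signed
# integer with no range check.

import struct

def to_int16_unchecked(x: float) -> int:
    # Keep only the low 16 bits and reinterpret them as signed, which is
    # what an unchecked narrowing conversion does to the bit pattern.
    return struct.unpack("<h", struct.pack("<H", int(x) & 0xFFFF))[0]

ariane4_velocity = 20_000.0   # fits in a signed 16-bit range (max 32_767)
ariane5_velocity = 40_000.0   # the faster flight profile overflows it

print(to_int16_unchecked(ariane4_velocity))   # 20000: correct
print(to_int16_unchecked(ariane5_velocity))   # -25536: garbage read as data
```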

The inquiry board, chaired by Jacques-Louis Lions, delivered one of the driest lines in aerospace failure analysis: the code had "no direct bearing on the Ariane 5 flight trajectory." It was vestigial. Running for no reason. And it brought down the rocket. Seven variables in the code were at risk of overflow; the one that failed was among those where range checking had been deliberately omitted to save processor time. On the Ariane 4, this was a reasonable optimization. On the Ariane 5, it was $370 million falling out of the sky.

The error wasn't in the code. The code did what it was written to do. The error was in assuming that what worked before would work again.

Status: Resolved

Ariane 5 Flight 501 Failure: Report by the Inquiry Board (Prof. J. L. Lions, chair), July 1996; European Space Agency.

1999

The blind spot

Y2K wasn't a bug. It was millions of bugs, all the same bug, baked into systems worldwide over three decades by people who had perfectly good reasons at the time.

In the 1960s and 1970s, memory was expensive. Storing a year as two digits instead of four saved space. Every programmer who wrote "67" instead of "1967" was making a rational economic decision. Nobody expected their payroll code to still be running in 1999. But code outlives intent, and mainframes outlive their expected lifespans by decades.
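The defect and its most common repair both fit in a few lines. A sketch under invented assumptions: the pivot value of 50 is a typical choice rather than a standard, and many real systems stored years in fixed-width text fields rather than integers.

```python
# The core Y2K defect and the common "windowing" repair (the pivot value is
# a typical choice, not universal).

def expand_naive(yy: int) -> int:
    # What 1960s-70s code implicitly assumed about a stored two-digit year.
    return 1900 + yy

def expand_windowed(yy: int, pivot: int = 50) -> int:
    # Remediation: two-digit years below the pivot belong to the 2000s.
    return (2000 if yy < pivot else 1900) + yy

print(expand_naive(99), expand_naive(0))        # 1999, then 1900: the rollover
print(expand_windowed(99), expand_windowed(0))  # 1999, then 2000
```

Windowing fixed nothing permanently; it moved the ambiguity to the pivot year, which is why some of these systems are due to misbehave again mid-century.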

The panic was real. Banks, power grids, air traffic control, military systems, hospital equipment, elevators. Anything with an embedded date. The US Senate formed a committee. The UK spent £400 million. Global remediation costs hit an estimated $300 billion. Programmers who knew COBOL came out of retirement and charged accordingly.

And then January 1, 2000 arrived, and mostly nothing happened. Which created a different problem: the narrative that it had all been overblown. A hoax. Mass hysteria.

It wasn't. The reason nothing catastrophic happened was that an enormous number of people spent years finding and fixing the problems before the deadline. The Y2K effort is one of the largest coordinated debugging projects in history, and its success made it look like it was never necessary.

The unknown in Y2K wasn't technical. Everyone understood two-digit years would fail. The unknown was scope. How many systems? Which ones matter? What happens when a date-dependent process in a system you've forgotten about gets a value it can't parse? The answer, after years of work, was: we checked enough of them. Probably. We think.

Status: Resolved

US Senate Special Committee on the Year 2000 Technology Problem, final report; UK House of Commons Select Committee on Science and Technology, 1999.

2003

Flying blind

On August 14, 2003, a software bug silenced the alarms at FirstEnergy Corporation's control room in Akron, Ohio. For over an hour, operators monitored their screens and saw nothing wrong. During that time, transmission lines were sagging into overgrown trees, tripping offline one by one, shifting load to neighboring lines that then also failed. By the time anyone realized what was happening, the cascade was unstoppable.

Fifty-five million people across the northeastern US and Ontario lost power. Some for hours. Some for days. It was the largest blackout in North American history.

The alarm system, called XA/21, had a race condition. A combination of events caused the alarm software to stall and stop processing. No alarms displayed. No error message appeared. The alarm system didn't crash in any visible way. It just quietly stopped telling anyone what was going on.

The operators weren't negligent. They were looking at screens that showed a stable grid. They had no reason to doubt what they were seeing. The system was lying to them, and it didn't even know it was lying.

FirstEnergy had also deferred tree trimming along its transmission corridors. On a cooler day, the lines wouldn't have sagged as far, and the trees wouldn't have mattered. On August 14, it was hot. The lines sagged. The trees were there. And nobody got an alarm.

The US-Canada task force report identified the race condition, the deferred maintenance, and the inadequate operator training as contributing causes. But that hour of silence is what stays with you. A control room full of trained people, watching a lie, with no way to know it was one.

Status: Resolved

U.S.-Canada Power System Outage Task Force, "Final Report on the August 14, 2003 Blackout," April 2004.

III "We can't know"
2015

The label that couldn't be fixed

In June 2015, software developer Jacky Alciné opened Google Photos and found that the app had labeled photos of him and a friend, both Black, as "gorillas."

This wasn't a bug in the traditional sense. The image recognition model wasn't malfunctioning. It was doing what it was trained to do: identify patterns in pixel data and match them to labels. The training data and the learned representations inside the model had produced this output. The system was working as designed. The design was the problem.

Google apologized quickly. Their fix was to remove "gorilla" as a label entirely. Also "chimp" and "monkey." The labels were blocked, not corrected.

Three years later, Wired tested it again. The labels were still blocked. Google Photos could identify hundreds of animals but not primates. The underlying issue, whatever the model had learned that produced the misclassification, was never resolved. Google didn't explain why. Probably because the explanation would require understanding what the model actually learned, and nobody fully does.

This is the entry point to Era 3. In the first two eras, you could find the moth, read the code, trace the cascade. Here, the "error" lives inside a statistical model with millions of parameters. You can't open a panel and point to the broken part. The broken part is distributed across the entire network. Or maybe nothing is broken. Maybe the model is accurately reflecting the data it was trained on, and the problem is the data, or the world the data came from.

Google's fix tells you everything. They couldn't fix the *why*. They could only mute the output.

Status: Structurally unknowable

Jacky Alciné, original tweet, June 28, 2015; Tom Simonite, "When It Comes to Gorillas, Google Photos Remains Blind," Wired, January 2018.

2016

Sixteen hours

Microsoft launched Tay on March 23, 2016. Tay was a Twitter chatbot designed to mimic the speech patterns of a 19-year-old American. It learned from its interactions. Within sixteen hours, it was posting racist, sexist, and pro-Nazi content. Microsoft pulled the plug.

The postmortem was brief. Users had coordinated to feed Tay inflammatory content, exploiting a "repeat after me" feature and flooding it with hateful input. The model learned. It learned fast.

Microsoft framed this as an attack, and it was. But calling it just an attack lets the architecture off easy. Tay was designed to learn from the public internet in real time with no content filtering on the input side. The question isn't why trolls did what trolls do. The question is why nobody at Microsoft asked "what happens when trolls do what trolls do?"

There's a specific kind of blindness in assuming your users will behave well. It's the software equivalent of leaving your front door unlocked because you live in a nice neighborhood. Tay's designers tested it against anticipated behavior. The internet provided unanticipated behavior. The gap between those two things was about sixteen hours wide.

The deeper question Tay raised hasn't been answered. If a learning system acquires its values from its environment, how do you control what it learns without constraining its ability to learn? Modern AI alignment research is working on this at a much larger scale. Tay was the speed run version, a miniature of a problem that is now the central concern of AI safety.

Microsoft's blog post called it "a coordinated attack." Which is true. It is also true that the system worked exactly as designed.

Status: Structurally unknowable

Peter Lee, "Learning from Tay's introduction," Microsoft Official Blog, March 2016.

2023

Confident and wrong

In early 2023, attorneys Steven Schwartz and Peter LoDuca filed a brief in federal court citing six previous cases as precedent. The cases sounded real. They had plausible names, proper citation formats, and detailed factual summaries. None of them existed. Schwartz had used ChatGPT to research case law and filed what it gave him without checking.

When opposing counsel couldn't find the citations, the judge ordered Schwartz to produce copies. He went back to ChatGPT and asked if the cases were real. ChatGPT assured him they were. He submitted an affidavit saying so.

Judge P. Kevin Castel was not amused. His sanctions order noted that the fabricated cases "were gibberish" and cited "an inherent risk" in using AI for legal research. Schwartz and LoDuca were fined $5,000 and referred for disciplinary proceedings.

The word "hallucination" is the industry's term for when a language model generates text that is fluent, coherent, and false. It's a revealing word choice. Hallucinations, in the clinical sense, aren't errors. They're perceptions generated by a brain doing its normal thing under abnormal conditions. AI hallucinations are similar. The model isn't failing. It's doing exactly what it was built to do: predict the next plausible token. Sometimes plausible and true overlap. Sometimes they don't. The model has no mechanism for distinguishing between the two.
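The point shrinks down to a toy. The bigram chain below is invented for illustration and is nothing like a transformer, and the training "citations" are made up; but it shows the mechanism in miniature: a model trained on the shape of citations emits citation-shaped text, and nothing in the generation loop ever consults reality.

```python
# Toy bigram generator (made-up case names, nothing like a real LLM): trained
# only on the *shape* of citations, it emits plausible-looking output with no
# step anywhere that checks whether the result exists.

import random
from collections import defaultdict

training = [
    "Smith v. Jones , 410 F.2d 13",
    "Brown v. Davis , 522 F.2d 98",
    "Miller v. Clark , 301 F.2d 77",
]

follows = defaultdict(list)        # token -> tokens seen after it
for cite in training:
    tokens = ["<s>"] + cite.split() + ["</s>"]
    for a, b in zip(tokens, tokens[1:]):
        follows[a].append(b)

def generate(seed=4):
    rng = random.Random(seed)
    out, tok = [], "<s>"
    while True:
        tok = rng.choice(follows[tok])
        if tok == "</s>":
            return " ".join(out)
        out.append(tok)

fake = generate()
# Perfectly citation-shaped; whether it names a real case is pure chance.
print(fake)
```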

You can reduce hallucinations. Retrieval-augmented generation, better training data, reinforcement learning from human feedback. You can't eliminate them without changing what a language model fundamentally is. The architecture generates plausible text. Plausibility isn't truth. That gap might be closeable. It might not. Nobody knows yet.

Schwartz told the judge he "did not comprehend that ChatGPT could fabricate cases." In fairness, the cases were very well fabricated.

Status: Structurally unknowable

Mata v. Avianca, Inc., Case No. 1:22-cv-01461 (S.D.N.Y.); Order imposing sanctions by Judge P. Kevin Castel, June 22, 2023.

Now

"What does it know?"

For most of computing history, understanding your tools was optional. You could write code, deploy it, and treat the machine as a reliable black box. When something broke, you looked inside. You found the moth, the missing overbar, the type mismatch. You fixed it. Understanding was always available when you needed it.

That assumption started cracking in Era 2, when systems got complex enough that "inside" stopped being a useful concept. A network of ten thousand correct components can fail in ways none of them individually can. But at least the components were legible. You could read the code. Trace the signal. Reconstruct the cascade after the fact.

With neural networks, that's gone. A large language model is a pile of floating-point numbers. Billions of them. Arranged in ways that produce extraordinary behavior and resist explanation. You can ask the model what it knows. It will answer fluently. Its answer may have no relationship to what's actually happening inside it.

Mechanistic interpretability is the field that's trying to change this. Researchers at Anthropic, DeepMind, and elsewhere are building tools to trace how information moves through neural networks. Anthropic's attribution graphs can follow about 25% of a model's reasoning paths for a given output. MIT Technology Review named mechanistic interpretability a 2026 Breakthrough Technology.

Twenty-five percent isn't much. But a year ago the number was closer to zero. The tools are new. The field barely existed five years ago. And the question it's asking is arguably the most important one in computing right now: can we understand what we've built before it outpaces our ability to understand it?

Every era in this timeline thought it was the last one. That the next generation of tools would make the unknown known. Every era was wrong. But every era also got closer.

We're closer now than we've ever been. Not because we've answered the question, but because we've finally learned to ask it properly.

The history of unknown errors is the history of humans building things more complex than their ability to predict what those things will do. The moth in the relay. The missing overbar. The race condition nobody tested for. The model that learned the wrong lesson. Each one a reminder that the frontier of the unknown moves with us, pushed forward by every system we build. We don't solve the unknown. We just find newer, more interesting versions of it.

And maybe that's fine. Maybe the point was never to eliminate the unknown. Maybe the point is to get better at looking for it.

Chris Olah et al., "Scaling Monosemanticity," Anthropic, 2024; Anthropic, "Circuit Tracing," March 2025; MIT Technology Review, "10 Breakthrough Technologies 2026."