How NASA Built Artemis II’s Fault-Tolerant Computer

The computer aboard Orion for Artemis II is a far cry from Apollo’s 1‑MHz marvel. It now runs most of the capsule’s safety-critical functions, including life support, thrusters, and comms, and it must do so some 250,000 miles from the nearest technician. That raises a hair-raising question: what happens when space decides to flip a bit? The answer is redundancy piled on redundancy, and an engineering promise: a processor that can no longer trust its own answer stays quiet.
The power of eight
NASA moved beyond classic triple-voting. Orion runs eight CPUs in parallel: two Vehicle Management Computers, each with two Flight Control Modules (FCMs), and each FCM built from a self‑checking processor pair (2 × 2 × 2 = 8). The design is “fail‑silent”: a processor that detects an internal error goes quiet rather than emit a wrong answer. Instead of triplex voting, the software uses a priority-ordered source selection that picks the first healthy FCM; if that one goes silent, the next in line takes over. Engineers say the system can lose three of its four FCMs in 22 seconds and still ride through, a terrifyingly precise margin of safety when you’re inside the Van Allen Belts. No runway, no pit stop. No drama allowed.
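To make the selection rule concrete, here is a minimal Python sketch of priority-ordered source selection over fail-silent modules. It is an illustration under stated assumptions, not Orion's flight software: the `FcmOutput` type, the `select_source` function, the module names, and the use of `None` to model silence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FcmOutput:
    """One frame of output from a hypothetical Flight Control Module."""
    name: str
    command: Optional[float]  # None models a fail-silent FCM that emits nothing


def select_source(outputs: Sequence[FcmOutput]) -> Optional[FcmOutput]:
    """Return the first FCM, in priority order, that is still producing output."""
    for out in outputs:
        if out.command is not None:
            return out
    return None  # every FCM has gone silent


# Three of four FCMs have gone silent; the fourth still carries the vehicle.
frame = [
    FcmOutput("FCM-1", None),
    FcmOutput("FCM-2", None),
    FcmOutput("FCM-3", None),
    FcmOutput("FCM-4", 0.25),
]
selected = select_source(frame)
print(selected.name if selected else "no healthy source")  # prints "FCM-4"
```

Note what the selector does not do: it never compares values across modules. That only works because silence is the sole failure mode it has to handle, which is exactly what the fail-silent guarantee is meant to buy.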
Enforcing determinism
Running the same flight code eight ways in lockstep is hard. Tiny timing drifts, differences in cache state, or a cosmic-ray glitch can make two processors diverge, and divergence is exactly what you’re trying to avoid. So NASA enforces determinism across hardware and software: tightly controlled execution scheduling, self‑checking processors, and mechanisms to bring a module that has been reset back into sync with the group mid‑flight. It’s a distributed-systems problem writ large, except that the users are astronauts and the cost of getting it wrong is existential.
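Why determinism matters so much becomes clearer with a toy model of a self‑checking pair. The Python sketch below is an assumption-laden illustration, not NASA's design: the `SelfCheckingPair` class, the `control_law` function, and the simulated bit flip in `upset_lane` are all hypothetical. Two lanes compute the same function on the same input, and any disagreement makes the pair go silent.

```python
from typing import Callable, Optional


class SelfCheckingPair:
    """Toy model: two lanes run the same computation and compare results."""

    def __init__(self, lane_a: Callable[[int], int], lane_b: Callable[[int], int]):
        self.lane_a = lane_a
        self.lane_b = lane_b
        self.silent = False  # latched once the lanes disagree

    def step(self, sensor_input: int) -> Optional[int]:
        """Return the agreed output, or None if the pair is (or goes) silent."""
        if self.silent:
            return None
        a = self.lane_a(sensor_input)
        b = self.lane_b(sensor_input)
        if a != b:
            self.silent = True  # fail silent: emit nothing rather than guess
            return None
        return a


def control_law(x: int) -> int:
    """Stand-in for a deterministic flight computation (hypothetical)."""
    return 2 * x + 1


def upset_lane(x: int) -> int:
    """Same computation, with a simulated single-bit upset on one frame."""
    result = control_law(x)
    return result ^ 0b100 if x == 3 else result


pair = SelfCheckingPair(control_law, upset_lane)
outputs = [pair.step(x) for x in range(1, 6)]
print(outputs)  # [3, 5, None, None, None]
```

The comparison is only meaningful if both lanes are bit-for-bit deterministic. Any benign source of divergence, such as timing jitter changing the order of operations, would look identical to a fault and silence a healthy pair.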
Testing and formal methods
Engineers reportedly leaned heavily on exhaustive testing and formal methods to prove parts of the system’s behavior, from the fail‑silent logic to the priority selection and re‑sync protocols. Hardware selection, radiation testing, and system‑level simulation all played roles; so did rigorous software verification. The emotional core here is plain: engineers built this to be boring. Boring is good when human lives depend on a box that refuses to lie.
Sources: cacm.acm.org, Lobsters