Binary GCD: the ancient trick that can outpace std::gcd

The problem with Euclid
Euclid’s algorithm is elegant and fast on paper — logarithmic in the size of the inputs, with the worst case hiding in consecutive Fibonacci numbers and the golden ratio smiling in the margins. But real CPUs have gripes. The standard C++ implementations of gcd end up dominated by integer division instructions, and it has been reported that a typical run spends roughly 90% of its cycles in the idiv step. So the math is lovely; the hardware disagrees. Who would have thought the bottleneck in a millennia-old number theory trick would be a silicon quirk?
Rediscovered fix: Binary GCD
Enter the binary GCD (Stein’s) algorithm: a 19th-century idea from China, rediscovered for computers by Josef Stein in 1967. Instead of general division, it uses parity checks and shifts — operations that CPUs do very cheaply. The method leans on a few simple facts (gcd(0,b)=b; gcd(2a,2b)=2·gcd(a,b); gcd(2a,b)=gcd(a,b) when b is odd) and repeatedly strips factors of two and reduces magnitudes without invoking idiv. The result is a surprisingly modern trick: it has been reported that a binary-GCD variant can run roughly 2× faster than the GCC/Clang std::gcd on typical workloads.
Why it matters
This isn’t just academic one-upmanship. In tight numerical loops, cryptography primitives, or systems where division is particularly expensive (embedded CPUs, older silicon), shaving that constant factor matters. That said, your mileage will vary: contemporary processors and compiler intrinsics have improved, and in many real-world cases std::gcd is “fast enough.” Still — the emotional hook here is irresistible: an algorithm older than calculus, dusted off and translated into bit-twiddling, can still outpace a modern standard-library implementation. Not bad for a millennia-old idea, right?
Sources: algorithmica.org, Hacker News
Comments