Sorting Performance Rabbit Hole
The setup
According to the author’s write-up, Pystd (a small sorting library) was initially 5–10% slower than libstdc++ when sorting a shuffled dataset of 10 million 64-bit integers. The obvious nerd snipe followed: can Pystd be made faster than the well-tuned std::sort and std::stable_sort? The author dug in, tried a string of micro-optimizations, and kept the input order identical across runs so that the comparisons stayed fair.
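To make the setup concrete, here is a minimal sketch of how such a comparison could be timed. It is not the author's harness: the helper names (make_shuffled_input, time_sort, best_of) and the seed value are assumptions made for illustration. The key idea is that a fixed seed reproduces the same shuffled input on every run, and each timed sort works on a fresh copy of it.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Build the same shuffled input on every call by fixing the RNG seed.
// The seed is an arbitrary choice for this sketch. Note that std::shuffle's
// exact permutation may differ between standard library implementations,
// but it is repeatable within one build.
std::vector<std::int64_t> make_shuffled_input(std::size_t n, std::uint64_t seed = 42) {
    std::vector<std::int64_t> data(n);
    std::iota(data.begin(), data.end(), std::int64_t{0});
    std::mt19937_64 rng(seed);
    std::shuffle(data.begin(), data.end(), rng);
    return data;
}

// Time a single sort call on a private copy of the input; returns seconds.
// The copy is made before the clock starts so it is not part of the measurement.
template <typename SortFn>
double time_sort(const std::vector<std::int64_t>& input, SortFn sort_fn) {
    std::vector<std::int64_t> work = input;
    const auto start = std::chrono::steady_clock::now();
    sort_fn(work.begin(), work.end());
    const auto stop = std::chrono::steady_clock::now();
    assert(std::is_sorted(work.begin(), work.end()));
    return std::chrono::duration<double>(stop - start).count();
}

// Repeat the measurement and keep the best (smallest) time observed.
template <typename SortFn>
double best_of(const std::vector<std::int64_t>& input, int runs, SortFn sort_fn) {
    assert(runs > 0);
    double best = time_sort(input, sort_fn);
    for (int i = 1; i < runs; ++i) {
        best = std::min(best, time_sort(input, sort_fn));
    }
    return best;
}
```

Because std::sort is a function template, it would be passed through a lambda, for example best_of(input, 5, [](auto first, auto last) { std::sort(first, last); }), with input built by make_shuffled_input(10'000'000). The same call with std::stable_sort, or with the library under test, gives the numbers to compare.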
Stable sort: an easy win
Surprisingly, stable sort yielded a quick victory. A few “obvious tweaks” trimmed the runtime to about 0.86 seconds, reportedly making Pystd roughly 5% faster than std::stable_sort. Sometimes the low-hanging fruit is actually fruit — who knew? The morale boost was real. Onwards to the nastier sibling: unstable sort.
Unstable sort: a stubborn beast
Unstable sorting turned into a proper rabbit hole. The author examined libstdc++’s implementation and tried copying its tricks (insertion-by-temp, memmove, alternative pivot selection, even shell and radix variants), and most changes either slowed things down or made no measurable difference. Debugging a large-data bug led to experiments with the introsort cutoff, the partition size below which the sort hands off to insertion sort: from 16 down to 8 and then, crucially, up to 32. That threshold change produced the biggest single improvement; 64 helped more but had negative spillover elsewhere, so 32 was chosen as the compromise.
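For readers who have not seen where this threshold lives, the following is a simplified introsort-style sketch with a tunable cutoff. It is neither Pystd's nor libstdc++'s code: the function names are hypothetical, the partitioning uses two std::partition passes for clarity rather than speed, and small ranges are insertion-sorted as soon as they are reached instead of in one final pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Iter = std::vector<std::int64_t>::iterator;

// Insertion sort that holds the moving element in a temporary
// and shifts larger elements one slot to the right.
void insertion_sort(Iter first, Iter last) {
    if (first == last) return;
    for (Iter i = first + 1; i != last; ++i) {
        const std::int64_t tmp = *i;
        Iter j = i;
        while (j != first && tmp < *(j - 1)) {
            *j = *(j - 1);
            --j;
        }
        *j = tmp;
    }
}

// Quicksort loop with a recursion-depth budget and a heapsort fallback.
// Ranges of at most `cutoff` elements are left to insertion sort.
void introsort_loop(Iter first, Iter last, int depth_left, std::ptrdiff_t cutoff) {
    while (last - first > cutoff) {
        if (depth_left == 0) {
            // Too many bad partitions: fall back to heapsort for this range.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_left;
        // Median of the first, middle, and last elements as the pivot value.
        const std::int64_t a = *first;
        const std::int64_t b = *(first + (last - first) / 2);
        const std::int64_t c = *(last - 1);
        const std::int64_t pivot =
            std::max(std::min(a, b), std::min(std::max(a, b), c));
        // Three-way split: [first, mid1) < pivot, [mid1, mid2) == pivot,
        // [mid2, last) > pivot. The middle part is never empty.
        Iter mid1 = std::partition(first, last,
                                   [pivot](std::int64_t x) { return x < pivot; });
        Iter mid2 = std::partition(mid1, last,
                                   [pivot](std::int64_t x) { return !(pivot < x); });
        introsort_loop(mid2, last, depth_left, cutoff);  // recurse on the right part
        last = mid1;                                     // loop on the left part
    }
    insertion_sort(first, last);
}

// Entry point. The default of 32 mirrors the value the article settles on;
// the depth budget of roughly 2 * log2(n) is a common convention.
void sort_with_cutoff(std::vector<std::int64_t>& v, std::ptrdiff_t cutoff = 32) {
    assert(cutoff >= 1);
    int depth_limit = 0;
    for (std::size_t n = v.size(); n > 1; n >>= 1) depth_limit += 2;
    introsort_loop(v.begin(), v.end(), depth_limit, cutoff);
}
```

Calling sort_with_cutoff(data, 8), sort_with_cutoff(data, 16), and sort_with_cutoff(data, 32) on copies of the same input is the kind of experiment the paragraph above describes. A larger cutoff trades quicksort recursion for more insertion-sort work on small ranges, and where the balance tips depends on the hardware and the data.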
A win by a whisker — and what it means
After the final tweaks, Pystd’s best observed time was reportedly 0.754 seconds versus libstdc++’s 0.755 seconds: a lead of one thousandth of a second, and it happened only once. Big? No. Telling? Absolutely. The story isn’t about a decisive rout; it’s about where real-world gains often hide: in tuning thresholds and corner-case behavior rather than in headline-grabbing new algorithms. Tiny victories like this feel glorious, a photo finish in a microbenchmark, but they also remind us that performance is messy, brittle, and delightfully nerdy.
Sources: nibblestew.blogspot.com, Hacker News