Floyd's Sampling Algorithm

It has been reported that Justin Jaffray revisited a neat bit of algorithmic lore in a recent newsletter: Robert Floyd's sampling algorithm for choosing a uniformly random k-subset of {1,…,n}. Short, elegant, and a little sneaky — the routine picks indices from n−k+1 up to n, drawing a random t in [1..i] and either inserting t or i into the sample depending on whether t is already chosen. Simple code, but it reads like a magic trick. Why does that tiny branch produce a perfectly uniform sample? Good question.
The algorithm, in plain English
Think of it as building the sample from the top down. For each i in the tail range, you grab a random index t between 1 and i; if t is already in the set you keep i, otherwise you keep t. Unlike reservoir sampling, which feels like streaming logic, Floyd’s method is compact and non-intuitive — the branch is what makes a clean combinatorial intuition awkward, and that’s part of the charm. Jaffray gives a short, rigorous argument showing that each k+1-set has exactly k+1 preimages under the stepwise map, which forces uniformity.
Two perspectives that make the trick less spooky
If the combinatorial count doesn’t land, there’s another lens: Fisher–Yates shuffle. Take the “upward” Fisher–Yates that builds a permutation by swapping each new element with a random earlier position. Pull off the last k elements of a freshly shuffled array and you have a uniform sample. Floyd’s loop is just doing those final k swaps directly. Either an element swaps with someone in the tail and stays, or it swaps with someone outside and is replaced — exactly the two branches. That click — seeing a sampling algorithm as a tiny, directed shuffle — is the emotional payoff. Neat, tidy, and somehow still delightful.
Sources: buttondown.com/jaffray, Hacker News
Comments