Haskell linter xreferee gets up to a 5× boost by swapping Text for ByteString and using memchr

The fix
xreferee, a Haskell tool that checks @(ref:foo) references against #(ref:foo) anchors in a git repo, got noticeably faster after a small, targeted refactor. According to his write-up, Brandon Chinn replaced LazyText with LazyByteString and swapped a Text.break-based search for two calls to LBS.elemIndex, which uses C's memchr under the hood. The code change is small; the payoff is large, especially in the worst case.
The linter parses git grep -z output: NUL-separated file path, line number, column and the full contents of the matching line. In the common case lines are short and the Text-based parser was already fast, so the switch brought only about a 10% speedup. But for degenerate inputs, such as single lines 25,000 characters long with multiple markers, the old break approach spent a lot of its time on character-by-character equality checks. Replacing that with two memchr-backed elemIndex calls (one for '@', one for '#') and taking whichever match comes first cut that hotspot down dramatically.
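The nearest-marker search described above can be sketched roughly as follows. This is a minimal illustration, not xreferee's actual code: the name nextMarker and its return type are assumptions made for the example, and only elemIndex and pack come from the bytestring library.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LBS
import Data.Int (Int64)

-- | Hypothetical helper (not taken from xreferee): find the nearest
-- '@' or '#' in the input using two independent elemIndex scans,
-- then keep whichever match has the smaller offset.
nextMarker :: LBS.ByteString -> Maybe (Int64, Char)
nextMarker s =
  case (LBS.elemIndex '@' s, LBS.elemIndex '#' s) of
    (Nothing, Nothing) -> Nothing
    (Just i, Nothing) -> Just (i, '@')
    (Nothing, Just j) -> Just (j, '#')
    (Just i, Just j)
      | i < j -> Just (i, '@')
      | otherwise -> Just (j, '#')

main :: IO ()
main = do
  print (nextMarker (LBS.pack "see @(ref:foo) and #(ref:bar)")) -- Just (4,'@')
  print (nextMarker (LBS.pack "no markers here"))               -- Nothing
```

Each scan runs independently of the other, so the comparison of offsets is the only per-match work left in Haskell; the byte-level searching is delegated to elemIndex.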
Why it matters
Why should you care? Because this is a classic systems lesson: decode only what you must, and use the right primitive. ByteString lets the parser skip UTF-8 decoding for the bytes it throws away, and memchr is heavily optimized, so two fast scans beat one slow Haskell loop that repeatedly checks equality. Sound familiar? Tools like ripgrep have leaned on similar tricks for years: a tuned low-level byte search can still be the secret sauce.
Performance tinkering can feel like nitpicking. But when a few lines of code turn a painful edge case into a non-issue, the effort pays for itself. Chinn reports up to a 5× speedup in pathological cases and modest gains elsewhere, a reminder that a small, well-aimed change can deliver an outsized win.
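To make "decode only what you must" concrete, here is a hedged sketch in which the input stays as raw bytes and only the label inside a matched (ref:...) span is decoded to Text. The helper refLabel and its handling of malformed input are assumptions for illustration and are not necessarily how xreferee does it.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LBS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical helper: given the bytes right after a marker character,
-- return the decoded label of a "(ref:label)" span, if there is one.
-- Only the label bytes are UTF-8 decoded; everything else stays as bytes.
refLabel :: LBS.ByteString -> Maybe T.Text
refLabel afterMarker
  | prefix `LBS.isPrefixOf` afterMarker =
      let body = LBS.drop (LBS.length prefix) afterMarker
          (label, rest) = LBS.break (== ')') body
       in if LBS.null rest
            then Nothing -- no closing parenthesis found
            else either (const Nothing) Just (TE.decodeUtf8' (LBS.toStrict label))
  | otherwise = Nothing
  where
    prefix = LBS.pack "(ref:"

main :: IO ()
main = do
  print (refLabel (LBS.pack "(ref:foo) trailing text")) -- Just "foo"
  print (refLabel (LBS.pack "not a reference"))         -- Nothing
```

In this sketch a label that is not valid UTF-8 is treated as no match; a real tool might prefer to report it as an error instead.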
Sources: brandonchinn178.github.io, Lobsters