Binary encodings for JSON and VARIANT promise big speedups

April 18, 2026

Benchmarks: parsing, not text size, is the drag

A new deep dive by Jin Cong Ho argues that the real cost in JSON-heavy systems isn’t the text size; it’s parsing. Using simdjson’s on-demand API, the post measures parse+query times and finds a realistic Twitter-like query taking about 136,784 ns. Scale that to a million rows and you’re looking at roughly 137 seconds, a little over two minutes of CPU time: enough to make a cup of coffee. Ouch. Who would have thought a little parsing could quietly eat so many cycles?
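As a quick sanity check on that scaling, here is the arithmetic as a short Python sketch. The 136,784 ns figure is the one quoted from the post. The one-million-row count is the illustrative workload from the paragraph above, and the calculation assumes every row costs the same and runs serially on one core.

```python
# Back-of-the-envelope scaling of the per-query cost quoted from the post.
NS_PER_QUERY = 136_784   # parse+query time per document, in nanoseconds
ROWS = 1_000_000         # illustrative row count (an assumption, not a benchmark)

# Assumes uniform cost per row and no parallelism.
total_seconds = NS_PER_QUERY * ROWS / 1e9
total_minutes = total_seconds / 60

print(f"{total_seconds:.1f} s, about {total_minutes:.2f} min")
# prints "136.8 s, about 2.28 min"
```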

Binary encodings cut the parsing out of the hot path

The punchline: binary encodings can avoid much of that work and give repeated queries a huge speed advantage. According to the post, a carefully designed minimal binary layout can yield dramatic improvements; its table of contents even touts “2,346x Faster Lookup”. Treat that specific figure as the author’s benchmark result rather than an industry-wide rule. The write-up walks through object and array microbenchmarks, visual layouts, and why BSON doesn’t necessarily solve the random-access problem for analytical workloads.
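To make the idea concrete, here is a minimal Python sketch of one way a binary object layout can support lookups without parsing the whole document. It is not the layout from the post, nor the VARIANT or BSON format. The header, the entry table, and the names `encode` and `lookup` are hypothetical, and values are limited to flat strings for brevity. The one technique it illustrates is a table of fixed-width offsets with keys sorted by bytes, so a reader can binary-search for a field and slice out its value directly.

```python
import struct
from typing import Dict, Optional

# Hypothetical layout (an assumption for illustration only):
#   [u32 field_count][entry table][data region]
# Each entry is four little-endian u32s: key_off, key_len, val_off, val_len.
HEADER = struct.Struct("<I")
ENTRY = struct.Struct("<IIII")


def encode(obj: Dict[str, str]) -> bytes:
    """Pack a flat str->str mapping, with keys sorted by their UTF-8 bytes."""
    items = sorted((k.encode(), v.encode()) for k, v in obj.items())
    data_start = HEADER.size + ENTRY.size * len(items)
    table = bytearray()
    data = bytearray()
    for key, val in items:
        key_off = data_start + len(data)
        data += key
        val_off = data_start + len(data)
        data += val
        table += ENTRY.pack(key_off, len(key), val_off, len(val))
    return HEADER.pack(len(items)) + bytes(table) + bytes(data)


def lookup(buf: bytes, key: str) -> Optional[str]:
    """Binary-search the entry table; only the probed keys are read."""
    target = key.encode()
    (count,) = HEADER.unpack_from(buf, 0)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        key_off, key_len, val_off, val_len = ENTRY.unpack_from(
            buf, HEADER.size + mid * ENTRY.size
        )
        probe = buf[key_off:key_off + key_len]
        if probe == target:
            return buf[val_off:val_off + val_len].decode()
        if probe < target:
            lo = mid + 1
        else:
            hi = mid
    return None


buf = encode({"user": "alice", "lang": "en", "text": "hello"})
print(lookup(buf, "lang"))     # prints "en"
print(lookup(buf, "missing"))  # prints "None"
```

The encoding work is paid once at write time. Each later lookup touches only the header, a handful of table entries, and one value slice, which is the general reason repeated queries can come out so far ahead of re-parsing text.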

Trade-offs and where VARIANT fits in

This isn’t just a performance stunt; it’s a design conversation. The post maps out the trade-offs developers face: storage and serialization cost, random access and key ordering, and richer type handling. The suggestion is that VARIANT-style encodings (and Parquet-adjacent designs) could eventually supplant raw JSON in some databases, on the argument that they balance fast lookups and compact storage better for repeated queries.

Why you should care

If you run analytics or a database that frequently peeks into nested JSON, this matters. Binary formats aren’t a silver bullet (compatibility, tooling, and type semantics all still need wrestling with), but the potential to turn minutes of CPU into milliseconds is hard to ignore. Read the full post for the microbenchmarks and layout sketches; the devil is in the encoding details, and that’s where performance lives or dies.

Sources: jincongho.com, Hacker News