Frappe traces mysterious MariaDB freezes to a metadata query that opens hundreds of table files

April 20, 2026

The problem

Frappe Cloud, which hosts thousands of sites, reportedly began seeing repeated, sudden MariaDB freezes that left entire database servers unresponsive. Not a slow creep; a spike, then blackout. Weekly metrics showed 5–6 incidents across both shared and dedicated servers. When a server goes dark, monitoring dies with it: SSH, exporters, and metric collection all stall just when they are needed most. Frustrating? Absolutely. Scary? You bet.

The investigation

Frappe engineers used MariaDB’s process list for a first look, then reached for eBPF to get kernel-level visibility. They wrote a small eBPF tracer that pairs entry and exit hooks on kernel functions such as vfs_read to spot stalled I/O: if a read starts and does not return, the disk is struggling. Because keeping the hooks attached all the time adds overhead, they made the probes on-demand: a resource monitor watches for spikes in CPU iowait or disk activity and only then attaches the eBPF probes. Clever. It’s like switching a camera on only when the alarm goes off.
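The two ideas in this section, an iowait-based trigger and entry/exit pairing, can be sketched in plain Python. This is a minimal illustration, not Frappe's tracer: the function names, the 30% threshold, and the (kind, thread id, timestamp) event format are assumptions made for the example, and the real work of attaching eBPF probes is left out. The only external fact relied on is the field order of the aggregate cpu line in Linux's /proc/stat.

```python
# Field order of the aggregate "cpu" line in /proc/stat on Linux.
CPU_FIELDS = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(p) for p in parts[1:1 + len(CPU_FIELDS)]]
    return dict(zip(CPU_FIELDS, values))


def iowait_fraction(before, after):
    """Share of CPU time spent in iowait between two samples (0.0 to 1.0)."""
    total = sum(after[k] - before[k] for k in CPU_FIELDS)
    if total <= 0:
        return 0.0
    return (after["iowait"] - before["iowait"]) / total


def should_attach(before, after, threshold=0.30):
    """Hypothetical trigger: attach probes only once iowait crosses a threshold."""
    return iowait_fraction(before, after) >= threshold


def stalled_reads(events, now_ns, stall_ns):
    """Pair entry/exit events per thread and report reads that never returned.

    events is an iterable of (kind, thread_id, timestamp_ns) tuples, where
    kind is "entry" or "exit" (an assumed format for this sketch).
    Returns a sorted list of (thread_id, age_ns) for reads pending >= stall_ns.
    """
    pending = {}
    for kind, tid, ts in events:
        if kind == "entry":
            pending[tid] = ts          # read started on this thread
        elif kind == "exit":
            pending.pop(tid, None)     # read returned, so it is not stalled
    return sorted((tid, now_ns - ts)
                  for tid, ts in pending.items()
                  if now_ns - ts >= stall_ns)


if __name__ == "__main__":
    # Two hand-written samples; a real monitor would read /proc/stat twice.
    before = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
    after = parse_cpu_line("cpu  110 0 55 820 115 0 0 0 0 0")
    print(should_attach(before, after))

    events = [("entry", 1, 100), ("exit", 1, 200), ("entry", 2, 150)]
    print(stalled_reads(events, now_ns=1_000_150, stall_ns=1_000_000))
```

In a real deployment the monitor would sample /proc/stat on a timer and, once should_attach returns true, load the probes; the pairing would typically happen in an eBPF map keyed by thread id rather than in a Python dict.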

The cause and takeaway

Traces repeatedly showed a seemingly innocent query against information_schema.tables stuck in “Opening tables.” Reportedly, this query does more than read a little metadata: MariaDB opens each table’s .ibd file and reads a few pages (about 4 pages, or 64 KB) from each tablespace’s header and index metadata area. On a typical Frappe + ERPNext database with ~700 tables, that’s hundreds of parallel I/O operations. Multiply that by user queries, background jobs and backups, and the disk I/O queue fills fast. Why only occasional spikes? Reportedly because MariaDB caches those statistics, and cache invalidation (DDL, table flushes, server events) forces fresh reads at unlucky moments. The lesson: some “simple” metadata queries can behave like a denial-of-service attack on your own disk subsystem, and when monitoring can’t see the problem, kernel-level tools like eBPF can be the flashlight that saves the day.
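The arithmetic behind those figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the table count and pages-per-table numbers are the article's approximations, the 16 KB page size is InnoDB's default, and the function name is made up for the example.

```python
PAGE_SIZE_KB = 16     # InnoDB default page size
PAGES_PER_TABLE = 4   # approximate figure quoted in the article
TABLES = 700          # approximate table count quoted in the article


def metadata_read_cost(tables, pages_per_table=PAGES_PER_TABLE,
                       page_kb=PAGE_SIZE_KB):
    """Estimate the I/O caused by one uncached information_schema.tables scan."""
    per_table_kb = pages_per_table * page_kb
    return {
        "file_opens": tables,                       # one .ibd file per table
        "page_reads": tables * pages_per_table,     # small reads issued
        "per_table_kb": per_table_kb,
        "total_mib": tables * per_table_kb / 1024,  # total bytes, in MiB
    }


if __name__ == "__main__":
    print(metadata_read_cost(TABLES))
```

Under these assumptions a single scan touches about 700 files and issues roughly 2,800 page reads for under 44 MiB in total. The byte volume is modest; it is the burst of many small operations, stacked on top of normal traffic, that the article blames for filling the I/O queue.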

Sources: frappe.io, Lobsters