DuckLake v1.0 — The Lightweight Lakehouse Format Reaches Production-Readiness

DuckLake v1.0 is out, and the team says it’s production-ready. The reference implementation — the ducklake extension for DuckDB — is available today in DuckDB v1.5.2, and the project ships a stable specification, backward-compatibility guarantees, and a roadmap for future work. Want the quick sell? Store your lakehouse metadata in a SQL database, not in a scattered mess of files. Simple. Radical. Practical.
What is DuckLake and why it’s different
DuckLake is a lakehouse format: keep your data on object storage and access it through SQL, much like Delta Lake or Iceberg but with a twist. Instead of keeping metadata in object-storage files, DuckLake puts it in a catalog, and any SQL system that supports primary keys and persistent tables will do. The spec defines the metadata tables and types, and the DuckDB ducklake extension implements the whole thing, supporting SQLite, PostgreSQL and DuckDB as catalogs. The release also brings Iceberg compatibility, support for geometry and variant types, and tooling to register existing Parquet files without copying their contents.
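To make the "metadata in a SQL catalog" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is a toy illustration only: the table names (toy_snapshot, toy_data_file), the function names and the file paths are all hypothetical and are not the tables defined by the DuckLake specification. What it shows is the general pattern: registering data files and creating a new snapshot happen in one database transaction, and listing a table's files is an ordinary SQL query.

```python
import sqlite3


def open_toy_catalog(path=":memory:"):
    """Create a toy catalog. Hypothetical schema, NOT the DuckLake spec tables."""
    con = sqlite3.connect(path)
    con.executescript(
        """
        CREATE TABLE IF NOT EXISTS toy_snapshot (
            snapshot_id INTEGER PRIMARY KEY,
            note        TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS toy_data_file (
            file_id    INTEGER PRIMARY KEY,
            table_name TEXT NOT NULL,
            path       TEXT NOT NULL,
            added_in   INTEGER NOT NULL REFERENCES toy_snapshot(snapshot_id)
        );
        """
    )
    return con


def commit_files(con, table_name, paths, note):
    """Record a new snapshot and its data files in a single transaction."""
    with con:  # commits on success, rolls back if an exception is raised
        cur = con.execute("INSERT INTO toy_snapshot (note) VALUES (?)", (note,))
        snapshot_id = cur.lastrowid
        con.executemany(
            "INSERT INTO toy_data_file (table_name, path, added_in) VALUES (?, ?, ?)",
            [(table_name, p, snapshot_id) for p in paths],
        )
    return snapshot_id


def files_as_of(con, table_name, snapshot_id):
    """List the files visible to a reader pinned at the given snapshot."""
    rows = con.execute(
        "SELECT path FROM toy_data_file "
        "WHERE table_name = ? AND added_in <= ? ORDER BY file_id",
        (table_name, snapshot_id),
    )
    return [r[0] for r in rows]
```

In this toy model, a reader pinned at an older snapshot simply filters on added_in, so it never sees files committed later. The actual DuckLake schema is richer than this; the spec is the place to look for the real table definitions.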
Features and real-world use cases
There are some neat engineering choices here. Data inlining lets you stage small updates in the catalog for low-latency streaming; in plain English, small writes can land in the catalog database first, so you can stream into a lake without heavyweight plumbing. The lightweight setup means you can host a read-only lakehouse with just object storage and a public HTTPS endpoint. In DuckDB specifically, DuckLake enables a “multiplayer” arrangement: multiple DuckDB instances coordinate through a central PostgreSQL catalog. The release also bundles dozens of bugfixes aimed at production robustness and includes migration guides for teams moving from DuckDB to DuckLake.
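The data-inlining idea can be sketched in a few lines, again with Python's built-in sqlite3 module as a stand-in catalog. Everything here is an assumption made for illustration: the table names, the INLINE_ROW_LIMIT threshold, the JSON row encoding, the flush step and the write_file callback are hypothetical and do not reflect how the ducklake extension implements inlining. The sketch only shows the routing decision: small batches are stored as rows in the catalog, while large batches are written out as a file and registered.

```python
import json
import sqlite3

INLINE_ROW_LIMIT = 10  # hypothetical threshold, chosen only for this sketch


def open_toy_catalog(path=":memory:"):
    """Create a toy catalog. Hypothetical schema, NOT the DuckLake spec tables."""
    con = sqlite3.connect(path)
    con.executescript(
        """
        CREATE TABLE IF NOT EXISTS toy_inlined_row (
            row_id     INTEGER PRIMARY KEY,
            table_name TEXT NOT NULL,
            payload    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS toy_data_file (
            file_id    INTEGER PRIMARY KEY,
            table_name TEXT NOT NULL,
            path       TEXT NOT NULL,
            row_count  INTEGER NOT NULL
        );
        """
    )
    return con


def write_batch(con, table_name, rows, write_file):
    """Inline small batches in the catalog; send large ones to a data file.

    write_file is a caller-supplied (hypothetical) function that persists
    rows to object storage and returns the resulting path.
    """
    with con:  # one transaction per batch
        if len(rows) <= INLINE_ROW_LIMIT:
            con.executemany(
                "INSERT INTO toy_inlined_row (table_name, payload) VALUES (?, ?)",
                [(table_name, json.dumps(r)) for r in rows],
            )
            return "inlined"
        path = write_file(rows)
        con.execute(
            "INSERT INTO toy_data_file (table_name, path, row_count) VALUES (?, ?, ?)",
            (table_name, path, len(rows)),
        )
        return "file"


def flush_inlined(con, table_name, write_file):
    """Move all inlined rows for a table into a single data file."""
    with con:
        rows = [
            json.loads(payload)
            for (payload,) in con.execute(
                "SELECT payload FROM toy_inlined_row "
                "WHERE table_name = ? ORDER BY row_id",
                (table_name,),
            )
        ]
        if not rows:
            return None
        path = write_file(rows)
        con.execute(
            "INSERT INTO toy_data_file (table_name, path, row_count) VALUES (?, ?, ?)",
            (table_name, path, len(rows)),
        )
        con.execute("DELETE FROM toy_inlined_row WHERE table_name = ?", (table_name,))
    return path
```

The point of the design is that a stream of tiny writes becomes a stream of cheap catalog inserts, and file creation is deferred until there is enough data to justify it.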
Adoption, ecosystem and community
The ducklake extension reportedly ranks among DuckDB’s top-10 core extensions by downloads. DuckLake clients are also said to exist for Apache DataFusion, Apache Spark, Trino and Pandas, with some of the Pandas work allegedly produced by “agentic coding.” MotherDuck reportedly offers a hosted DuckLake service, dozens of companies are said to run DuckLake in production already, and there’s even talk of an O’Reilly book in the works. The community has been active: many of the bugfixes and features came from outside contributors, and the project points to an “awesome-ducklake” collection for libraries and use cases. Who knew metadata could bring people together?
Sources: ducklake.select, Lobsters