SQLite: QEMU all over again?

In a trip down memory lane, I recall my experience with QEMU and how the project changed completely to accommodate a new industry trend. Is history rhyming all over again with SQLite?

Glauber Costa
Oct 4, 2022 · 6 min read


I first heard of QEMU in the early 2000s. QEMU was the brainchild of Fabrice Bellard. Fabrice is, without any exaggeration, a true genius. Don’t believe me? Those are just some of his achievements from his Wikipedia page:

  • Came up with a formula (now known as Bellard's formula) for calculating the nth digit of pi in base 16.
  • Computed pi to a then-record 2.7 trillion digits in 2009, using a desktop computer rather than a supercomputer.
  • Created FFmpeg, one of the most widely used video encoders.
  • Created QEMU, a fast full-system emulator translating between pretty much every processor architecture.

How emulators worked before QEMU

I started playing with QEMU around 2006. I had been into computer architecture from my early days, and most emulators at the time would naively interpret instructions of the emulated architecture, one by one, at runtime.

QEMU, on the other hand, employed a “Tiny Code Generator” (TCG) to translate instructions through JIT compilation. It wasn’t as fast as running natively, but for a variety of applications it was fast enough, and for many use cases it felt like pure magic. QEMU also had its own emulation for the common physical devices you would expect to find, its own disk image format, and much more.
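
For intuition, here is a toy sketch in C of the difference. This is not QEMU’s TCG: a real binary translator emits host machine code, while here the “translated block” is just an array of host function pointers, and the two-instruction guest ISA is invented for illustration. The point is the same, though: decode the guest code once, then execute the cached result many times.

    #include <stdio.h>
    #include <stdlib.h>

    enum guest_op { OP_INC, OP_DEC };          /* a made-up guest ISA */

    typedef struct { long acc; } cpu_state;
    typedef void (*host_fn)(cpu_state *);

    static void host_inc(cpu_state *s) { s->acc++; }
    static void host_dec(cpu_state *s) { s->acc--; }

    /* "Translate" a guest block once: map each guest opcode to host code. */
    static host_fn *translate_block(const enum guest_op *code, size_t len) {
        host_fn *tb = malloc(len * sizeof(host_fn));
        for (size_t i = 0; i < len; i++)
            tb[i] = (code[i] == OP_INC) ? host_inc : host_dec;
        return tb;
    }

    int main(void) {
        enum guest_op block[] = { OP_INC, OP_INC, OP_DEC, OP_INC };
        size_t len = sizeof(block) / sizeof(block[0]);
        cpu_state cpu = { 0 };

        /* A naive emulator re-decodes `block` on every pass of this loop;
         * the translated block is decoded once and simply re-executed. */
        host_fn *tb = translate_block(block, len);
        for (int run = 0; run < 1000; run++)
            for (size_t i = 0; i < len; i++)
                tb[i](&cpu);

        printf("acc = %ld\n", cpu.acc);        /* 1000 * (+1 +1 -1 +1) = 2000 */
        free(tb);
        return 0;
    }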

Virtualization shifts the industry

But around that time, something else was brewing. Virtualization was becoming mainstream in the industry.

The idea of virtualization is different from emulation. If you are just trying to run an isolated workload, and not something built for a different architecture altogether, you can just execute the instructions natively. That is easier said than done, because processors have privileged instructions which, if a guest could execute them directly, would let it interfere with the host and other virtual machines.

But the approach pioneered by the Xen project in 2004, called paravirtualization, suggested that if you are okay with changing the Operating System (user programs can’t call privileged instructions anyway), then you can safely run many virtual machines together and get the job done. Later, both Intel and AMD released their own processor extensions, which provide a shadow view of the processor’s privileged state and allowed solutions like KVM, which later came to power AWS’s Nitro, to rise.
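
To make the KVM side of this concrete, here is a minimal sketch of the ioctl flow behind it, assuming a Linux host with a readable /dev/kvm. It is not a complete VMM: a real one must also map guest memory with KVM_SET_USER_MEMORY_REGION and set up registers before running the vCPU.

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    int main(void) {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0) { perror("open /dev/kvm"); return 1; }

        printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

        int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);        /* one virtual machine... */
        int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);   /* ...with one virtual CPU */
        if (vmfd < 0 || vcpufd < 0) { perror("KVM_CREATE"); return 1; }

        /* With guest memory and registers configured, ioctl(vcpufd, KVM_RUN, 0)
         * runs guest instructions natively until the guest touches something the
         * host has to emulate (a disk, a network card, an I/O port), which is
         * exactly the gap QEMU's device models ended up filling. */
        return 0;
    }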

Getting QEMU to do Virtualization

Both of those solutions solved the problem of how to execute VM instructions. But a full solution also requires the VMs to have isolated views of devices, like sound cards, graphics cards, disks, and so on. QEMU had all of that, was pretty awesome, and everybody wanted to reuse it. But QEMU was a solution for emulation, not virtualization.

QEMU was an open source project, but the community was not very interested in the whole virtualization thing. As a matter of fact, being mostly maintained by volunteers, the community didn’t seem very interested in much beyond a narrow set of personal interests. And hey, no judgment here: I too have maintained things as an unpaid volunteer, and I know how hard it is!

But be that as it may, we now had this amazing piece of software that did almost everything we needed, and a lot of other communities building things around it while trying their best to keep their changes out of the core for the sake of easy upgrades. Until the inevitable happened: we forked QEMU through the qemu-kvm project.

Lack of contributions led to fragmentation

In a sense, because everybody was taking bits and pieces of QEMU to build their device models, QEMU was already forked. That work laid the foundation that enabled QEMU itself (albeit a fork of it) to be extended for the purpose of virtualization, by replacing the emulated processor with native execution. The qemu-kvm repository became a common home for all parties interested in pushing the envelope of what QEMU could do, and over time it essentially became the new QEMU: it could still do emulation as well as before (or better, since many of the improvements were quite generic), but it could now also do virtualization, opening up completely new applications for the project.

Is SQLite at a similar fork in the road?

When I look around today, I see a very similar situation developing around SQLite. SQLite is the brainchild of D. Richard Hipp, who was also involved with the Tcl programming language and created his own version control system (Fossil), among other things. Like Fabrice, he is an undoubtedly smart and accomplished individual.

The code for SQLite is also available, but contributing is even harder than it was in the QEMU days: SQLite is explicitly and unequivocally “Open Source, not Open Contribution”. The few core developers don’t work with modern collaboration tools like Git and GitHub, and they don’t accept contributions, although they may or may not accept your suggestion for a new feature.

Much like with QEMU, new trends in the industry are taking SQLite in a completely new direction: the rise of edge computing, with its limited resources and constrained environments, means that SQLite fits the bill perfectly. However, edge computing also means that your code will be running in many geographical locations, as close as possible to the user for the best possible latency, which in turn means that the data that hits a single SQLite instance now needs to be replicated to all the others.

Parallels between the rise of Virtualization and the rise of the Edge

Many solutions have emerged to the problem of how to run SQLite with distributed data. A few notable ones are:

  • rqlite: a full-blown distributed database similar to CockroachDB, but using SQLite as a storage engine. You talk to it over the wire, and it is no longer an embeddable database.
  • BedrockDB: similar to rqlite, also built around SQLite.
  • dqlite: a combination of SQLite and Raft, written in C, that keeps the embeddable aspect of SQLite. But because you have to be explicit about the networking events, ORMs like sqlx won’t work, or have to be adapted.
  • ChiselStore: my own attempt at the problem. It is close to dqlite (but in Rust) and, as I ultimately found out, it suffers from many of the same problems.
  • LiteFS: its biggest advantage is the fact that it sits below SQLite, at the filesystem layer, so everything that works with SQLite works with LiteFS (see the sketch right after this list). Something’s gotta give, though, and under the constraint of not changing SQLite, their approach is to essentially provide a distributed FUSE filesystem that replicates the writes and deals with the consensus problems. It comes with its own set of issues, many of them raised in this Hacker News thread. I personally think that if distributed filesystems were easy, we’d have a good one by now. But under the assumption that SQLite can’t be changed, it is the best approach by far. Kudos to Ben and the team!
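
To illustrate the LiteFS point, here is a minimal sketch using the stock SQLite C API. The only LiteFS-specific thing is the path: /litefs is a hypothetical FUSE mount chosen for illustration, and the replication happens underneath it rather than in the application.

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void) {
        sqlite3 *db;
        char *err = NULL;

        /* Plain SQLite; the (hypothetical) LiteFS mount this path points into
         * is what replicates the resulting writes to other nodes. */
        if (sqlite3_open("/litefs/app.db", &db) != SQLITE_OK) {
            fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            return 1;
        }

        const char *sql =
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT);"
            "INSERT INTO users (name) VALUES ('ada');";

        if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
            fprintf(stderr, "exec failed: %s\n", err);
            sqlite3_free(err);
        }

        sqlite3_close(db);
        return 0;
    }

Contrast that with rqlite, which moves SQLite behind a network API: LiteFS keeps the embedded model and pushes the distribution problem below it, which is why it composes with existing SQLite tooling.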

What if SQLite accepted contributions?

Why the assumption that SQLite can’t be changed? Sure, nobody wants to keep patched software around, and nobody wants to require users to install alternative versions of core software like SQLite. This is a classic prisoner’s dilemma: if one of us carries a fork of SQLite alone, that’s a losing proposition in the long term. But as the story of QEMU (and frankly, of many other projects) teaches us, if we manage to come together, we all win.

That is why today I am starting, together with some of my peers, libSQL. It is a place for all of us to come together and build a new generation of what SQLite as an embedded database should be. We want to build a strong and independent community with a clear code of conduct, and if you have ever felt like SQLite could change to accommodate the future, you’re more than welcome to come build with us.

There are more details of what we want to achieve in our manifesto, including some of the issues we’d like to tackle.

What would you like to contribute? I’m all ears. You can reach out to us on our Discord community.
