31 Dec 2023
How fast is SQLite?
Plenty fast
I have recently been on a path towards radical simplification of the systems I build. Forgoing Spring Boot for a simple Axum Rust backend is a simple choice for me. Choosing a database is an entirely different story. I have been using PostgreSQL successfully, but operating it has been a bit of a black box. Sure, I just spin up a standard docker container and it works, but I wouldn’t want to start straying from the default configuration considering the absolute wealth of features and manual pages there are. So I started looking into SQLite.
Advantages
SQLite is an embedded database, a library if you will. Compared to dedicated DB servers that need to be operated separately, this simplifies the deployment significantly. Additionally, the latency of queries is extremly low, as they are more function call than networked request. For incremental backups, there is the excellent Litestream project that observes the write-ahead log and pushes changes up to an S3 compatible object storage. Right up my alley.
Performance
I conducted some microbenchmarks to gauge the performance of SQLite. There is one table that is declared as follows:
CREATE TABLE metrics(bucket TEXT NOT NULL, date TEXT NOT NULL, data TEXT NOT NULL);You might notice that the motivating application to try SQLite is a home-grown metrics
collection system. For one scenario I also add indices on bucket and date. Next,
I ran a bunch of (hopefully) self explanatory scenarios and measured the execution time of the
queries in isolation. Read queries are prepared statements of the form:
SELECT * FROM metrics WHERE bucket = ?1 AND rowid = ?2;Notice that rowid is an implicit column added by SQLite. Write queries:
INSERT INTO metrics (bucket, date, data) VALUES (?1, ?2, ?3);The benchmarks were run on my local macbook, as well as a linux ARM server and a linux x86 server on Hetzner.
Results
| Scenario | Throughput | 90 percentile duration | 
|---|---|---|
| Vanilla (read) | 183’150/s | 6us | 
| Vanilla (write) | 4’363/s | 248us | 
| WAL Mode (read) | 454’338/s | 3us | 
| WAL Mode (write) | 14’401/s | 37us | 
| WAL + Synchronisation Normal (read) | 483’558/s | 3us | 
| WAL + Synchronisation Normal (write) | 113’684/s | 8us | 
| WAL + In-memory (read) | 50’000’000/s | 0us | 
| WAL + In-memory (write) | 981’836/s | 1us | 
| WAL + Index (read) | 440’917/s | 3us | 
| WAL + Index (write) | 47’359/s | 25us | 
| WAL + Index (mixed, 80% read) | 197’012/s | 15us | 
| WAL + Index (mixed 80% write) | 59’111/s | 16us | 
Table 1: M1 Macbook Air results
| Scenario | Throughput | 90 percentile duration | 
|---|---|---|
| Vanilla (read) | 94’786/s | 12us | 
| Vanilla (write) | 560/s | 2’463us | 
| WAL Mode (read) | 238’322/s | 6us | 
| WAL Mode (write) | 3’316/s | 585us | 
| WAL + Synchronisation Normal (read) | 275’406/s | 5us | 
| WAL + Synchronisation Normal (write) | 46’512/s | 18us | 
| WAL + In-memory (read) | 943’396/s | 1us | 
| WAL + In-memory (write) | 299’976/s | 4us | 
| WAL + Index (read) | 186’358/s | 7us | 
| WAL + Index (write) | 19’479/s | 46us | 
| WAL + Index (mixed, 80% read) | 75’153/s | 24us | 
| WAL + Index (mixed 80% write) | 22’801/s | 42us | 
Table 2: Linux Ampere ARM results (Hetzner CAX31)
| Scenario | Throughput | 90 percentile duration | 
|---|---|---|
| Vanilla (read) | 73’800/s | 14us | 
| Vanilla (write) | 925/s | 1’484us | 
| WAL Mode (read) | 332’225/s | 4us | 
| WAL Mode (write) | 5’542/s | 211us | 
| WAL + Synchronisation Normal (read) | 315’756/s | 4us | 
| WAL + Synchronisation Normal (write) | 80’145/s | 12us | 
| WAL + In-memory (read) | 952’380/s | 1us | 
| WAL + In-memory (write) | 322’695/s | 3us | 
| WAL + Index (read) | 315’357/s | 4us | 
| WAL + Index (write) | 26’469/s | 34us | 
| WAL + Index (mixed, 80% read) | 121’703/s | 15us | 
| WAL + Index (mixed 80% write) | 36’260/s | 26us | 
Table 3: Linux AMD x86 results (Hetzner CPX31)
Observations
Notice the increase of queries per second across the board once write-ahead log (WAL) mode is
turned on and Synchronization is set to Normal. The 90th percentile duration column is
indicated in microseconds! Even the slow queries are still incredibly fast. The most
“real-world” scenario here is probably WAL + Index (mixed, 80% read). We are looking at
a QPS in the ballpark of 100'000 regardless of platform. Plenty fast for my usecases.
Limitations
I found it hard to get any speedup when trying to access a single SQLite DB from multiple
threads. Writes are single threaded by nature, not much of a debate there, but reads (at least
in WAL mode) should scale well. It’s probably down to the library I use for connection
pooling. But considering single threaded performance is already so good, I didn’t spend to
much time here. In Rust lingo, I just use an Arc<Mutex<Connection>> to protect
a single connection from concurrent access.