The wrong database choice is not a day-one problem. It surfaces at 2 AM when your write queue is backed up and customers are watching a spinner.
SQLite runs on every smartphone on the planet. PostgreSQL runs Instagram. Both accept SQL. Both are free. Both will happily store your users table without complaint.
That is also about where the comparison stops being useful.
The real difference is architectural philosophy — and philosophy has consequences in production. SQLite was built around a constraint: zero administration, zero server process, zero configuration. PostgreSQL was built around a different constraint: correctness and concurrency at scale, no matter the cost of complexity. Every major behavioral difference between these two systems falls out of those original design decisions.
This piece is not a benchmark comparison. It is an attempt to reason clearly about the systems so you can make the right call the first time, or understand why you made the wrong one.
The Architectural Split: File vs. Server
SQLite is a library. You link it into your process and hand it a file path. The “database server” is your application. There is no TCP socket, no authentication handshake, no connection pool, no background daemon consuming RAM. The full state of the database — schema, indexes, all data — lives in a single file that you can cp, scp, rsync, or email to yourself.
PostgreSQL is a server. It forks a backend process per connection, manages a shared buffer pool in memory, coordinates writes through a WAL, and serves results over a wire protocol. You configure it. You monitor it. You run VACUUM on it. You plan for its failure modes.
This distinction matters most when you consider operational surface area. SQLite has essentially none: you back up a file, you restore a file. PostgreSQL requires backup strategy (pg_dump, WAL archiving, or continuous backup via pgBackRest/Barman), replication configuration, connection pooling (PgBouncer is nearly mandatory at production scale), and tuning of a dozen memory parameters. Neither is inherently “better.” One has vastly more surface area to get wrong.
Concurrency: The Decision That Cannot Be Undone Later
This is the dimension that kills you if you get it wrong, because concurrency requirements rarely decrease as a product matures.
SQLite’s Writer Serialization
SQLite’s concurrency model in WAL mode: multiple readers run in parallel, but exactly one writer holds the lock at a time. If two processes attempt concurrent writes, one blocks until the other commits or the busy-timeout fires.
In practice this means:
Writer A: BEGIN; UPDATE accounts SET balance = 500 WHERE id = 1; -- holds WAL lock
Writer B: BEGIN; UPDATE orders SET status = 'paid' WHERE id = 99; -- WAITS
Writer B: [after busy_timeout_ms elapses] → SQLITE_BUSY
For most applications this is fine. The “one writer at a time” constraint is only catastrophic when the write rate is high enough that the queue builds faster than it drains. At what point? It depends heavily on write size and storage latency, but on NVMe with small transactions, SQLite in WAL mode handles 10,000–50,000 writes/second before contention becomes a bottleneck. Most web apps generating a million page views a day are nowhere near that ceiling.
The failure mode is subtle: under moderate concurrent load, SQLite doesn’t degrade smoothly. It starts returning SQLITE_BUSY errors to callers that are not prepared to handle them, or it queues writes inside a single process, inflating response latency invisibly until something upstream (an HTTP timeout, a queue backpressure limit) trips.
// Correct SQLite write handling in Go — most tutorials omit this
db, err := sql.Open("sqlite3", "file:data.db?_journal_mode=WAL&_busy_timeout=5000")
if err != nil {
    log.Fatal(err)
}
// _busy_timeout=5000 tells SQLite to retry for 5 seconds before returning SQLITE_BUSY
// Without it, concurrent writes in a multi-goroutine server fail immediately
PostgreSQL’s MVCC
PostgreSQL’s Multi-Version Concurrency Control is one of the more elegant solutions in database engineering. The core principle: instead of locking a row during a write, create a new version of the row and let the old version persist for any reader who needs it.
Every transaction receives a transaction ID (XID). Every row tuple stores xmin (the XID that created it) and xmax (the XID that deleted or updated it). When you read a row, PostgreSQL’s visibility rules check whether that version of the row was committed before your transaction started. You see a consistent snapshot of the database frozen at transaction start. Other writers creating new versions of those rows are invisible to you.
The consequence is that reads never block writes and writes never block reads. This is why PostgreSQL can absorb thousands of simultaneous connections performing mixed read-write workloads without writers queuing behind readers.
The cost of MVCC is storage bloat: every update produces a new tuple, and old tuples linger until VACUUM determines no active transaction can see them. A table with high update rates accumulates “dead tuples” at a rate proportional to write throughput. If autovacuum falls behind — because cost-based throttling (autovacuum_vacuum_cost_delay) is tuned too conservatively, or a long-running transaction holds back the oldest-visible-XID horizon so dead tuples cannot be reclaimed — bloat compounds until query plans degrade and storage fills.
WAL Mechanics: Same Name, Fundamentally Different Purpose
Both databases have a Write-Ahead Log. The implementation and operational role differ enough that shared terminology is more misleading than clarifying.
SQLite WAL: Local Concurrency Buffer
SQLite’s WAL (PRAGMA journal_mode=WAL) solves a local coordination problem: how to let readers see a consistent committed state while a writer appends to the log without blocking them.
The mechanism: writes append to a .db-wal file rather than modifying the main database in-place. Readers check the WAL for their page’s most recent committed version, falling back to the main file if the page isn’t in the WAL. A checkpoint operation periodically writes WAL frames back into the main database file.
database.db ← stable committed state (readers use this)
database.db-wal ← recent writes (readers check here first)
database.db-shm ← shared memory index into the WAL (coordination)
This is entirely a single-machine, single-file construct. There is no streaming, no replication, no remote consumer. The WAL exists to coordinate local processes.
PostgreSQL WAL: The Backbone of Durability and Replication
PostgreSQL’s WAL is mandatory infrastructure, not an optional mode. Every modification — every INSERT, UPDATE, DELETE, DDL statement — writes to the WAL before touching data files. The sequence is strict: WAL write confirmed durable → data files modified. This ordering is the guarantee.
The WAL is the mechanism for:
Point-in-time recovery. You can take a base backup at 9 AM, archive WAL segments continuously, and restore the database to any second between 9 AM and now.
Streaming replication. Replica servers connect to the primary and receive the WAL stream in real time, replaying it to stay current. Replicas can lag by milliseconds under normal load. Synchronous replication (synchronous_standby_names set, with an appropriate synchronous_commit level) ensures a write is acknowledged only after at least one replica has confirmed WAL receipt, providing zero-RPO failover at the cost of latency.
Replication slots. Logical replication slots allow downstream consumers (separate databases, Kafka via Debezium, custom consumers) to receive a decoded stream of row-level changes. This is how Change Data Capture (CDC) pipelines are built on PostgreSQL.
Type System: Flexibility vs. Correctness
SQLite uses affinity-based typing: columns have a preferred type, but SQLite will store any value in any column and attempt silent conversion. (SQLite 3.37 added opt-in STRICT tables that enforce declared types, but the default remains permissive.) This is occasionally useful for schema migrations and always dangerous for data integrity.
-- SQLite silently accepts all of this
CREATE TABLE readings (sensor_id INTEGER, value REAL, recorded_at TEXT);
INSERT INTO readings VALUES ('not-an-integer', 'not-a-float', 12345);
-- No error. Stored as TEXT, TEXT, TEXT — the TEXT-affinity column converts the integer to text.
PostgreSQL enforces strict type constraints at write time. A column declared TIMESTAMPTZ will reject anything that isn’t a valid timestamp. This makes data migrations more effort upfront and eliminates an entire class of silent data corruption downstream.
Beyond standard types, PostgreSQL’s type system is a genuine differentiator in schema design:
JSONB. Binary-encoded JSON with full indexing support. GIN indexes let you query into arbitrary JSON structures efficiently:
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
payload JSONB NOT NULL
);
-- Index the entire JSONB column for containment queries
CREATE INDEX idx_events_payload ON events USING GIN (payload);
-- Index a specific path for range queries
CREATE INDEX idx_events_user ON events ((payload->>'user_id'));
-- Containment query — uses the GIN index
SELECT id FROM events
WHERE payload @> '{"type": "purchase", "status": "completed"}';
-- Range query on extracted field — uses the expression index
SELECT id FROM events
WHERE payload->>'user_id' = '42'
AND occurred_at > NOW() - INTERVAL '7 days';
This is why teams building audit logs, activity feeds, or event stores often reach for PostgreSQL over a dedicated document store: you get relational transactions and document-style flexible schemas in the same engine, with full ACID guarantees spanning both.
Arrays. Native array columns with array-specific operators and GIN indexing:
-- Store and query multi-valued attributes without a junction table
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
tags TEXT[]
);
CREATE INDEX idx_articles_tags ON articles USING GIN (tags);
SELECT id, title FROM articles
WHERE tags @> ARRAY['postgresql', 'backend'];
Network types. INET, CIDR, MACADDR with range operators. You can store IP address ranges and query whether an IP falls within a CIDR block natively, without string parsing in application code.
How This Applies to Distributed Systems
Idempotency
SQLite’s single-file model makes idempotent writes straightforward within one process. PostgreSQL adds complexity because writes arrive from many concurrent connections. INSERT ... ON CONFLICT DO UPDATE (upsert) and unique constraint handling become critical correctness tools:
-- PostgreSQL upsert: idempotent job queue acknowledgment
INSERT INTO job_completions (job_id, completed_at, result)
VALUES ($1, NOW(), $2)
ON CONFLICT (job_id) DO UPDATE
SET completed_at = EXCLUDED.completed_at,
result = EXCLUDED.result
WHERE job_completions.completed_at IS DISTINCT FROM EXCLUDED.completed_at;
Retries and Backoff
PostgreSQL serialization failures (ERROR: could not serialize access due to concurrent update) are expected under REPEATABLE READ or SERIALIZABLE isolation. Application code must detect them and retry:
func execWithRetry(ctx context.Context, db *sql.DB, fn func(*sql.Tx) error) error {
const maxRetries = 5
for attempt := 0; attempt < maxRetries; attempt++ {
tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
if err != nil {
return err
}
if err := fn(tx); err != nil {
tx.Rollback()
// Check for serialization failure (PostgreSQL error code 40001)
if isSerializationError(err) && attempt < maxRetries-1 {
backoff := time.Duration(1<<attempt) * 10 * time.Millisecond
time.Sleep(backoff)
continue
}
return err
}
return tx.Commit()
}
return errors.New("max retries exceeded")
}
SQLite’s equivalent is SQLITE_BUSY on write contention. The _busy_timeout pragma handles short contention; longer contention signals an architectural problem (multiple writers that should be serialized at the application layer, or a workload that has outgrown SQLite).
Observability
SQLite produces minimal observable state — you can examine the database file size and run PRAGMA wal_checkpoint to observe WAL activity, but there is no equivalent of PostgreSQL’s pg_stat_activity, pg_stat_bgwriter, pg_locks, or pg_stat_user_tables.
PostgreSQL exposes rich runtime introspection:
-- Active queries with duration (find long-running queries)
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < NOW() - INTERVAL '30 seconds'
ORDER BY duration DESC;
-- Bloat estimate per table
SELECT schemaname, tablename,
n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY dead_pct DESC NULLS LAST;
-- Replication lag
SELECT client_addr, state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS send_lag_bytes,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
This observability is a major operational advantage at scale and one of the underrated reasons teams choose PostgreSQL over simpler alternatives.
Resource Limits and Connection Pressure
PostgreSQL has a hard max_connections limit (default: 100). Each connection spawns a backend process consuming roughly 5–10 MB of memory. At 500 connections, you are consuming 2.5–5 GB just for connection overhead before a single query runs.
This is why PgBouncer (connection pooler) is effectively mandatory in production web deployments. In transaction-mode pooling, hundreds of application connections multiplex through tens of actual PostgreSQL connections:
Application: 500 concurrent connections
PgBouncer: 500 → pool → 20 PostgreSQL backend processes
PostgreSQL: 20 connections, full concurrency within those 20
SQLite has no equivalent problem — there is no server process, no connection limit, no pooler to configure. For embedded, edge, and mobile use cases, this is a genuine advantage, not a limitation.
Production Failure Case: The Silent SQLite Timeout
Here is a failure pattern that is common, painful to diagnose, and entirely avoidable.
Setup: A team builds a SaaS product. Early traction, moderate traffic. They choose SQLite for operational simplicity — a reasonable call. The application is a Go HTTP server with goroutines handling concurrent requests. They use the standard database/sql interface.
What goes wrong: The database/sql pool opens multiple connections to SQLite by default. SQLite in WAL mode allows one writer. With multiple goroutines issuing concurrent writes, SQLite returns SQLITE_BUSY almost immediately (the default busy timeout is 0 ms). The database/sql driver surfaces this as a Go error.
The developer never tested concurrent writes under load. In development, requests arrive sequentially. In production, a moderate traffic spike (50 concurrent users submitting forms) causes a write storm. Half the writes fail silently because the error handling in the HTTP layer swallows the error and returns HTTP 200 with an empty body. Users see success. Data is lost.
How to avoid it:
// 1. Open the database with busy_timeout set
db, err := sql.Open("sqlite3",
"file:prod.db?_journal_mode=WAL&_busy_timeout=10000&_synchronous=NORMAL")
// 2. CRITICAL: Constrain the connection pool to 1 writer
// Use separate read and write pools, or set MaxOpenConns=1
db.SetMaxOpenConns(1)
// 3. Handle errors explicitly — never ignore sql.ErrNoRows or driver errors
result, err := db.ExecContext(ctx, "INSERT INTO orders ...")
if err != nil {
// log, return 500, do not pretend this succeeded
return fmt.Errorf("order insert failed: %w", err)
}
Setting MaxOpenConns(1) on the write database serializes all writes at the connection pool level, eliminating SQLITE_BUSY on writes. Reads can use a separate pool with MaxOpenConns set to whatever the read concurrency requires.
The Edge Deployment Shift
For years the conventional guidance was unambiguous: SQLite for embedded and mobile, PostgreSQL for servers. Modern NVMe storage and edge computing have complicated that picture.
NVMe SSDs deliver 500,000–1,000,000 random IOPS at sub-100 microsecond latency. SQLite in WAL mode on NVMe handles 10,000–50,000 writes per second per instance — enough for the median SaaS product at its median traffic day. The failure mode (SQLITE_BUSY under concurrent writes) is well-understood and preventable.
The edge argument is latency:
Cloudflare D1 and Turso both distribute SQLite databases globally with per-region replicas. Writes go to a primary and propagate to read replicas with eventual consistency. For read-heavy workloads — which describes most content-serving applications — this topology produces latency profiles that no single-region PostgreSQL deployment can match.
The tradeoff is consistency. D1 and Turso replicas are eventually consistent. If your application reads data immediately after writing it and requires reading your own writes from any region, you need either sticky sessions, write-through logic, or PostgreSQL with synchronous replication.
Implementation Patterns
PostgreSQL: Safe Upsert with Optimistic Locking
-- Optimistic locking with version column
-- Application reads version, increments, write fails if concurrent update occurred
UPDATE documents
SET content = $1,
version = version + 1,
updated_at = NOW()
WHERE id = $2
AND version = $3; -- must match what we read
-- If rowsAffected == 0: concurrent modification, retry or surface conflict to user
SQLite: Read-Write Pool Separation in Go
type DB struct {
read *sql.DB
write *sql.DB
}
func Open(path string) (*DB, error) {
dsn := fmt.Sprintf("file:%s?_journal_mode=WAL&_synchronous=NORMAL", path)
write, err := sql.Open("sqlite3", dsn+"&_busy_timeout=10000")
if err != nil { return nil, err }
write.SetMaxOpenConns(1) // serialize writes
write.SetMaxIdleConns(1)
read, err := sql.Open("sqlite3", dsn+"&_busy_timeout=5000")
if err != nil { return nil, err }
read.SetMaxOpenConns(max(4, runtime.NumCPU())) // parallel reads
return &DB{read: read, write: write}, nil
}
PostgreSQL: Bounded Retry on Serialization Failure
func isSerializationError(err error) bool {
var pgErr *pgconn.PgError
return errors.As(err, &pgErr) && pgErr.Code == "40001"
}
func execSerializable(ctx context.Context, db *pgxpool.Pool,
fn func(pgx.Tx) error) error {
for attempt := 0; attempt < 5; attempt++ {
tx, err := db.BeginTx(ctx, pgx.TxOptions{IsoLevel: pgx.Serializable})
if err != nil { return err }
if err = fn(tx); err != nil {
_ = tx.Rollback(ctx)
if isSerializationError(err) {
time.Sleep(time.Duration(1<<attempt) * 10 * time.Millisecond)
continue
}
return err
}
return tx.Commit(ctx)
}
return errors.New("serializable transaction failed after 5 attempts")
}
Decision Framework
Neither database is the correct default. The question is which constraints apply to your workload.
Reach for SQLite when:
Your application has one logical writer (a mobile device, a single embedded process, an edge function with a per-user database). Operational simplicity is a feature, not a compromise — you have no DBA, no ops team, and paying for managed PostgreSQL is wasteful at your scale. Traffic is read-heavy with infrequent or bursty writes. You are deploying to the edge and need sub-10ms reads.
Reach for PostgreSQL when:
Multiple application instances write to the same dataset concurrently. You need a verifiable audit trail with point-in-time recovery. Your schema uses complex types (JSONB, arrays, custom domains) that benefit from strict enforcement. You are building multi-tenant software where tenant data lives in shared tables. You need streaming replication, logical CDC, or downstream consumers that subscribe to a change stream.
The rule you can break:
“SQLite is not for production web apps.” It is, with appropriate architecture: a single writer pool, WAL mode enabled, busy_timeout set, and a workload that does not require distributed writes. Pieter Levels runs Nomad List and Remote OK — products with hundreds of thousands of monthly users — on SQLite. The constraint is not traffic volume; it is concurrent write volume.
“You always need a connection pooler with PostgreSQL.” Not if you are using a managed service like Supabase, Neon, or RDS Proxy that handles pooling at the infrastructure layer, or if you are deliberately limiting your application’s connection count through application-level queuing.
The Real Summary
SQLite eliminates operational complexity by accepting a single-writer constraint.
PostgreSQL accepts operational complexity in exchange for eliminating almost every other constraint.
If your write concurrency is low and your deployment environment is constrained — mobile, embedded, edge, small team with no ops budget — SQLite is not a compromise. It is the correct choice.
If your write concurrency is high, your data model is complex, your team needs observability and replication, or your organization requires point-in-time recovery for compliance — PostgreSQL is the correct choice, and the operational overhead is the price of those guarantees.
The mistake is not choosing one over the other. The mistake is choosing without examining the actual constraints first.