Gerard Holzmann’s ten rules were written for code that cannot be allowed to fail. Twenty years later, they map perfectly onto the failure modes of modern distributed systems—and the unique risks introduced by AI code generation.
Gerard Holzmann’s problem at JPL was not a lack of coding standards. It was an excess of them. Hundreds of rules that nobody could remember, that tools couldn’t verify, and that engineers quietly ignored when deadlines hit. His solution: throw out almost everything and keep ten rules — small enough to memorize, strict enough to enforce with static analysis, and targeted at the failure modes that actually kill software.
The original rules were written for C in embedded flight systems. But every principle he captured maps directly onto the failure modes that appear in backend services under load: unbounded retry storms, resource leaks under partial failures, error paths that silently corrupt state, and abstraction layers that make production debugging feel like archaeology.
Add AI code generation to the mix and the problem compounds. Language models are extremely good at generating code that satisfies the happy path. They are systematically bad at the things Holzmann was trying to enforce: bounded behavior, explicit error handling, minimal scope, and legible control flow. The rules below are Holzmann’s original ten, generalized beyond C, extended for distributed systems, and annotated for the specific ways AI tooling tends to violate them.
“When it really counts, it may be worth going the extra mile and living within stricter limits than may be desirable.”
— Gerard Holzmann, NASA/JPL
Control & Structure
01. Keep Control Flow Linear and Traceable
What it is: No deep nesting. No hidden jumps. Execution should read like a sequence of decisions, not a maze. Holzmann banned recursion entirely. In practice, the principle is: every function should have a single, traceable path from entry to exit, with branches that are shallow and explicit.
Why it matters in backend systems: Nested conditionals in request handlers are the primary source of untested edge cases. A function with five levels of nesting has up to 2⁵ = 32 possible execution paths. Your tests cover three of them. The twenty-ninth one fires in production when a downstream service returns a 204 instead of a 200, and your handler silently skips the response body parsing that should have validated the write.
How it fails: An authentication middleware with nested checks for token type, expiry, scope, and user state. The “expired token with admin scope” branch was never tested because no test fixture generates that combination. A crafted request hits it, the condition falls through to a default-allow, and you have a privilege escalation that your logs show nothing about because the branch that would have logged was never reached.
When to bend it: Recursive data structure traversal — tree walking, graph algorithms — is legitimately recursive. The rule isn’t “never recurse.” It’s “ensure the recursion is bounded and the depth is explicit.” A recursive JSON parser with a depth limit of 64 is fine. A recursive template renderer with no depth limit is a DoS vector.
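The depth bound is one parameter and one check. A minimal sketch of the pattern, where walkJSON and the limit are illustrative rather than taken from a specific parser:

const maxDepth = 64

// walkJSON recurses over a decoded JSON value, but the recursion is bounded:
// every level carries its depth, and the ceiling is enforced before descending.
func walkJSON(v any, depth int) error {
    if depth > maxDepth {
        return fmt.Errorf("max nesting depth %d exceeded", maxDepth)
    }
    switch t := v.(type) {
    case map[string]any:
        for _, child := range t {
            if err := walkJSON(child, depth+1); err != nil {
                return err
            }
        }
    case []any:
        for _, child := range t {
            if err := walkJSON(child, depth+1); err != nil {
                return err
            }
        }
    }
    return nil
}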
Implementation pattern:
// Flat, explicit, traceable
func handleRequest(ctx context.Context, req *Request) (*Response, error) {
    token, err := extractToken(req)
    if err != nil {
        return nil, fmt.Errorf("token extraction: %w", err)
    }
    claims, err := validateToken(ctx, token)
    if err != nil {
        return nil, fmt.Errorf("token validation: %w", err)
    }
    if err := checkScope(claims, req.RequiredScope); err != nil {
        return nil, fmt.Errorf("scope check: %w", err)
    }
    return executeRequest(ctx, claims, req)
}
Each step is a named function. Each failure path returns immediately with context. No nesting. The call stack at any point in time tells you exactly where you are.
02. Bound Every Loop and Every Retry
What it is: Every iteration needs an explicit ceiling enforced in code, not in documentation. Unbounded loops are load-bearing assumptions that production will eventually invalidate.
Why it matters in backend systems: Retry loops without caps cause the most predictable class of catastrophic outages in distributed systems. A service that retries a failed downstream call indefinitely will — under the right failure conditions — consume all available goroutines, file descriptors, or connection pool slots. The service dies holding locks. Callers retry. You now have a retry storm propagating upstream through your entire dependency graph.
How it fails: A message consumer retries processing on any error, with exponential backoff but no maximum attempt count. A malformed message enters the queue — valid JSON, but with a field that triggers an integer overflow in the processing logic. The consumer retries it forever. Meanwhile, the Kafka partition is stuck. Every other message behind the malformed one is blocked. The consumer group lag climbs. Alerts fire. The on-call engineer spends forty minutes tracing the issue before finding a single message that has been retried 11,000 times since 3am.
When to bend it: Some operations genuinely should retry until success — health check loops in initialization, connection establishment during startup. The rule is not “always give up after N attempts.” It’s “always know what N is and what happens when you reach it.” Move the message to a dead-letter queue. Return an error. Log with full context. Do something explicit.
Implementation pattern:
const (
    maxRetries = 5
    baseDelay  = 100 * time.Millisecond
    maxDelay   = 10 * time.Second
)

func withRetry(ctx context.Context, op func() error) error {
    var lastErr error
    for attempt := range maxRetries {
        err := op()
        if err == nil {
            return nil
        }
        lastErr = err
        if attempt+1 == maxRetries {
            break // retries exhausted; no point sleeping before returning
        }
        // Exponential backoff: base shifted left per attempt, capped at maxDelay.
        delay := min(baseDelay*time.Duration(1<<attempt), maxDelay)
        select {
        case <-ctx.Done():
            return fmt.Errorf("context cancelled after %d attempts: %w", attempt+1, ctx.Err())
        case <-time.After(delay):
        }
    }
    return fmt.Errorf("operation failed after %d attempts: %w", maxRetries, lastErr)
}
The ceiling is a named constant. The delay is bounded. Context cancellation is respected. There is no path through this function that runs forever.
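When the bounded operation is a message consumer, exhaustion needs an explicit destination. A sketch of the dead-letter path, building on the withRetry function above; Message, DeadLetterQueue, and process are hypothetical types, not from a specific library:

// consume bounds processing of a single message. When retries are exhausted,
// the message is parked with its error instead of blocking the partition.
func consume(ctx context.Context, msg Message, dlq DeadLetterQueue) error {
    err := withRetry(ctx, func() error { return process(ctx, msg) })
    if err == nil {
        return nil
    }
    // Explicit exhaustion path: one poison message cannot stall the queue.
    return dlq.Publish(ctx, msg, err.Error())
}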
Memory & Resources
03. Declare Resource Lifetime Explicitly
What it is: Every resource you open, you close. Every connection you borrow, you return. Resource lifetime must be explicit and correct on every exit path — not just the happy one. Holzmann’s original rule prohibited dynamic memory allocation after initialization. The generalized form: don’t acquire what you can’t account for under partial failure.
Why it matters in backend systems: Resource leaks in backend services don’t fail immediately. They fail gradually, under load, after hours or days of accumulation. A database connection that leaks on the error path of a transaction handler will exhaust your connection pool exactly when traffic spikes — when you can least afford it, and when the causal distance between the leak and the exhaustion makes the root cause hardest to find.
How it fails: A service opens a database connection, begins a transaction, calls an external API, and on API timeout — returns an error without rolling back or closing the connection. The connection is orphaned. Under normal traffic, the pool is large enough that this never causes visible problems. On Black Friday, request volume triples, the external API’s p99 latency spikes, the timeout rate climbs to 15%, and in forty minutes you’ve leaked your entire connection pool. Every new request blocks waiting for a connection that will never be returned. The service is functionally dead.
When to bend it: Resource pooling intentionally borrows and returns connections on a longer lifecycle than individual requests. The rule applies at the pool level: the pool itself must have bounded size, connection validation, and idle timeout. The connection is still “owned” — ownership just lives longer.
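With Go's database/sql, those pool-level bounds are a few explicit calls at startup; the numbers here are illustrative, not recommendations:

db.SetMaxOpenConns(50)                  // hard ceiling on concurrent connections
db.SetMaxIdleConns(10)                  // bound the idle set
db.SetConnMaxIdleTime(5 * time.Minute)  // recycle connections that sit idle
db.SetConnMaxLifetime(30 * time.Minute) // bound total connection lifetime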
Implementation pattern:
func processWithDB(ctx context.Context, db *sql.DB, id string) error {
    conn, err := db.Conn(ctx)
    if err != nil {
        return fmt.Errorf("acquire connection: %w", err)
    }
    defer conn.Close() // guaranteed close on every exit path

    tx, err := conn.BeginTx(ctx, nil)
    if err != nil {
        return fmt.Errorf("begin transaction: %w", err)
    }
    defer func() {
        if p := recover(); p != nil {
            _ = tx.Rollback()
            panic(p) // re-panic after rollback
        }
    }()

    if err := doWork(ctx, tx, id); err != nil {
        _ = tx.Rollback()
        return fmt.Errorf("do work: %w", err)
    }
    return tx.Commit()
}
defer conn.Close() fires on every return path. The transaction rollback fires on panic. There is no exit path that leaves a resource open.
04. One Function, One Responsibility
What it is: Each function does exactly one thing — small enough to hold in your head, describable in a sentence without “and.” Holzmann’s limit was roughly 60 lines. The rationale is epistemic: you cannot prove correctness for what you cannot comprehend at once.
Why it matters in backend systems: Large functions are the primary obstacle to safe refactoring, accurate testing, and incident diagnosis. A 300-line handler that parses the request, validates business rules, executes a database transaction, publishes an event, and formats the response cannot be unit tested at the level of any one of those steps. Every change to the database schema requires understanding the entire function. Every incident requires reading 300 lines to find the one that caused it.
How it fails: A billing handler grows organically over eighteen months. It starts at 80 lines. Features get added inline rather than extracted. It’s now 420 lines, it handles three different payment providers with branching logic, it has two separate database writes that are not transactional with each other, and it sends emails. During an incident, the on-call engineer is trying to determine whether the email was sent before or after the second database write. There is no way to know without reading the function carefully enough to reconstruct its state machine — at 3am, under pressure.
When to bend it: Legitimate exceptions exist: generated code (serializers, protocol implementations), functions that are fundamentally a sequence of steps where extraction adds indirection without clarity. The test: can you name the extracted function accurately? If validateAndTransformAndNormalize() is the honest name, the extraction hasn’t happened yet.
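Implementation pattern (a sketch of the billing handler above after decomposition; the step names are hypothetical):

// Each step is one responsibility, nameable without "and". The ordering of
// the email relative to the database write is now visible at a glance.
func handleBilling(ctx context.Context, req *BillingRequest) error {
    charge, err := buildCharge(req)
    if err != nil {
        return fmt.Errorf("build charge: %w", err)
    }
    if err := persistCharge(ctx, charge); err != nil {
        return fmt.Errorf("persist charge: %w", err)
    }
    return sendReceiptEmail(ctx, charge)
}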
Correctness & Observability
05. Make Assumptions Executable
What it is: Preconditions, postconditions, and invariants belong in the code as executable assertions — not in comments or documentation. Holzmann required a minimum of two assertions per function. The number is less important than the discipline: every implicit assumption should be made explicit and checkable.
Why it matters in backend systems: Silent invariant violations are the hardest class of production bugs. The violation occurs long before the failure. By the time the failure surfaces, the state that caused it is gone. An assertion that fires immediately — at the point of violation, with full context — is worth ten hours of post-hoc debugging.
How it fails: A payment processing function assumes that the amount field is always positive — it’s validated at the API boundary, so it’s “guaranteed.” Three months later, a currency conversion function introduces a rounding error that produces a negative value in a specific locale/denomination combination. The payment function doesn’t check. It writes a negative charge to the database. The database constraint doesn’t catch it (it enforces NOT NULL, not amount > 0). A weekly billing job picks it up, tries to charge -$0.01, and Stripe returns a 400. The error is logged. Nobody reads that log. The customer’s account is never charged. Finance notices three months later.
Implementation pattern:
func applyDiscount(price float64, discountPct float64) (float64, error) {
    // Make preconditions explicit and loud
    if price <= 0 {
        return 0, fmt.Errorf("precondition violated: price must be positive, got %v", price)
    }
    if discountPct < 0 || discountPct > 100 {
        return 0, fmt.Errorf("precondition violated: discountPct must be in [0,100], got %v", discountPct)
    }

    result := price * (1 - discountPct/100)

    // Postcondition: result must be non-negative and <= original price
    if result < 0 || result > price {
        return 0, fmt.Errorf("postcondition violated: result %v out of expected range [0, %v]", result, price)
    }
    return result, nil
}
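Both checks fail loudly at the point of violation, with the offending values in the error. The postcondition catches bugs introduced anywhere upstream, including a rounding error like the one in the failure story above.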
06. Never Swallow Errors
What it is: Every failure path must be handled, logged, or propagated. A bare catch {}, an ignored return value, or an error that gets printed and discarded is not error handling — it is active suppression of diagnostic information.
Why it matters in backend systems: Swallowed errors corrupt state silently. The service continues running, returning 200s, doing the wrong thing. By the time the corruption is visible — missing records, wrong balances, phantom writes — the causal chain is months old. The cost of a swallowed error is not the error itself. It’s the debugging time after the fact, multiplied by the number of affected records.
How it fails: A cache write failure is caught and logged at DEBUG level, and the request continues normally. This seems reasonable — the cache is not authoritative. What wasn’t considered: the cache write failure was caused by a serialization error on a specific object type, not a network issue. Every request that hits this object type “succeeds” from the caller’s perspective but never updates the cache. The cache serves stale data indefinitely. Users see data that is months old and cannot explain why. The DEBUG log has 40 million lines.
When to bend it: Some errors are genuinely ignorable — best-effort telemetry, non-critical cache updates — but this must be a deliberate decision, documented at the call site, with a comment explaining why.
// Intentionally non-fatal: metric emission failure doesn't affect correctness.
// If this fails, we lose the data point but the request proceeds.
if err := metrics.Emit(ctx, event); err != nil {
    log.Warn().Err(err).Msg("metric emission failed, continuing")
}
The decision to ignore the error is explicit, named, and logged at a visible level. It is not implicit.
07. Minimize State Scope
What it is: Data should live as close to its use as possible. The wider the scope, the more code paths can modify a variable — and the harder it becomes to reason about its value at any point in execution. Holzmann’s principle of least privilege applied to data.
Why it matters in backend systems: Shared mutable state is the primary source of race conditions, unexpected mutation bugs, and test pollution. In concurrent backend services, class-level or module-level state that’s modified across goroutines or threads requires synchronization. Missing synchronization produces data races. Over-synchronization produces contention. Both are correctness problems. Minimizing scope eliminates the need for the synchronization.
How it fails: A connection pool counter is stored as a package-level variable and incremented/decremented without synchronization. In low-traffic testing, the race never manifests — requests are sequential. Under production load, two goroutines read the counter simultaneously, both see a value one below the limit, both proceed, and the pool exceeds its limit. Under Kubernetes memory pressure, the extra connections cause OOM. The pod restarts. The race detector would have caught this in CI if it had been enabled.
Implementation pattern:
// Prefer: pass state explicitly, scope it to the call
func processRequest(ctx context.Context, cfg *Config, store Store, req *Request) (*Response, error) {
    // All state arrives as a parameter, readable at the call site.
    return execute(ctx, cfg, store, req) // execute is a stand-in for the handler body
}

// Avoid: reaching for package-level state
var globalConfig *Config // who sets this? when? what happens during hot reload?
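Where a shared limit genuinely must exist, reach for a synchronized primitive rather than a hand-rolled counter. A sketch using golang.org/x/sync/semaphore; maxConns and acquireConn are illustrative names:

var sem = semaphore.NewWeighted(maxConns) // one place, one primitive, race-free

// acquireConn blocks until a slot is free (or ctx is cancelled) and returns
// the release function, making the lifetime of the slot explicit.
func acquireConn(ctx context.Context) (release func(), err error) {
    if err := sem.Acquire(ctx, 1); err != nil {
        return nil, fmt.Errorf("acquire connection slot: %w", err)
    }
    return func() { sem.Release(1) }, nil
}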
08. Make Side Effects Visible at the Call Site
What it is: I/O, mutations, and network calls should be obvious from the function name and call site — not hidden four layers deep in a chain of helpers with innocent-sounding names.
Why it matters in backend systems: Hidden side effects are the most dangerous form of technical debt. A function named formatResponse that also writes to an audit log and updates a cache is a correctness trap. Every caller that invokes it expects formatting. None of them expect writes. When the audit log write fails, the error propagates out of a function that looks like it should never fail, and every caller upstream is unprepared to handle it.
How it fails: A “utility” function named enrichUserData is called throughout a read-heavy API. It was originally just a data transformer. Over time, someone added a cache write to it — seemed reasonable, it already had the data. Then someone added a usage tracking increment. Now a read endpoint that calls enrichUserData is doing two writes on every request, none of which are visible from the call site. Under load, the usage tracking increment (which goes to Redis) creates a hot key. Redis CPU spikes. Latency on the “read” endpoint climbs to 800ms. The on-call engineer sees slow reads and starts tuning the database.
Implementation pattern:
// Explicit: side effects are named and visible
func handleGetUser(ctx context.Context, id string) (*User, error) {
    user, err := db.FetchUser(ctx, id)
    if err != nil {
        return nil, err
    }
    enriched := enrichUserData(user) // pure transform, no writes

    // Explicit async write, named. Detached from the request context so the
    // write is not cancelled the moment the handler returns.
    go trackUserAccess(context.WithoutCancel(ctx), id)

    _ = cache.Set(ctx, cacheKey(id), user) // explicit write, intentionally non-fatal
    return enriched, nil
}
Abstraction & Indirection
09. Limit Layers of Indirection
What it is: Every layer of abstraction makes it harder to answer: “what actually runs when I call this?” Holzmann restricted pointer indirection to a single level. The generalized principle: each additional layer of middleware, dynamic dispatch, or callback chain increases the cognitive load required to trace execution — and the debugging time required when something goes wrong.
Why it matters in backend systems: In distributed systems, the question “what happens when this call fails?” is already complex. Add four layers of middleware — logging, tracing, auth, rate limiting — each with its own error handling and context propagation, and the answer to that question becomes non-trivial. An error that occurs in middleware layer three may be swallowed by layer two, transformed by layer one, and surface at the caller as a generic 500 with no diagnostic information.
How it fails: A gRPC service wraps handlers in: a panic recovery interceptor, an auth interceptor, a rate limiting interceptor, a tracing interceptor, and a retry interceptor — in that order. The retry interceptor retries on Unavailable. The rate limiting interceptor returns Unavailable when the token bucket is empty. The retry interceptor sees the Unavailable, retries, hits the rate limiter again, retries again, and continues until it exhausts its retry budget — amplifying the load on the rate limiter that was already over capacity. The system is now in a feedback loop. This interaction was invisible because it required understanding the composition of all five interceptors simultaneously.
When to bend it: Framework conventions often require layered middleware. The rule is not “no middleware.” It’s “know what every layer does and how errors compose through them.” Test the error behavior of the full middleware stack, not just individual interceptors in isolation.
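One mitigation is to make the composition itself reviewable: declare the stack once, in traversal order, in one place. A sketch with gRPC's interceptor chaining; the interceptor names are hypothetical:

srv := grpc.NewServer(grpc.ChainUnaryInterceptor(
    recoveryInterceptor,  // outermost: catches panics from everything below
    tracingInterceptor,
    authInterceptor,
    rateLimitInterceptor, // innermost: sheds load just before the handler
    // Note what is absent: no server-side retry interceptor. Retries live at
    // the client, with a budget, where they cannot amplify the limiter above.
))

A reviewer reading this declaration can ask the feedback-loop question ("what retries what?") without reading five interceptor implementations.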
10. Treat Warnings as Build Failures
What it is: Zero compiler warnings from day one — not advisory, not “we’ll fix those later,” but a hard gate. Type checkers, linters, and static analyzers should run on every change and block merge on violation.
Why it matters in backend systems: A warning is a future bug that the toolchain has already identified. The cost of fixing it now is a few minutes. The cost of tracking down the bug it becomes in production is hours to days. Every warning left unresolved is a bet that this particular one won’t matter — a bet you are making on behalf of everyone who will be on-call when it does.
How it fails: A Go service runs CI with go vet disabled because a vendored library triggered a false positive and nobody wanted to investigate. Six months later, a developer uses fmt.Sprintf with a mismatched format verb in an error message — %d with a string argument. Go vet would have caught it. Instead, the mismatch surfaces at runtime: the error message that should read "user id: abc123" reads "user id: %!d(string=abc123)". A monitoring alert fires on the unexpected error string. An engineer spends three hours chasing a red herring before spotting the format verb in the source.
Implementation pattern:
# .golangci.yml — enforced in CI, fails the build
linters:
  enable:
    - errcheck    # unchecked errors
    - govet       # correctness issues
    - staticcheck # additional static analysis
    - gosec       # security issues
    - exhaustive  # exhaustive switch statements
linters-settings:
  errcheck:
    check-blank: true # flag `_ = someFunc()` patterns
Static analysis is part of the project scaffold — configured before the first line of production code is written, enforced in CI as a hard gate.
How This Applies to Distributed Systems
The rules above were written for single-process software. In distributed systems, every rule has a corresponding failure mode that is harder to detect and harder to recover from.
Bounded behavior (Rules 1, 2) maps directly to retry storms, cascading timeouts, and work amplification. A single unbounded retry loop in one service becomes a coordinated thundering herd when hundreds of instances hit the same failure simultaneously. Bounded loops with jitter are the distributed systems equivalent of Holzmann’s loop ceiling.
Resource accounting (Rule 3) becomes connection pool exhaustion, file descriptor limits, and goroutine leaks. In distributed systems, resources are finite at two levels: within the service (connection pool, goroutine pool) and across the cluster (downstream service capacity). Explicit resource lifecycle applies to both.
Single responsibility (Rule 4) maps to service decomposition. A service that does too many things is impossible to scale independently, impossible to test accurately, and impossible to own clearly. The 60-line function limit is a fractal: it applies within functions, within services, and within system design.
Observable assumptions (Rule 5) maps to contract testing and schema validation in distributed systems. The implicit assumption that “the upstream service will always return this field” is the distributed systems version of an unasserted precondition. It will be violated eventually. Schema validation at service boundaries — Protobuf, JSON Schema, OpenAPI — is the executable assertion for distributed contracts.
Error propagation (Rule 6) becomes structured logging, distributed tracing, and error classification. A swallowed error in a single-process system is bad. A swallowed error in a microservice that removes context before propagating it upstream makes the originating failure impossible to find. Every error that crosses a service boundary must carry enough context to identify its origin without access to the downstream service’s logs.
State scope (Rule 7) maps to idempotency and stateless design. The wider the state scope, the harder it is to reason about correctness under concurrent request processing. Stateless handlers with explicit external state (database, cache) are the distributed systems equivalent of local variable scope.
Side effect visibility (Rule 8) maps to event-driven architecture tradeoffs. A function that “also publishes an event” is a hidden write. In distributed systems, that hidden write may trigger a chain of downstream consequences. Make event publication explicit: it should appear in the handler, not buried in a service layer.
Indirection limits (Rule 9) maps to service mesh and middleware composition. Each middleware layer — service mesh sidecar, API gateway, load balancer — adds latency, adds failure modes, and obscures the path from request to response. Know what every layer does and how errors compose through your full request path.
Static analysis (Rule 10) applies across languages and infrastructure-as-code. Terraform plans with static analysis catch resource configuration errors before they reach production. Kubernetes manifest linting catches RBAC misconfigurations before deployment. The principle applies at every layer of the stack.
Production Failure Case: The Retry Storm
The following is a realistic failure that violates Rules 2 (unbounded loops), 6 (swallowed errors), and 9 (opaque abstraction layers).
Setup: A payment service calls a fraud detection API before processing each transaction. The fraud API is behind an internal HTTP client wrapped in three layers: a circuit breaker, a retry wrapper, and a timeout wrapper. The retry wrapper retries on any 5xx response with exponential backoff. The backoff has no maximum delay. The maximum retry count is 10.
The failure: At 11:47pm on a Wednesday, the fraud detection service experiences a database failover. For ninety seconds, it returns 503 on every request. The retry wrapper in the payment service begins retrying. With exponential backoff starting at 100ms and no delay ceiling, by retry 7 the delay is 6.4 seconds. The payment service is now holding goroutines for 6+ seconds per request, waiting to retry. Incoming payment requests are queuing. Each queued request spawns a goroutine that will also begin retrying.
Cascade: Within four minutes, the payment service has 8,000 goroutines in flight, all waiting on retry timers. Memory usage climbs from 400MB to 3.2GB. The Kubernetes pod OOMs and restarts. On restart, it immediately begins processing the queue — and retrying the fraud API, which is still recovering. Three instances restart in sequence. The fraud API, now recovered, receives a burst of 24,000 requests in thirty seconds and falls over again.
Debugging difficulty: The circuit breaker never opened because its trip condition required errors sustained across a 120-second window — the fraud API returned 503 for only ninety seconds before recovering, just under the threshold. The retry wrapper’s errors were logged at DEBUG level inside the abstraction layer. The payment service’s own logs showed only goroutine count climbing and memory increasing. The causal chain — 503 → retry loop → goroutine accumulation → OOM → cascade — required reading three separate codebases and correlating logs across a twelve-minute window.
What the rules would have prevented:
- Rule 2: A retry cap of 3 with a maximum delay of 2 seconds would have failed fast, surfaced the error to the caller, and prevented goroutine accumulation.
- Rule 6: Logging retry attempts at WARN level with attempt count and upstream status would have made the retry storm visible in real time.
- Rule 9: If the retry wrapper, circuit breaker, and timeout wrapper had been explicit in the calling code rather than transparent abstractions, their composition and interaction would have been visible — and the missing cap would have been caught in code review.
Implementation Patterns
Retry with cap, jitter, and dead-letter handling (Go)
type RetryConfig struct {
    MaxAttempts int
    BaseDelay   time.Duration
    MaxDelay    time.Duration
    Jitter      float64 // 0.0 to 1.0
}

var DefaultRetryConfig = RetryConfig{
    MaxAttempts: 3,
    BaseDelay:   100 * time.Millisecond,
    MaxDelay:    5 * time.Second,
    Jitter:      0.2,
}
func withRetry(ctx context.Context, cfg RetryConfig, op func() error) error {
    var lastErr error
    for attempt := 0; attempt < cfg.MaxAttempts; attempt++ {
        err := op()
        if err == nil {
            return nil
        }
        lastErr = err
        if attempt+1 == cfg.MaxAttempts {
            break // retries exhausted; don't sleep or log "will retry"
        }
        slog.Warn("operation failed, will retry",
            "attempt", attempt+1,
            "max_attempts", cfg.MaxAttempts,
            "error", err,
        )
        delay := min(
            float64(cfg.BaseDelay)*math.Pow(2, float64(attempt)),
            float64(cfg.MaxDelay),
        )
        jitter := delay * cfg.Jitter * (rand.Float64()*2 - 1)
        wait := time.Duration(delay + jitter)
        select {
        case <-ctx.Done():
            return fmt.Errorf("context cancelled after %d attempts: %w", attempt+1, ctx.Err())
        case <-time.After(wait):
        }
    }
    return fmt.Errorf("operation failed after %d attempts: %w", cfg.MaxAttempts, lastErr)
}
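At the call site, the bound and the policy are visible. A hypothetical usage:

err := withRetry(ctx, DefaultRetryConfig, func() error {
    return fraudClient.Check(ctx, req) // fraudClient is illustrative
})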
Resource cleanup with defer and panic safety (Go)
func withTransaction(ctx context.Context, db *sql.DB, fn func(*sql.Tx) error) (err error) {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return fmt.Errorf("begin transaction: %w", err)
    }
    defer func() {
        if p := recover(); p != nil {
            _ = tx.Rollback()
            panic(p)
        }
        if err != nil {
            if rbErr := tx.Rollback(); rbErr != nil {
                slog.Error("rollback failed", "rollback_error", rbErr, "original_error", err)
            }
        }
    }()

    if err = fn(tx); err != nil {
        return err
    }
    return tx.Commit()
}
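Callers pass only the unit of work; acquisition, rollback, and commit live in one audited place. A hypothetical usage:

err := withTransaction(ctx, db, func(tx *sql.Tx) error {
    _, err := tx.ExecContext(ctx,
        "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
        amount, accountID)
    return err
})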
Explicit error propagation with context wrapping (Rust)
use thiserror::Error;
#[derive(Error, Debug)]
pub enum PaymentError {
    #[error("fraud check failed: {0}")]
    FraudCheck(#[from] FraudError),
    #[error("database write failed: {0}")]
    DatabaseWrite(#[from] DbError),
    #[error("event publication failed: {0}")]
    EventPublication(#[from] EventError),
}

async fn process_payment(
    ctx: &Context,
    req: &PaymentRequest,
) -> Result<PaymentResponse, PaymentError> {
    let fraud_result = fraud_client
        .check(ctx, req)
        .await
        .map_err(PaymentError::FraudCheck)?; // context preserved, type explicit

    let record = db
        .write_payment(ctx, req, &fraud_result)
        .await
        .map_err(PaymentError::DatabaseWrite)?;

    event_bus
        .publish(ctx, PaymentProcessed::from(&record))
        .await
        .map_err(PaymentError::EventPublication)?;

    Ok(PaymentResponse::from(record))
}
Each error type is named. The error path from origin to caller preserves context at every step. The caller can match on error type to decide how to respond. Nothing is swallowed.
Working With AI Tooling
11. You Are the Engineer of Record
AI-generated code has not been tested by someone who cares whether it works. It has been generated by a system optimized to produce code that satisfies the stated requirement on the happy path. The gap between “satisfies the happy path” and “correct under production conditions” is where every rule in this article lives.
The error paths are where AI is weakest. The partial failure case, the timeout during cleanup, the retry that amplifies the failure — these are not in the training signal for task completion. They require experience with what breaks in production, which is exactly what language models don’t have.
Read every line. Trace every error path. Check every resource acquisition for a corresponding release. Treat AI output like a PR from a technically capable contractor who has never been on-call.
12. Specify Failure Modes Before Requesting Implementation
The most effective prompting pattern for backend code is not “implement this feature.” It is “here are the requirements, here are the expected failure modes, here are the edge cases — write tests that cover them first, then implement.”
Specifying failure modes upfront forces the generated implementation to handle them. Without the specification, the implementation will handle the happy path and ignore the rest.
Implement a retry wrapper for HTTP calls with the following requirements:
- Maximum 3 attempts
- Exponential backoff with jitter, capped at 5 seconds
- Context cancellation must abort immediately with no further retries
- Log each retry attempt at WARN level with attempt count and status code
- After exhausting retries, return an error that includes attempt count and last status code
Write failing tests for all of these behaviors first, then implement the function.
This prompt produces a meaningfully different implementation than “implement a retry wrapper.” The tests make the specification executable. They make it visible when the implementation is incomplete. And they force the model to reason about failure modes before writing the code that handles them.
Closing
Holzmann’s paper ends with an analogy: these rules are like a seat belt. Initially uncomfortable. Quickly second nature. Eventually, you can’t imagine not using them.
The engineering case is simpler than the analogy. Rules 1–10 are constraints that eliminate entire categories of production failure. They do not eliminate all failure — distributed systems have failure modes that no coding discipline can prevent. But unbounded retry storms, resource leaks under partial failure, silent error suppression, and race conditions on shared state are all preventable. They are not acts of god. They are the predictable consequences of code that violates identifiable rules.
The AI angle is not a moral argument. It’s an engineering one. AI tools generate code at a rate that exceeds the rate at which humans can review it carefully. The response to that imbalance is not to review less carefully. It is to make the review process systematic — to know exactly what to look for, and to automate as much of the checking as possible.
The rules above are what to look for. The linters and static analyzers are the automation. The rest is engineering judgment applied at the right moment: before the code ships, when fixing it is cheap.
The AI will write the code. You will still be on call at 3am when it breaks.
Build the system that makes it not break.
Adapted from: Holzmann, G.J. (2006). The Power of Ten — Rules for Developing Safety Critical Code. NASA/JPL Laboratory for Reliable Software. Available at spinroot.com.