Consistency Models
1. What this document is about
This document explains consistency models in distributed systems — the guarantees about the ordering and visibility of operations across multiple replicas of data.
Where this applies:
- Systems where data exists in multiple locations (caches, read replicas, partitioned databases, event streams)
- Architectures using message brokers for asynchronous communication
- Services that replicate state across boundaries (microservices, CDNs, multi-region deployments)
- Any system where writes and reads may occur against different copies of data
Where this does not apply:
- Single-server applications with no replication
- Systems where all operations go through a single authoritative source with no caching
- Purely stateless services that don't manage data consistency
When multiple components hold copies of the same logical data, you must explicitly choose what guarantees about order and visibility you're willing to provide and enforce. This choice determines your system's correctness, performance, availability and operational complexity.
2. Why this matters in real systems
Consistency models become unavoidable when you introduce replication — and you introduce replication constantly without realizing it.
Common triggers:
- You add Redis caching in front of your database to reduce load. A user updates their profile in SQL, but the API still returns stale data from the cache. Your cache invalidation logic has a race condition. Customer support receives angry emails.
- You split a monolith into services. Order Service publishes an OrderPlaced event. Inventory Service and Shipping Service both consume it, but process at different speeds due to different loads. Shipping Service starts preparing a package before Inventory Service has marked items as reserved. You ship products you don't have.
- You add read replicas to scale database reads. A user posts a comment, gets a success response, refreshes the page, and their comment is missing because they hit a replica that hasn't caught up yet. They post again. Now you have duplicate comments.
- You partition your database for horizontal scaling. Two users simultaneously book the last seat on a flight, hitting different partitions. Both get confirmations. One arrives at the airport to discover they don't have a seat.
What breaks when you ignore consistency models:
- Lost updates: concurrent writes overwrite each other with no conflict detection
- Phantom reads: queries return different results when re-executed immediately
- Violation of business invariants: you sell more inventory than exists, double-charge customers, or allow conflicting reservations
- Cascade failures: services make decisions on stale data, propagating errors downstream
- Undebuggable behavior: operations appear to execute out of order, making root cause analysis impossible
Why simpler approaches stop working:
- "Just use transactions" assumes all relevant data lives in a single database supporting ACID transactions. In distributed systems, data spans databases, caches, message queues, and external services.
- "Just make everything synchronous" kills availability. If every write must synchronously update all replicas, any replica failure makes the system unavailable for writes.
- "Just accept eventual consistency everywhere" violates critical business requirements. Some operations (financial transactions, seat reservations, unique constraint enforcement) require stronger guarantees than "it'll be consistent eventually".
The physics are unforgiving: you cannot have strong consistency, high availability and partition tolerance simultaneously (CAP theorem). You must choose which guarantee to relax, and that choice must be explicit and intentional.
3. Core concept (mental model)
Think of consistency models as contracts about visibility and ordering.
Visibility: When a write completes, which readers can see it, and when?
Ordering: When multiple writes occur, do all observers agree on the order in which they happened?
Imagine a distributed system as a set of nodes, each maintaining a local view of the data. When a write occurs on one node, it must propagate to others. The consistency model defines the rules governing that propagation.
- Strong consistency: All nodes agree on a single, global order of operations. After a write completes, all subsequent reads (from any node) see that write. This is the easiest model to reason about — it behaves like a single-threaded program — but it's expensive to achieve in distributed systems.
- Eventual consistency: Writes propagate asynchronously. Different nodes may temporarily disagree about the current state. Given enough time without new writes, all replicas converge to the same state. Reads may see stale data. This maximizes availability and performance but complicates application logic.
- Causal consistency: Preserves cause-and-effect relationships. If operation B causally depends on operation A (e.g., a reply depends on the original message), all nodes see A before B. Operations without causal relationships may be seen in different orders by different nodes. This provides a middle ground — stronger guarantees than eventual consistency without the cost of strong consistency.
The mental model: you're choosing how synchronized the clocks are across your distributed system. Strong consistency keeps all clocks perfectly aligned (expensive, slow). Eventual consistency allows clocks to drift (cheap, fast, but confusing). Causal consistency synchronizes only the clocks that matter for correctness.
4. How it works (step-by-step)
🔒 Strong consistency implementation (using distributed locks)
Strong consistency enforces a single global order of writes, ensuring that all nodes observe the same state at any given time.
Step 1 — Acquire global lock
Before writing, a node acquires a distributed lock, typically implemented on top of a consensus protocol such as Raft or Paxos.
Why
- Ensures only one write occurs at a time across all nodes, creating a total ordering.
Assumptions
- Network is reliable enough that lock acquisition completes within acceptable timeouts.
Step 2 — Perform write
With the lock held, the node writes to its local storage and propagates the write to replicas.
Invariant
- No other node can write while this lock is held.
Step 3 — Wait for acknowledgment
The writer waits for a quorum (majority) of replicas to acknowledge the write.
Why
- Guarantees durability even if some nodes fail.
- Any future read that also reaches a quorum will observe this write.
Assumptions
- A majority of nodes remain available.
Step 4 — Release lock and return success
Only after quorum acknowledgment does the writer release the lock and confirm success to the client.
Invariant
- From this point forward, any quorum-based read will observe the committed value.
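The sketch below ties these four steps together in C#. The `ILockService`, `IDistributedLock`, and `IReplicaClient` interfaces are illustrative assumptions, not a specific library's API; a real system would back them with a consensus-based lock service and a replica transport.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Illustrative abstractions: any consensus-backed lock service and any replica
// transport could sit behind these interfaces.
public interface IDistributedLock : IAsyncDisposable { }   // disposing releases the lock

public interface ILockService
{
    Task<IDistributedLock> AcquireAsync(string resource, TimeSpan timeout);
}

public interface IReplicaClient
{
    Task<bool> WriteAsync(string key, byte[] value);
}

public class StronglyConsistentWriter
{
    private readonly ILockService _locks;
    private readonly IReadOnlyList<IReplicaClient> _replicas;

    public StronglyConsistentWriter(ILockService locks, IReadOnlyList<IReplicaClient> replicas)
    {
        _locks = locks;
        _replicas = replicas;
    }

    public async Task WriteAsync(string key, byte[] value)
    {
        // Step 1: acquire the global lock so only one writer proceeds at a time.
        await using var handle = await _locks.AcquireAsync($"lock:{key}", TimeSpan.FromSeconds(5));

        // Steps 2-3: send the write to all replicas and wait for a majority to acknowledge.
        var acks = await Task.WhenAll(_replicas.Select(r => r.WriteAsync(key, value)));
        var quorum = _replicas.Count / 2 + 1;
        if (acks.Count(ok => ok) < quorum)
            throw new InvalidOperationException("Quorum not reached; write cannot be confirmed.");

        // Step 4: the lock is released when this method returns (via DisposeAsync),
        // and only then is success reported to the caller.
    }
}
```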
🌊 Eventual consistency implementation
Implementation using asynchronous replication
Eventual consistency favors availability and low latency, accepting temporary divergence between replicas.
Step 1 — Write to primary
The client writes to a single node:
- A designated primary, or
- Any node in a leaderless system (e.g., Cassandra with a write consistency level of ONE)
Step 2 — Return success immediately
The node acknowledges success without waiting for replicas.
Why
- Minimizes write latency.
- The write is durable on at least one node.
Assumption
- Clients can tolerate reading stale data.
Step 3 — Asynchronous propagation
The write is propagated to other replicas in the background.
Mechanisms
- Change Data Capture (CDC)
- Replication logs
- Message queues
- Gossip protocols
Step 4 — Eventual convergence
Replicas apply writes as they arrive, potentially using:
- Timestamps
- Vector clocks
- Conflict resolution strategies
Invariant
- If no new writes occur, all replicas eventually converge to the same state.
Read path
- Reads may hit any replica.
- Clients can observe stale data if the replica has not yet received recent writes.
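A minimal C# sketch of this write path, reusing the illustrative `IReplicaClient` interface from the previous sketch. The in-memory store and polling loop are stand-ins for a real replication log or CDC pipeline.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class EventuallyConsistentStore
{
    private readonly ConcurrentDictionary<string, byte[]> _local = new();
    private readonly ConcurrentQueue<(string Key, byte[] Value)> _pending = new();

    // Steps 1-2: write locally and acknowledge immediately, without waiting for replicas.
    public Task WriteAsync(string key, byte[] value)
    {
        _local[key] = value;
        _pending.Enqueue((key, value));   // handed off for asynchronous propagation (step 3)
        return Task.CompletedTask;
    }

    // Steps 3-4: a background worker drains the queue and pushes writes to the other replicas.
    public async Task ReplicateAsync(IReadOnlyList<IReplicaClient> replicas, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            while (_pending.TryDequeue(out var write))
            {
                foreach (var replica in replicas)
                    await replica.WriteAsync(write.Key, write.Value); // retries and conflict handling omitted
            }
            await Task.Delay(TimeSpan.FromMilliseconds(50), ct);
        }
    }

    // Read path: may return stale data if this node has not yet received recent writes.
    public byte[]? Read(string key) => _local.TryGetValue(key, out var value) ? value : null;
}
```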
🧭 Causal consistency
Implementation using vector clocks
Causal consistency guarantees that causally related operations are observed in the same order by all nodes, while allowing concurrent (unrelated) operations to be observed in different orders.
Step 1 — Track causality with vector clocks
Each node maintains a vector clock, represented as a map of {node_id: counter} pairs.
This structure tracks the latest operation observed from each node.
Example
{ node_A: 5, node_B: 3 }
This indicates that the node has observed:
- 5 operations from node_A
- 3 operations from node_B
Step 2 — Attach vector clock to writes
When processing a write:
- The node increments its own counter in the vector clock.
- The full vector clock is attached to the write as metadata.
Why
- The vector clock encodes all causally preceding operations.
Step 3 — Propagate writes with causality metadata
Writes are propagated to replicas together with their vector clocks.
This ensures that causal context travels with the data, not just the payload.
Step 4 — Enforce causal ordering on delivery
A replica applies a write only if it has already observed all causally preceding writes.
This is determined by comparing vector clocks.
Mechanism
If a write W carries the vector clock:
{ A: 5, B: 3 }
The replica must have observed:
- at least 5 operations from A
- at least 3 operations from B
before applying W.
Why
- Prevents replicas from observing effects before causes.
Step 5 — Buffer out-of-order writes
Writes that arrive before their causal dependencies are buffered.
They are applied later, once all required causal dependencies have been satisfied.
Read path
- Reads may observe different orderings of concurrent (non-causally-related) writes.
- Reads will never observe an effect before its cause.
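The following sketch shows the vector clock bookkeeping described in steps 1-5, assuming string node IDs and ignoring buffering, persistence, and garbage collection of old entries. When checking whether a write can be applied, the sender's own entry is allowed to be exactly one ahead, since the write itself is the sender's newest event.

```csharp
using System;
using System.Collections.Generic;

public class VectorClock
{
    private readonly Dictionary<string, long> _counters = new();

    // Step 2: before sending a write, the node increments its own counter.
    public void Increment(string nodeId) =>
        _counters[nodeId] = _counters.GetValueOrDefault(nodeId) + 1;

    public IReadOnlyDictionary<string, long> Snapshot() => new Dictionary<string, long>(_counters);

    // Step 4: a replica may apply a write only if it has already observed every
    // causally preceding operation recorded in the write's clock.
    public bool CanApply(IReadOnlyDictionary<string, long> writeClock, string senderId)
    {
        foreach (var (node, counter) in writeClock)
        {
            long seen = _counters.GetValueOrDefault(node);
            long required = node == senderId ? counter - 1 : counter; // the write itself is the sender's new event
            if (seen < required) return false; // a dependency is still missing: buffer the write (step 5)
        }
        return true;
    }

    // After applying a write, merge its clock into the local one.
    public void Merge(IReadOnlyDictionary<string, long> writeClock)
    {
        foreach (var (node, counter) in writeClock)
            _counters[node] = Math.Max(_counters.GetValueOrDefault(node), counter);
    }
}
```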
5. Minimal but realistic example (.NET)
Scenario: E-commerce order processing with inventory reservation
System components:
- SQL database (order data)
- Redis cache (inventory counts)
- Azure Service Bus (order events)
- Inventory Service (consumes events, updates inventory)
- Order Service (creates orders, publishes events)
Eventual consistency (problematic)
// Order Service
public async Task<OrderResult> PlaceOrderAsync(CreateOrderRequest request)
{
// Read cached inventory count
var inventoryCount = await _cache.GetAsync<int>($"inventory:{request.ProductId}");
if (inventoryCount < request.Quantity)
return OrderResult.InsufficientInventory;
// Create order in database
var order = new Order
{
ProductId = request.ProductId,
Quantity = request.Quantity
};
await _dbContext.Orders.AddAsync(order);
await _dbContext.SaveChangesAsync();
// Publish event asynchronously
await _serviceBus.PublishAsync(new OrderPlacedEvent
{
OrderId = order.Id,
ProductId = request.ProductId,
Quantity = request.Quantity
});
return OrderResult.Success(order.Id);
}
// Inventory Service (separate process)
public async Task HandleOrderPlacedAsync(OrderPlacedEvent evt)
{
// Update database inventory
await _dbContext.Database.ExecuteSqlRawAsync(
"UPDATE Inventory SET Count = Count - {0} WHERE ProductId = {1}",
evt.Quantity, evt.ProductId);
// Invalidate cache asynchronously
await _cache.RemoveAsync($"inventory:{evt.ProductId}");
}
Problem: Cache and database are eventually consistent. Race conditions allow overselling:
- Order Service reads the cached count (10 items)
- Two concurrent requests each order 6 items
- Both pass the inventory check (10 >= 6)
- Both orders succeed
- Inventory Service processes the events, decrementing the count to -2
- You've sold 12 items but only had 10
Strong consistency (correct but slow)
public async Task<OrderResult> PlaceOrderAsync(CreateOrderRequest request)
{
using var transaction = await _dbContext.Database.BeginTransactionAsync(
IsolationLevel.Serializable);
try
{
// Read inventory with exclusive lock
var inventory = await _dbContext.Inventory
.FromSqlRaw("SELECT * FROM Inventory WITH (UPDLOCK, HOLDLOCK) WHERE ProductId = {0}",
request.ProductId)
.SingleOrDefaultAsync();
if (inventory.Count < request.Quantity)
return OrderResult.InsufficientInventory;
// Decrement inventory atomically
inventory.Count -= request.Quantity;
// Create order
var order = new Order
{
ProductId = request.ProductId,
Quantity = request.Quantity
};
await _dbContext.Orders.AddAsync(order);
await _dbContext.SaveChangesAsync();
await transaction.CommitAsync();
// Publish event only after commit (outbox pattern recommended)
await _serviceBus.PublishAsync(new OrderPlacedEvent
{
OrderId = order.Id,
ProductId = request.ProductId,
Quantity = request.Quantity
});
// Invalidate cache after successful commit
await _cache.RemoveAsync($"inventory:{request.ProductId}");
return OrderResult.Success(order.Id);
}
catch
{
await transaction.RollbackAsync();
throw;
}
}
How it maps to the concept:
- Serializable isolation provides strong consistency within the database
- Exclusive lock (UPDLOCK, HOLDLOCK) prevents concurrent reads/writes to the same inventory row
- All operations (check, decrement, order creation) happen atomically
- Cache invalidation happens after commit, accepting brief staleness in exchange for correctness
- Event publishing follows the write, maintaining causal consistency
Trade-offs: Every order placement holds a database lock, reducing throughput and increasing latency. Under high concurrency, lock contention becomes a bottleneck.
Causal consistency (balanced)
// Use versioned inventory with optimistic concurrency
public class Inventory
{
public Guid ProductId { get; set; }
public int Count { get; set; }
public long Version { get; set; } // Incremented on each update
}
public async Task<OrderResult> PlaceOrderAsync(CreateOrderRequest request)
{
const int maxRetries = 3;
for (int attempt = 0; attempt < maxRetries; attempt++)
{
// Read current inventory state
var inventory = await _dbContext.Inventory
.AsNoTracking()
.SingleOrDefaultAsync(i => i.ProductId == request.ProductId);
if (inventory.Count < request.Quantity)
return OrderResult.InsufficientInventory;
// Attempt to decrement inventory with version check (compare-and-swap)
var rowsAffected = await _dbContext.Database.ExecuteSqlRawAsync(@"
UPDATE Inventory
SET Count = Count - {0}, Version = Version + 1
WHERE ProductId = {1} AND Version = {2} AND Count >= {0}",
request.Quantity, request.ProductId, inventory.Version);
if (rowsAffected == 0)
{
// Version changed (concurrent modification) or insufficient inventory
await Task.Delay(TimeSpan.FromMilliseconds(100 * (attempt + 1))); // Linear backoff before retrying
continue; // Retry
}
// Create the order only after the reservation succeeded, so failed attempts
// don't leave extra Order entities tracked in the DbContext
var order = new Order
{
ProductId = request.ProductId,
Quantity = request.Quantity,
InventoryVersionAtReservation = inventory.Version // Capture version
};
await _dbContext.Orders.AddAsync(order);
await _dbContext.SaveChangesAsync();
// Publish event with version information
await _serviceBus.PublishAsync(new OrderPlacedEvent
{
OrderId = order.Id,
ProductId = request.ProductId,
Quantity = request.Quantity,
InventoryVersion = inventory.Version + 1
});
// Invalidate cache
await _cache.RemoveAsync($"inventory:{request.ProductId}");
return OrderResult.Success(order.Id);
}
return OrderResult.ConcurrentModificationFailure;
}
How it maps to causal consistency:
- Version number acts as a logical clock tracking the causal history of inventory updates
- Compare-and-swap ensures updates only succeed if no intervening write occurred
- Retries handle conflicts without blocking other operations
- Events carry version information, allowing downstream consumers to detect and handle out-of-order delivery
Trade-off: No locks, higher throughput. Conflicts result in retries instead of waiting. Under extreme contention, retry storms can occur.
6. Design trade-offs
| Consistency Model | Latency | Availability | Throughput | Complexity | Use When |
|---|---|---|---|---|---|
| Strong | High (100ms-1s+) | Low (requires quorum) | Low (serialized writes) | Low (simple reasoning) | Financial transactions, unique constraints, inventory with zero tolerance for overselling |
| Sequential | Medium (50-200ms) | Medium | Medium | Medium | Operations must appear in a total order, but don't need real-time visibility |
| Causal | Low (10-50ms) | High | High | High (vector clocks, dependency tracking) | Social feeds, collaborative editing, messaging (replies must follow posts) |
| Eventual | Very Low (1-10ms) | Very High (always writable) | Very High | Medium (conflict resolution logic) | Analytics, caching, view counts, non-critical reads |
| Read-your-writes | Low-Medium | High | High | Medium (session affinity, client tracking) | User profiles, settings (user sees own updates immediately) |
Alternative approaches and their costs
Two-phase commit (2PC):
- Gain: ACID guarantees across distributed resources
- Give up: Availability (blocks if the coordinator fails), performance (multiple round trips)
- Accept: Operational complexity of coordinator recovery, potential for indefinite blocking
Saga pattern (compensating transactions):
- Gain: Availability (each step commits independently), no distributed locks
- Give up: Atomicity (intermediate state visible), potential for inconsistency windows
- Accept: Complex rollback logic, idempotency requirements, eventual consistency between saga steps
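As an illustration of the saga approach, here is a minimal orchestration sketch: each step pairs a forward action with a compensation, and compensations run in reverse order when a later step fails. The `SagaStep` record and the orchestration loop are illustrative, not a specific framework's API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public record SagaStep(string Name, Func<Task> Execute, Func<Task> Compensate);

public class SagaOrchestrator
{
    public async Task RunAsync(IReadOnlyList<SagaStep> steps)
    {
        var completed = new Stack<SagaStep>();
        try
        {
            foreach (var step in steps)
            {
                await step.Execute();     // each step commits locally and independently
                completed.Push(step);
            }
        }
        catch
        {
            // Undo in reverse order; compensations must be idempotent because
            // they may themselves be retried after a crash.
            while (completed.Count > 0)
                await completed.Pop().Compensate();
            throw;
        }
    }
}
```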
CRDT (Conflict-free Replicated Data Types):
- Gain: Automatic conflict resolution, no coordination needed, high availability
- Give up: Limited to specific data structures (counters, sets, registers), cannot express arbitrary business logic
- Accept: Additional metadata overhead, complex merge semantics
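As a concrete example of a CRDT, the grow-only counter (G-Counter) below lets each replica increment only its own slot; merging takes the per-replica maximum, so replicas converge regardless of message order or duplication. This is a minimal sketch, not a production implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class GCounter
{
    // One slot per replica; a replica only ever increments its own slot.
    private readonly Dictionary<string, long> _slots = new();
    private readonly string _replicaId;

    public GCounter(string replicaId) => _replicaId = replicaId;

    public void Increment(long amount = 1) =>
        _slots[_replicaId] = _slots.GetValueOrDefault(_replicaId) + amount;

    // The counter's value is the sum over all replica slots.
    public long Value => _slots.Values.Sum();

    // Merge takes the per-replica maximum, which is commutative, associative,
    // and idempotent, so replicas converge regardless of merge order.
    public void Merge(GCounter other)
    {
        foreach (var (replica, count) in other._slots)
            _slots[replica] = Math.Max(_slots.GetValueOrDefault(replica), count);
    }
}
```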
Event sourcing with idempotent consumers:
- Gain: Auditability, replayability, natural causal ordering
- Give up: Query complexity (event replay cost), storage overhead
- Accept: Eventual consistency between write model and read models
Failure mode comparison
Strong consistency failure mode: During network partition or node failures, writes become unavailable until quorum is restored. Reads may also become unavailable depending on configuration. System chooses consistency over availability.
Eventual consistency failure mode: During partition, both sides accept writes independently. When partition heals, conflicts must be resolved (last-write-wins, vector clock merging, or manual intervention). System chooses availability over consistency, accepting temporary divergence.
Causal consistency failure mode: Out-of-order event delivery can cause buffering and delayed processing. If dependencies never arrive (due to bugs or data loss), dependent operations remain blocked indefinitely. Requires careful timeout and dead letter queue handling.
7. Common mistakes and misconceptions
"Eventual consistency means data will be correct eventually"
Why it happens:
- The term "eventual" sounds reassuring, implying correctness is just delayed.
Problem it causes:
- Eventual consistency guarantees convergence, not correctness. If two replicas independently accept conflicting writes (both decrement inventory to -1), they'll eventually converge to the same wrong value. Business invariants (non-negative inventory) are not preserved.
How to avoid:
- Distinguish between convergence and correctness. Use strong consistency or application-level conflict resolution for operations that must preserve invariants.
"Using distributed transactions everywhere"
Why it happens:
- Strong consistency is easy to reason about. Developers default to the familiar ACID model.
Problem it causes:
- Distributed transactions (2PC, XA) kill availability and performance. Every transaction coordinates across multiple services, each holding locks, creating cascading latency. Under load, timeout-induced rollbacks cause thrashing.
How to avoid:
- Reserve distributed transactions for truly atomic cross-service operations (rare). Prefer local transactions within service boundaries, eventual consistency across boundaries, and saga patterns for long-running workflows.
"Read-after-write consistency is free with a cache"
Why it happens:
- Developers assume cache invalidation immediately makes fresh data visible.
Problem it causes:
- Cache invalidation is asynchronous. Between invalidation and cache refresh, reads may hit stale data still in the cache or may trigger a cache miss that fetches from a lagging replica.
How to avoid:
- Implement read-your-writes using session affinity (sticky sessions), version headers (client sends last seen version, server waits until caught up), or explicitly routing reads to primary after writes.
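A sketch of the version-header variant follows: the client sends the highest version it has written, and the server falls back to the primary when the chosen replica is still behind it. `IReplicaReader`, `UserProfile`, and the version column are illustrative assumptions.

```csharp
using System;
using System.Threading.Tasks;

public record UserProfile(Guid UserId, string DisplayName);

public interface IReplicaReader
{
    Task<(UserProfile Profile, long Version)> GetProfileAsync(Guid userId);
}

public class ProfileReadService
{
    private readonly IReplicaReader _replica;   // possibly lagging read replica
    private readonly IReplicaReader _primary;   // authoritative copy

    public ProfileReadService(IReplicaReader replica, IReplicaReader primary)
    {
        _replica = replica;
        _primary = primary;
    }

    // minVersionSeen is the version the client received from its own last write.
    public async Task<UserProfile> GetProfileAsync(Guid userId, long minVersionSeen)
    {
        var (profile, version) = await _replica.GetProfileAsync(userId);

        // If the replica hasn't caught up to the client's own write, read from the
        // primary instead of returning stale data.
        if (version < minVersionSeen)
            (profile, _) = await _primary.GetProfileAsync(userId);

        return profile;
    }
}
```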
"Timestamps can establish global ordering"
Why it happens:
- System clocks seem like a simple way to order events across nodes.
Problem it causes:
- Clocks drift and can be adjusted backward (NTP corrections). Two events E1 and E2 on different nodes with timestamps T1 < T2 do not guarantee that E1 happened before E2. Relying on timestamps for ordering causes silent data corruption (a later write appears earlier and gets overwritten).
How to avoid:
- Use logical clocks (Lamport clocks, vector clocks) or hybrid logical clocks (HLCs) that combine physical time with logical counters. For strong ordering, use a centralized sequencer (single source of ordering) or consensus protocols.
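For illustration, a Lamport clock is only a few lines of C#. It guarantees that if event A happens before event B, then timestamp(A) < timestamp(B); the converse does not hold, so it orders causally related events without claiming anything about concurrent ones.

```csharp
using System;

// Minimal Lamport clock: a logical counter incremented on every local event and
// fast-forwarded past any timestamp received from another node.
public class LamportClock
{
    private long _time;

    // Call for each local event (including message sends); returns the event's timestamp.
    public long Tick() => ++_time;

    // Call when a message arrives carrying the sender's timestamp.
    public long Receive(long remoteTime)
    {
        _time = Math.Max(_time, remoteTime) + 1;
        return _time;
    }
}
```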
"Retries solve consistency problems"
Why it happens:
- Retry logic is a standard reliability pattern.
Problem it causes:
- Non-idempotent operations (create order, charge payment) executed multiple times create duplicates. Retries without proper deduplication cause double-charging, duplicate inventory reservations, and violated uniqueness constraints.
How to avoid:
- Make all operations idempotent using client-provided idempotency keys, store request IDs to detect duplicates, or use exactly-once delivery guarantees (complex and expensive).
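A minimal sketch of idempotency-key deduplication, reusing the `CreateOrderRequest` and `OrderResult` types from section 5. The in-memory dictionary is purely illustrative; a production version would use a durable store with a uniqueness constraint (and a TTL) so that concurrent retries with the same key cannot both execute.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class IdempotentOrderHandler
{
    // Illustrative in-memory dedup store; a real system needs durability and expiry.
    private readonly ConcurrentDictionary<string, OrderResult> _processed = new();
    private readonly Func<CreateOrderRequest, Task<OrderResult>> _placeOrder;

    public IdempotentOrderHandler(Func<CreateOrderRequest, Task<OrderResult>> placeOrder) =>
        _placeOrder = placeOrder;

    public async Task<OrderResult> HandleAsync(string idempotencyKey, CreateOrderRequest request)
    {
        // A retried request with the same key returns the original result instead of re-executing.
        if (_processed.TryGetValue(idempotencyKey, out var existing))
            return existing;

        var result = await _placeOrder(request);
        _processed.TryAdd(idempotencyKey, result);
        return result;
    }
}
```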
"Causal consistency is just eventual consistency with ordering"
Why it happens:
- Both allow temporary divergence, so they seem similar.
Problem it causes:
- Causal consistency has stricter guarantees and higher implementation cost. Treating it as eventual consistency with ordering underestimates the complexity of tracking dependencies (vector clocks, dependency graphs) and buffering out-of-order messages.
How to avoid:
- Understand that causal consistency requires explicit dependency tracking across all operations. Only use it when cause-effect violations would break correctness (e.g., seeing a reply before the original message).
"The database handles all consistency"
Why it happens:
- Relational databases provide ACID transactions, leading to a false sense of safety.
Problem it causes:
- Consistency guarantees don't extend beyond database boundaries. A read from SQL followed by a read from Redis is not consistent. Publishing an event to Service Bus after a database commit is not atomic. The database protects its own internal consistency but not your application's distributed state.
How to avoid:
- Recognize that distributed systems span multiple storage systems, caches, and message queues. Map out where data lives and explicitly design consistency across these boundaries using patterns like outbox, sagas, or distributed transactions.
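A sketch of the outbox pattern mentioned above, continuing the Order Service example from section 5. The `OutboxMessages` DbSet, the `OutboxMessage` entity, and the background relay are assumptions for illustration; `JsonSerializer` is `System.Text.Json`.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

public class OutboxMessage
{
    public Guid Id { get; set; }
    public string Type { get; set; } = "";
    public string Payload { get; set; } = "";
    public DateTime CreatedUtc { get; set; }
    public DateTime? PublishedUtc { get; set; }   // stamped by the relay after a successful publish
}

public async Task<OrderResult> PlaceOrderWithOutboxAsync(CreateOrderRequest request)
{
    using var transaction = await _dbContext.Database.BeginTransactionAsync();

    var order = new Order { ProductId = request.ProductId, Quantity = request.Quantity };
    await _dbContext.Orders.AddAsync(order);
    await _dbContext.SaveChangesAsync();   // assigns order.Id inside the open transaction

    // The event row commits in the same transaction as the order: either both exist or neither does.
    await _dbContext.OutboxMessages.AddAsync(new OutboxMessage
    {
        Id = Guid.NewGuid(),
        Type = nameof(OrderPlacedEvent),
        Payload = JsonSerializer.Serialize(new OrderPlacedEvent
        {
            OrderId = order.Id,
            ProductId = request.ProductId,
            Quantity = request.Quantity
        }),
        CreatedUtc = DateTime.UtcNow
    });

    await _dbContext.SaveChangesAsync();
    await transaction.CommitAsync();

    // A separate background relay polls unpublished OutboxMessages, publishes them to the
    // broker, and stamps PublishedUtc, giving at-least-once delivery even if the broker
    // call fails right after the database commit.
    return OrderResult.Success(order.Id);
}
```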
8. Operational and production considerations
What to monitor
Replication lag: Time delta between primary write and replica visibility. Track per replica, per table, per region.
- Why: Indicates how far behind replicas are running. Spikes indicate network issues, replica overload, or large batch writes.
- Alert threshold: Depends on consistency model. Eventual consistency can tolerate minutes; read-your-writes should alert at >1 second.
Conflict rate: Number of write conflicts detected (optimistic concurrency failures, CRDT merges, saga compensations).
- Why: High conflict rates indicate hot keys, incorrect partitioning, or excessive concurrent writes.
- What to track: Conflicts per second, conflict resolution latency, automatic vs. manual resolutions.
Transaction abort rate: Percentage of transactions that abort due to serialization failures or deadlocks.
- Why: High abort rates under strong consistency indicate lock contention. System is spending resources on failed work.
Staleness duration: Distribution of time between a write completing and that write becoming visible to a given reader.
- Why: Quantifies the practical impact of eventual consistency. 99th percentile staleness matters more than the average.
Event processing lag: Delta between event publication time and processing completion time.
- Why: Indicates backpressure in asynchronous pipelines. Lag compounds across dependent services.
Cache hit rate and invalidation lag: Cache effectiveness and time between database write and cache invalidation completion.
- Why: Low hit rates negate caching benefits. High invalidation lag increases the staleness window.
What degrades first:
Under write-heavy load:
- Strong consistency: Lock contention causes transaction timeouts and abort rate spikes. Throughput collapses.
- Eventual consistency: Replication lag increases as replicas struggle to keep up. Staleness duration grows.
Under read-heavy load:
- Strong consistency: Quorum reads overwhelm replicas. Read latency increases.
- Eventual consistency: System scales horizontally with read replicas, handling load gracefully.
During network partitions:
- Strong consistency (CP system): Minority partition becomes unavailable for writes and quorum reads.
- Eventual consistency (AP system): Both partitions remain writable, accumulating divergence that must be reconciled.
During replica failure:
- Strong consistency: If quorum cannot be formed (a majority of replicas down), writes fail. System becomes read-only or fully unavailable.
- Eventual consistency: Remaining replicas continue serving reads and writes. Failed replica catches up when restored.
What becomes expensive
Storage: Conflict resolution metadata (vector clocks, version histories, CRDT tombstones) accumulates. Event sourcing stores every state change as an event. Multi-version concurrency control (MVCC) maintains old versions.
Compute: Conflict detection (vector clock comparisons), CRDT merge operations, and buffering out-of-order events consume CPU and memory. Saga compensation logic executes on failures.
Network: Synchronous replication for strong consistency multiplies network calls. Quorum reads query multiple replicas. Causal consistency broadcasts dependency metadata.
Latency: Strong consistency adds RTT (round-trip time) for quorum coordination. Cross-region replication adds hundreds of milliseconds. Lock acquisition serializes operations.
Observability signals
Correctness violations:
- Inventory going negative (invariant broken)
- Duplicate processing of idempotent operations (retry without deduplication)
- Out-of-order message delivery causing causal violations (a reply visible before the original post)
Performance degradation:
- P99 latency increase on write operations (lock contention)
- Sudden spike in transaction retry rate (optimistic concurrency conflict storm)
- Event consumer lag growing monotonically (backpressure)
Consistency anomalies:
- Users reporting stale data (cache serving old values)
- Reports showing data inconsistencies across services (eventual consistency divergence)
- Failed assertions in tests (race conditions, non-repeatable reads)
Operational risks
- Data loss during failover: If primary fails before replicating writes to secondaries, those writes are lost. Strong consistency with quorum writes mitigates this; asynchronous replication accepts this risk.
- Split-brain scenarios: Network partition creates two primaries, both accepting writes. Reconciliation is complex or impossible without application-specific conflict resolution.
- Thundering herd on cache invalidation: Invalidating a hot key causes all subsequent reads to miss cache simultaneously, overwhelming the database.
- Cascading failures in sagas: If one step in a multi-step saga fails after several steps have committed, compensating transactions must execute in reverse order. If compensation fails, manual intervention is required.
- Unbounded replication lag: Slow consumers or large batch operations can cause replicas to fall hours or days behind, making eventual consistency impractical for reads requiring fresh data.
9. When NOT to use this
Don't use strong consistency when:
- Your domain tolerates brief inconsistency: View counts, "likes", analytics dashboards, and approximations don't require real-time accuracy. Strong consistency adds cost for zero business value.
- Availability matters more than correctness: If your SLA promises 99.99% uptime, strong consistency's requirement for quorum makes that SLA unachievable during partitions or in multi-region deployments.
- You're operating at high scale: Serializable transactions don't scale horizontally. Systems handling millions of writes per second (ad serving, IoT telemetry, clickstream analytics) cannot afford coordination overhead.
- Latency is critical: Real-time gaming, low-latency trading platforms, and streaming services cannot accept the 100ms+ penalty of distributed coordination.
Don't use eventual consistency when:
- Business invariants must never be violated: Overselling inventory, double-charging payments, or allowing conflicting reservations is unacceptable. The risk of convergence to an incorrect state is too high.
- Users expect immediate read-your-writes: Profile updates, settings changes, and user-generated content that the user immediately queries must be visible. The cognitive dissonance of saving a setting and seeing it unchanged on refresh destroys trust.
- You lack conflict resolution logic: Eventual consistency requires explicit handling of conflicts when concurrent writes occur. If you don't have deterministic merge logic (CRDTs, last-write-wins, manual resolution), you'll end up with silent data corruption.
- Regulatory requirements demand auditability: Financial systems, healthcare records, and legal documents require provable ordering and state. Eventual consistency's ambiguity about which write "won" violates compliance.
Don't use causal consistency when:
- Operations have no causal relationships: Independent events (unrelated user actions, sensor readings from different devices) don't benefit from causal ordering. The overhead of vector clocks and dependency tracking is wasted.
- You can't track dependencies: Causal consistency requires instrumenting your entire system to capture happens-before relationships. If you're integrating with third-party services or legacy systems that don't emit causality metadata, you can't enforce causal ordering.
- Strong consistency is feasible: If your scale and latency requirements allow strong consistency, it's simpler and provides stronger guarantees. Causal consistency's complexity only makes sense when strong consistency is too expensive.
Don't use distributed transactions when:
- Operations span minutes or hours: Long-running transactions hold locks, blocking other operations. Saga patterns with compensating transactions are more appropriate for multi-step workflows that can't complete in seconds.
- The coordinator is a single point of failure: Two-phase commit requires a coordinator. If the coordinator crashes mid-transaction, participants are left in uncertain states requiring manual intervention.
- You cross organizational boundaries: You can't enforce distributed transaction protocols across services you don't control. External APIs don't participate in your 2PC protocol.
10. Key takeaways
- Consistency models are not features you enable; they are constraints you enforce. Every architectural decision — caching, replication, async messaging — introduces consistency trade-offs. Design these explicitly.
- Strong consistency and high availability are mutually exclusive under network partitions. Choose one. If you need both, partition your data so critical invariants use strong consistency while non-critical data uses eventual consistency.
- Eventual consistency guarantees convergence, not correctness. If two replicas independently violate an invariant, they'll eventually agree on the wrong value. Protect invariants with strong consistency or application-level validation.
- Timestamps from different machines cannot establish global ordering. Clocks drift. Use logical clocks (Lamport, vector) or centralized sequencers. Physical timestamps are useful for approximations and debugging, not correctness.
- Retries without idempotency create duplicates. Every operation that can be retried must be idempotent. Use client-provided idempotency keys, deduplicate on request IDs, or accept exactly-once delivery's operational complexity.
- Consistency guarantees don't cross system boundaries automatically. A read from SQL followed by a read from Redis is not consistent. A database transaction followed by an event publish is not atomic. Use outbox patterns, sagas, or distributed transactions to extend guarantees.
- Monitoring replication lag, conflict rate, and staleness is not optional. These metrics are your early warning system for consistency violations. Alert on thresholds before users report data corruption.
11. High-Level Overview
Visual representation of consistency models in distributed systems, highlighting how write operations propagate across replicas, how ordering and visibility guarantees differ, and where coordination, buffering, and convergence occur under each model.