How Write-Ahead Logs ensure data durability and enable crash recovery in database systems.
Difficulty: Beginner • Reading time: 15 minutes
Imagine you're running an e-commerce website. A customer completes their order, sees the "Order Successful!" message, and then… your server crashes. When it restarts, is their order still there? Or did it vanish into the digital void?
This is the fundamental durability problem that every database must solve. Without proper crash recovery, you risk losing data that users were already told was saved, like that customer's order.
The problem is that computers use two types of storage: fast but volatile memory (RAM), and slow but durable disk.
If your database only writes to memory for speed, all data is lost on crash. If it writes to disk for every operation, it becomes painfully slow. Write-Ahead Logging (WAL) solves this dilemma elegantly.
Write-Ahead Logging follows a simple principle:
WAL Core Principle
"Write changes to a log file BEFORE updating the actual data"
Think of it like a restaurant's order system: the waiter writes your order on a ticket before the kitchen starts cooking.
Even if the kitchen catches fire (system crash), the written orders survive, and a new cook can continue where the previous one left off.
```text
User Request → WAL (Disk) → MemTable (RAM) → SSTable (Disk)
     ↓              ↓              ↓                ↓
 "Success"      Durability    Fast Access    Long-term Storage
```
Let's understand WAL through a familiar example - ATM withdrawals:
Without WAL:

1. Customer withdraws $100
2. Update balance in memory: $1000 → $900
3. Power failure!
4. Server restarts
5. Balance shows $1000 (memory lost)
6. Customer has cash but account shows no withdrawal!
With WAL:

1. Customer withdraws $100
2. Write to WAL: "Account 123: -$100 at 2:30 PM"
3. Update balance in memory: $1000 → $900
4. Power failure!
5. Server restarts
6. Read WAL: "Oh, there was a $100 withdrawal"
7. Replay transaction: $1000 → $900
8. Balance correctly shows $900
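In code, the whole idea boils down to "append to the log, then touch memory, then acknowledge." Below is a minimal, standard-library-only sketch of that ordering for the withdrawal above; the `Wal` and `MemTable` types and the plain-text log format are illustrative stand-ins, not FerrisDB's actual types.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

// Hypothetical stand-ins for illustration only.
struct Wal {
    file: std::fs::File,
}

type MemTable = BTreeMap<String, i64>;

impl Wal {
    // Durably record the change before memory is touched.
    fn append(&mut self, line: &str) -> io::Result<()> {
        self.file.write_all(line.as_bytes())?;
        self.file.write_all(b"\n")?;
        self.file.sync_all()?; // fsync: the record survives a crash
        Ok(())
    }
}

fn withdraw(wal: &mut Wal, balances: &mut MemTable, account: &str, amount: i64) -> io::Result<()> {
    // 1. WAL first: the withdrawal is durable even if we crash right after this line.
    wal.append(&format!("{account}: -{amount}"))?;
    // 2. Only then update the in-memory balance.
    *balances.entry(account.to_string()).or_insert(0) -= amount;
    // 3. Now it is safe to tell the customer "Success".
    Ok(())
}

fn main() -> io::Result<()> {
    let file = std::fs::OpenOptions::new().create(true).append(true).open("orders.wal")?;
    let mut wal = Wal { file };
    let mut balances = MemTable::new();
    balances.insert("account-123".to_string(), 1000);
    withdraw(&mut wal, &mut balances, "account-123", 100)?;
    println!("balance now {}", balances["account-123"]); // 900
    Ok(())
}
```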
Let's see how FerrisDB implements WAL:
```rust
pub struct LogEntry {
    pub sequence_number: u64,     // Unique ID for ordering
    pub key: Vec<u8>,             // What was changed
    pub value: Option<Vec<u8>>,   // New value (None = delete)
    pub timestamp: u64,           // When it happened
}
```
```rust
// Simplified from FerrisDB code
pub fn append(&mut self, entry: &LogEntry) -> Result<()> {
    // 1. Serialize the entry
    let serialized = bincode::serialize(entry)?;

    // 2. Write length prefix (so we know where entries start/end)
    self.writer.write_u32(serialized.len() as u32)?;

    // 3. Write the actual data
    self.writer.write_all(&serialized)?;

    // 4. Force to disk (fsync) - THIS IS CRITICAL!
    self.writer.sync_all()?;

    Ok(())
}
```
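The length prefix in step 2 is what later lets recovery find entry boundaries in a flat byte stream. Here is a small standard-library-only sketch of that framing idea; it replaces the bincode payload with an opaque byte string and is not FerrisDB's actual encoding.

```rust
// Sketch of length-prefix framing: [4-byte little-endian length][payload].
fn frame(payload: &[u8]) -> Vec<u8> {
    let mut framed = Vec::with_capacity(4 + payload.len());
    framed.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    framed.extend_from_slice(payload);
    framed
}

fn main() {
    let framed = frame(b"account 123: -100");
    // First 4 bytes are the length (17), the rest is the payload.
    assert_eq!(&framed[..4], &17u32.to_le_bytes());
    assert_eq!(&framed[4..], b"account 123: -100");
    println!("framed {} bytes", framed.len());
}
```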
```rust
// How FerrisDB recovers after a crash
pub fn recover(&mut self) -> Result<Vec<LogEntry>> {
    let mut entries = Vec::new();

    loop {
        // Read entry length
        let len = match self.reader.read_u32() {
            Ok(len) => len,
            Err(_) => break, // End of log
        };

        // Read entry data
        let mut buffer = vec![0; len as usize];
        self.reader.read_exact(&mut buffer)?;

        // Deserialize and collect
        let entry: LogEntry = bincode::deserialize(&buffer)?;
        entries.push(entry);
    }

    // Replay all entries to reconstruct state
    for entry in &entries {
        self.apply_to_memtable(entry)?;
    }

    Ok(entries)
}
```
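For completeness, this is the read side of the same length-prefix framing sketched earlier, again standard-library-only and not FerrisDB's actual reader: keep reading `[length][payload]` records until the bytes run out. A torn final record from a crash mid-write shows up here as a short read.

```rust
use std::io::{Cursor, Read};

// Read [4-byte length][payload] records until the stream ends.
fn read_frames(bytes: &[u8]) -> std::io::Result<Vec<Vec<u8>>> {
    let mut cursor = Cursor::new(bytes);
    let mut frames = Vec::new();
    loop {
        let mut len_buf = [0u8; 4];
        // A clean end-of-log is simply "no more bytes for a length prefix".
        if cursor.read_exact(&mut len_buf).is_err() {
            break;
        }
        let len = u32::from_le_bytes(len_buf) as usize;
        let mut payload = vec![0u8; len];
        cursor.read_exact(&mut payload)?; // a short read here means a torn final write
        frames.push(payload);
    }
    Ok(frames)
}

fn main() -> std::io::Result<()> {
    // Build a tiny log in memory: two framed records.
    let mut log = Vec::new();
    for payload in [b"abc".as_slice(), b"defg".as_slice()] {
        log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        log.extend_from_slice(payload);
    }
    let frames = read_frames(&log)?;
    println!("recovered {} frames", frames.len()); // recovered 2 frames
    Ok(())
}
```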
Write Performance

Every durable append pays the cost of an fsync, so the write path is dominated by disk sync latency.

Recovery Speed

Replaying the log from the beginning takes longer the more entries it holds.
Instead of syncing after every write, batch multiple writes:
```rust
// Inefficient: sync per write
for entry in entries {
    wal.append(&entry)?;
    wal.sync()?; // Slow!
}

// Efficient: group commit
for entry in entries {
    wal.append(&entry)?;
}
wal.sync()?; // One sync for all!
```
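Real systems usually express group commit as a policy rather than a hand-written loop: writers append freely, and some trigger decides when the next fsync happens. The following is a self-contained sketch of one such policy (an illustration under assumed thresholds, not FerrisDB's implementation).

```rust
use std::time::{Duration, Instant};

// Sync once per batch: a batch closes after `max_batch` entries or
// `max_wait` elapsed time, whichever comes first.
struct GroupCommitPolicy {
    pending: usize,
    last_sync: Instant,
    max_batch: usize,
    max_wait: Duration,
}

impl GroupCommitPolicy {
    fn new(max_batch: usize, max_wait: Duration) -> Self {
        Self { pending: 0, last_sync: Instant::now(), max_batch, max_wait }
    }

    // Call after every append; returns true when the caller should sync now.
    fn record_append(&mut self) -> bool {
        self.pending += 1;
        if self.pending >= self.max_batch || self.last_sync.elapsed() >= self.max_wait {
            self.pending = 0;
            self.last_sync = Instant::now();
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut policy = GroupCommitPolicy::new(64, Duration::from_millis(5));
    let mut syncs = 0;
    for _ in 0..1000 {
        if policy.record_append() {
            syncs += 1; // wal.sync() would go here
        }
    }
    println!("1000 appends, {syncs} syncs"); // far fewer syncs than appends
}
```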
Periodically save a snapshot so recovery doesn't have to replay the entire history:
```text
WAL: [Entry1][Entry2][Entry3][CHECKPOINT][Entry4][Entry5]
                                  ↑
                    Recovery starts here, not at Entry1
```
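Recovery then only needs to find the most recent checkpoint record and replay what follows. Here is a small illustrative sketch of that search over an already-decoded log; the `Record` type and checkpoint marker are assumptions for the example, not FerrisDB's format.

```rust
enum Record {
    Put { key: String, value: String },
    Checkpoint, // marks a point at which all prior state is already on disk
}

fn replay_start(log: &[Record]) -> usize {
    log.iter()
        .rposition(|r| matches!(r, Record::Checkpoint))
        .map(|i| i + 1) // everything up to and including the checkpoint is covered by the snapshot
        .unwrap_or(0)   // no checkpoint yet: replay the whole log
}

fn main() {
    let log = vec![
        Record::Put { key: "a".into(), value: "1".into() },
        Record::Put { key: "b".into(), value: "2".into() },
        Record::Checkpoint,
        Record::Put { key: "a".into(), value: "3".into() },
    ];
    let start = replay_start(&log);
    println!("replaying {} of {} records", log.len() - start, log.len()); // 1 of 4
}
```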
Reuse old log files to avoid filesystem overhead:
```text
wal.000001.log (full)          → Archive or delete
wal.000002.log (active)        → Current writes
wal.000003.log (preallocated)  → Ready for next rotation
```
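A rotation decision is usually just a size check against the configured limit. Here is a tiny illustrative sketch; the file-naming scheme and threshold mirror the listing above rather than FerrisDB's actual code.

```rust
// Rotate when the active segment exceeds the size limit; writes then move
// to the next numbered file.
fn next_segment_name(current: u64) -> String {
    format!("wal.{:06}.log", current + 1)
}

fn should_rotate(active_len_bytes: u64, max_file_size_mb: u64) -> bool {
    active_len_bytes >= max_file_size_mb * 1024 * 1024
}

fn main() {
    let active_len = 130 * 1024 * 1024; // 130 MiB written to the active segment so far
    if should_rotate(active_len, 128) {
        println!("rotate to {}", next_segment_name(2)); // wal.000003.log
    }
}
```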
```toml
[wal]
sync_mode = "always"     # always | periodic | never
sync_interval_ms = 100   # If periodic
max_file_size_mb = 128   # When to rotate
compression = "none"     # none | snappy | zstd
```
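How a `sync_mode` setting translates into behavior might look roughly like this sketch; the `SyncMode` enum and `parse_sync_mode` helper are assumptions for illustration, and only the config keys come from the listing above.

```rust
enum SyncMode {
    Always,        // fsync on every append: safest, slowest
    Periodic(u64), // fsync at most every N milliseconds: bounded data loss
    Never,         // leave it to the OS page cache: fastest, least safe
}

fn parse_sync_mode(mode: &str, interval_ms: u64) -> SyncMode {
    match mode {
        "always" => SyncMode::Always,
        "periodic" => SyncMode::Periodic(interval_ms),
        _ => SyncMode::Never,
    }
}

fn main() {
    let mode = parse_sync_mode("periodic", 100);
    let description = match mode {
        SyncMode::Always => "every write is durable before it is acknowledged".to_string(),
        SyncMode::Periodic(ms) => format!("a crash can lose up to ~{ms} ms of acknowledged writes"),
        SyncMode::Never => "a crash can lose everything still in the page cache".to_string(),
    };
    println!("{description}");
}
```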
Try implementing a simple WAL:
```rust
struct SimpleWAL {
    file: File,
    entries: Vec<LogEntry>,
}

impl SimpleWAL {
    fn append(&mut self, key: &str, value: &str) -> Result<()> {
        // Your code here:
        // 1. Create LogEntry
        // 2. Serialize to bytes
        // 3. Write to file
        // 4. Sync to disk
        todo!()
    }

    fn recover(&mut self) -> Result<()> {
        // Your code here:
        // 1. Read entries from file
        // 2. Deserialize each one
        // 3. Apply to state
        todo!()
    }
}
```
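If you get stuck, here is one possible solution sketch. To stay standard-library-only it stores plain `key<TAB>value` lines and `(String, String)` pairs instead of bincode-encoded `LogEntry` values, so it is a simplification rather than the intended FerrisDB-style answer.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Seek, SeekFrom, Write};

// One possible solution sketch: entries are stored as `key<TAB>value\n` lines,
// purely to keep the example small.
struct SimpleWAL {
    file: File,
    entries: Vec<(String, String)>,
}

impl SimpleWAL {
    fn open(path: &str) -> io::Result<Self> {
        let file = OpenOptions::new().read(true).append(true).create(true).open(path)?;
        Ok(Self { file, entries: Vec::new() })
    }

    fn append(&mut self, key: &str, value: &str) -> io::Result<()> {
        // Encode and write the entry.
        self.file.write_all(format!("{key}\t{value}\n").as_bytes())?;
        // Force it to disk before reporting success.
        self.file.sync_all()?;
        self.entries.push((key.to_string(), value.to_string()));
        Ok(())
    }

    fn recover(&mut self) -> io::Result<()> {
        // Re-read the file from the start and rebuild in-memory state.
        self.file.seek(SeekFrom::Start(0))?;
        self.entries.clear();
        for line in BufReader::new(&self.file).lines() {
            let line = line?;
            if let Some((key, value)) = line.split_once('\t') {
                self.entries.push((key.to_string(), value.to_string()));
            }
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut wal = SimpleWAL::open("simple.wal")?;
    wal.append("user:1", "alice")?;
    wal.recover()?; // after a restart this rebuilds `entries` from disk
    println!("recovered {} entries", wal.entries.len());
    Ok(())
}
```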
Exercise 3: Test crash recovery
```bash
# Start a write workload
cargo run --example wal_stress_test &

# Kill it mid-write (simulating crash)
sleep 5 && kill -9 $!

# Run recovery
cargo run --example wal_recovery -- --recover-from test.wal
```
Key metrics to watch:
- `fsync()` calls

Debugging techniques:

- Use `cargo run --bin wal-dump` to examine entries

How other databases handle WAL:
| Database | WAL Implementation | Key Features |
| --- | --- | --- |
| PostgreSQL | WAL with configurable sync | Full ACID compliance |
| MySQL (InnoDB) | Redo log | Group commit optimization |
| SQLite | Journal or WAL mode | Simpler for embedded use |
| Redis | AOF (Append Only File) | Optional durability |
Operational concerns:
- `ferrisdb-storage/src/wal/writer.rs` - Core write logic
- `ferrisdb-storage/src/wal/reader.rs` - Recovery implementation
- `ferrisdb-storage/src/wal/log_entry.rs` - Entry encoding
- `ferrisdb-storage/src/wal/` - Test cases showing usage

Part of the FerrisDB Learning Journey. Built with ❤️ by a human and an AI.