
Current Implementation

We’ve implemented four core components of an LSM-tree storage engine, but they’re not yet integrated into a working system.

1. Write-Ahead Log (WAL) ✅ ENHANCED


We have a comprehensive WAL implementation that applies industry-standard techniques in a codebase built for learning; a sketch of the record framing follows the feature list below.

  • File Headers - 64-byte headers with magic numbers and versioning
  • Binary format with little-endian encoding
  • CRC32 checksums for corruption detection
  • Zero-copy reads with BytesMutExt (23-33% faster)
  • Metrics collection - Operations, success rates, sync times
  • Configurable sync modes (None, Normal, Full)
  • Size-based rotation (returns error when limit reached)
  • Comprehensive testing - 153 tests covering all edge cases
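To make the framing concrete, here is a minimal sketch of a length-prefixed, checksummed record in the spirit of the list above. The exact layout and the use of the `crc32fast` crate are assumptions for illustration, not FerrisDB's actual format.

```rust
use std::io::{self, Read, Write};

// Hypothetical record framing (assumes the `crc32fast` crate).
// Layout: [len: u32 LE][crc32: u32 LE][payload bytes]

/// Append one length-prefixed, checksummed record.
fn write_record<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(&crc32fast::hash(payload).to_le_bytes())?;
    w.write_all(payload)
}

/// Read one record back, verifying the checksum to detect corruption.
fn read_record<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut hdr = [0u8; 8];
    r.read_exact(&mut hdr)?;
    let len = u32::from_le_bytes(hdr[0..4].try_into().unwrap()) as usize;
    let expected = u32::from_le_bytes(hdr[4..8].try_into().unwrap());
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    if crc32fast::hash(&payload) != expected {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "checksum mismatch"));
    }
    Ok(payload)
}
```

The checksum sits next to the length so a reader can detect a torn or corrupted tail record during recovery and stop replaying there.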

2. MemTable ✅

We have a concurrent skip list that serves as our in-memory write buffer.

Features Implemented

  • Lock-free reads using crossbeam epoch
  • Fine-grained locking for writes
  • MVCC with timestamp ordering
  • Max height of 12 levels
  • ~450 lines of tested code

Performance Characteristics

  • O(log n) insert/lookup
  • Concurrent reads don’t block
  • Keys ordered by (user_key, timestamp DESC); see the sketch after this list
  • No size limit enforcement yet
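As an illustration of the (user_key, timestamp DESC) ordering, here is a hedged sketch of an internal-key comparator; the type and field names are hypothetical, not FerrisDB's actual definitions.

```rust
use std::cmp::Ordering;

// Hypothetical internal key; field names are illustrative.
#[derive(Debug, PartialEq, Eq)]
struct InternalKey {
    user_key: Vec<u8>,
    timestamp: u64,
}

impl Ord for InternalKey {
    fn cmp(&self, other: &Self) -> Ordering {
        // user_key ascending, then timestamp DESCENDING, so the newest
        // version of a key sorts first.
        self.user_key
            .cmp(&other.user_key)
            .then_with(|| other.timestamp.cmp(&self.timestamp))
    }
}

impl PartialOrd for InternalKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

Sorting newest-first means a reader scanning forward from the first match for a user key immediately lands on its latest version.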

3. SSTable Writer ✅

We can write sorted data to disk in SSTable format; a sketch of the footer encoding follows the diagram below.

Our SSTable Format:
┌─────────────┐
│ Data Blocks │ - Series of sorted entries
├─────────────┤
│ Index Block │ - First key of each data block
├─────────────┤
│ Bloom Filter│ - Placeholder (just zeros)
├─────────────┤
│ Footer      │ - Offsets and checksums
└─────────────┘
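As one illustration of how the footer's offsets and checksums can be laid out, here is a hypothetical fixed-size footer encoding. The field names and the 40-byte size are assumptions for the sketch, not FerrisDB's actual on-disk format.

```rust
// Hypothetical 40-byte footer; fields are illustrative.
struct Footer {
    index_offset: u64, // where the index block starts
    index_len: u64,    // its length in bytes
    bloom_offset: u64, // where the bloom filter starts
    bloom_len: u64,    // its length in bytes
    magic: u64,        // sanity check when opening the file
}

impl Footer {
    /// Encode with little-endian fields, matching the rest of the format.
    fn encode(&self) -> [u8; 40] {
        let mut buf = [0u8; 40];
        let fields = [
            self.index_offset,
            self.index_len,
            self.bloom_offset,
            self.bloom_len,
            self.magic,
        ];
        for (i, field) in fields.iter().enumerate() {
            buf[i * 8..i * 8 + 8].copy_from_slice(&field.to_le_bytes());
        }
        buf
    }
}
```

Because the footer is fixed-size and lives at the end of the file, a reader can seek to EOF minus 40 bytes and bootstrap the index and bloom filter from there.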

4. SSTable Reader ✅

We have a fully functional SSTable reader that can efficiently query the files written by our writer.

Features Implemented

  • Binary search for exact key lookups
  • Latest version retrieval with MVCC
  • Block caching for performance
  • Footer and index parsing
  • ~400 lines of tested code

Performance Characteristics

  • O(log n) key lookups using binary search (see the sketch after this list)
  • Cached blocks avoid disk I/O
  • Efficient range scans
  • Proper error handling
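Here is a minimal sketch of that binary search over the index: given the sorted first keys of the data blocks, it picks the one block that could contain a key. The slice method `partition_point` is standard-library Rust; the surrounding function is illustrative, not FerrisDB's API.

```rust
/// Given the sorted first key of each data block (from the index block),
/// return the block that could contain `key`: the last block whose first
/// key is <= `key`.
fn find_block(first_keys: &[Vec<u8>], key: &[u8]) -> Option<usize> {
    let n = first_keys.partition_point(|first| first.as_slice() <= key);
    n.checked_sub(1) // None: `key` sorts before every block
}

fn main() {
    let first_keys = vec![b"apple".to_vec(), b"mango".to_vec(), b"peach".to_vec()];
    assert_eq!(find_block(&first_keys, b"banana"), Some(0));
    assert_eq!(find_block(&first_keys, b"mango"), Some(1));
    assert_eq!(find_block(&first_keys, b"aardvark"), None);
}
```

Only that one candidate block then needs to be read (or fetched from cache) and searched for the exact key.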

What’s Missing

Storage Engine

  • Components not integrated
  • No flush from MemTable to SSTable
  • No read path implementation (a sketch of what it could look like follows this list)
  • No manifest/version tracking
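None of this exists yet, but to show where the integration is headed, here is a hedged sketch of the classic LSM read path: check the MemTable first, then SSTables from newest to oldest. Every name in it is hypothetical, standing in for the real types.

```rust
use std::collections::HashMap;

// Hypothetical interface standing in for the real MemTable and
// SSTable reader types.
trait KvSource {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

// A HashMap-backed stand-in, just so the sketch is self-contained.
impl KvSource for HashMap<Vec<u8>, Vec<u8>> {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        HashMap::get(self, key).cloned()
    }
}

/// The classic LSM read path: MemTable first, then SSTables from
/// newest to oldest, so recent versions shadow older ones.
fn lsm_get(
    memtable: &dyn KvSource,
    sstables_newest_first: &[&dyn KvSource],
    key: &[u8],
) -> Option<Vec<u8>> {
    memtable
        .get(key)
        .or_else(|| sstables_newest_first.iter().find_map(|sst| sst.get(key)))
}
```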

Compaction

  • No background threads
  • No merging of SSTables
  • No garbage collection
  • No level management

Networking & API

  • No Server - Can’t accept network connections
  • No Client - Can’t connect to a database
  • No API - No way to get/put data
  • No Configuration - Hardcoded values only
# Current implementation (as of Day 5)
Total Rust code: 11,306 lines
Total tests: 217
# Component breakdown (including tests)
WAL: ~2,500 lines (with headers, metrics, BytesMutExt)
MemTable: ~450 lines
SSTable Writer: ~650 lines
SSTable Reader: ~400 lines
Core Types: ~200 lines
Tutorials: ~2,000 lines
Benchmarks: ~500 lines
# Clone the repo
git clone https://github.com/ferrisdb/ferrisdb
cd ferrisdb
# Run component tests
cargo test -p ferrisdb-storage
# Run benchmarks
cargo bench -p ferrisdb-storage
# See the metrics in action
cargo test wal_metrics -- --nocapture
# Specific component tests
cargo test wal:: # WAL tests
cargo test memtable # MemTable tests
cargo test sstable # SSTable tests
# These don't exist:
ferrisdb-server # No server binary
ferrisdb-client connect # No client
ferrisdb bench # No benchmarks
ferrisdb --config # No configuration

Instead of the distributed system described in our vision, here’s what actually exists:

What We Have:
┌─────────────────────────────────────────┐
│         Individual Components           │
│  ┌─────┐  ┌────────┐  ┌─────────────┐   │
│  │ WAL │  │MemTable│  │ SSTable W/R │   │
│  └─────┘  └────────┘  └─────────────┘   │
│      (not connected together yet)       │
└─────────────────────────────────────────┘
What We Don't Have:
- Storage Engine connecting them
- Network layer
- Transaction system
- Distribution/replication
- Query processing
What’s solid:

  • Clean, well-tested component implementations
  • Good foundation for learning
  • Real concurrent data structures
  • Proper error handling

What’s lacking:

  • Components aren’t integrated
  • Can’t actually store/retrieve data
  • No persistence beyond WAL
  • Long way from a “database”

This is perfect if you want to:

  • Understand how databases work internally
  • See real concurrent Rust code
  • Follow a database being built from scratch
  • Learn alongside us

This is not for you if you need:

  • A working database
  • Production storage
  • Benchmarkable performance
  • Distributed features

Watch our journey as we:

  1. Connect the components into a storage engine
  2. Implement basic get/put operations
  3. Expose them through a simple API
  4. Build from there!

Follow our blog to see each step of the process.


Last updated: January 31, 2025