
Current Implementation

We’ve implemented four core components of an LSM-tree storage engine, but they’re not yet integrated into a working system.

1. Write-Ahead Log (WAL) ✅ ENHANCED


We have a comprehensive WAL implementation that applies industry-standard techniques in a codebase built for learning; a sketch of the record framing follows the feature list below.

  • File Headers - 64-byte headers with magic numbers and versioning
  • Binary format with little-endian encoding
  • CRC32 checksums for corruption detection
  • Zero-copy reads with BytesMutExt (23-33% faster)
  • Metrics collection - Operations, success rates, sync times
  • Configurable sync modes (None, Normal, Full)
  • Size-based rotation (returns error when limit reached)
  • Comprehensive testing - 153 tests covering all edge cases
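To make the framing concrete, here is a minimal sketch of a length-prefixed, checksummed record in the spirit of the list above. The exact layout and the use of the `crc32fast` crate are assumptions for illustration, not FerrisDB's actual format.

```rust
use std::io::{self, Read, Write};

// Hypothetical record framing (assumes the `crc32fast` crate).
// Layout: [len: u32 LE][crc32: u32 LE][payload bytes]

/// Append one length-prefixed, checksummed record.
fn write_record<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(&crc32fast::hash(payload).to_le_bytes())?;
    w.write_all(payload)
}

/// Read one record back, verifying the checksum to detect corruption.
fn read_record<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut hdr = [0u8; 8];
    r.read_exact(&mut hdr)?;
    let len = u32::from_le_bytes(hdr[0..4].try_into().unwrap()) as usize;
    let expected = u32::from_le_bytes(hdr[4..8].try_into().unwrap());
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    if crc32fast::hash(&payload) != expected {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "checksum mismatch"));
    }
    Ok(payload)
}
```

The checksum sits next to the length so a reader can detect a torn or corrupted tail record during recovery and stop replaying there.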

2. MemTable ✅

We have a concurrent skip list that serves as our in-memory write buffer.

Features Implemented

  • Lock-free reads using crossbeam epoch
  • Fine-grained locking for writes
  • MVCC with timestamp ordering
  • Max height of 12 levels
  • ~450 lines of tested code

Performance Characteristics

  • O(log n) insert/lookup
  • Concurrent reads don’t block
  • Keys ordered by (user_key, timestamp DESC); see the sketch after this list
  • No size limit enforcement yet
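As an illustration of the (user_key, timestamp DESC) ordering, here is a hedged sketch of an internal-key comparator; the type and field names are hypothetical, not FerrisDB's actual definitions.

```rust
use std::cmp::Ordering;

// Hypothetical internal key; field names are illustrative.
#[derive(Debug, PartialEq, Eq)]
struct InternalKey {
    user_key: Vec<u8>,
    timestamp: u64,
}

impl Ord for InternalKey {
    fn cmp(&self, other: &Self) -> Ordering {
        // user_key ascending, then timestamp DESCENDING, so the newest
        // version of a key sorts first.
        self.user_key
            .cmp(&other.user_key)
            .then_with(|| other.timestamp.cmp(&self.timestamp))
    }
}

impl PartialOrd for InternalKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

Sorting newest-first means a reader scanning forward from the first match for a user key immediately lands on its latest version.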

3. SSTable Writer ✅

We can write sorted data to disk in SSTable format; a sketch of the footer encoding follows the diagram below.

Our SSTable Format:
┌─────────────┐
│ Data Blocks │ - Series of sorted entries
├─────────────┤
│ Index Block │ - First key of each data block
├─────────────┤
│ Bloom Filter│ - Placeholder (just zeros)
├─────────────┤
│ Footer      │ - Offsets and checksums
└─────────────┘
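As one illustration of how the footer's offsets and checksums can be laid out, here is a hypothetical fixed-size footer encoding. The field names and the 40-byte size are assumptions for the sketch, not FerrisDB's actual on-disk format.

```rust
// Hypothetical 40-byte footer; fields are illustrative.
struct Footer {
    index_offset: u64, // where the index block starts
    index_len: u64,    // its length in bytes
    bloom_offset: u64, // where the bloom filter starts
    bloom_len: u64,    // its length in bytes
    magic: u64,        // sanity check when opening the file
}

impl Footer {
    /// Encode with little-endian fields, matching the rest of the format.
    fn encode(&self) -> [u8; 40] {
        let mut buf = [0u8; 40];
        let fields = [
            self.index_offset,
            self.index_len,
            self.bloom_offset,
            self.bloom_len,
            self.magic,
        ];
        for (i, field) in fields.iter().enumerate() {
            buf[i * 8..i * 8 + 8].copy_from_slice(&field.to_le_bytes());
        }
        buf
    }
}
```

Because the footer is fixed-size and lives at the end of the file, a reader can seek to EOF minus 40 bytes and bootstrap the index and bloom filter from there.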

4. SSTable Reader ✅

We have a fully functional SSTable reader that can efficiently query the files written by our writer.

Features Implemented

  • Binary search for exact key lookups
  • Latest version retrieval with MVCC
  • Block caching for performance
  • Footer and index parsing
  • ~400 lines of tested code

Performance Characteristics

  • O(log n) key lookups using binary search (see the sketch after this list)
  • Cached blocks avoid disk I/O
  • Efficient range scans
  • Proper error handling
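Here is a minimal sketch of that binary search over the index: given the sorted first keys of the data blocks, it picks the one block that could contain a key. The slice method `partition_point` is standard-library Rust; the surrounding function is illustrative, not FerrisDB's API.

```rust
/// Given the sorted first key of each data block (from the index block),
/// return the block that could contain `key`: the last block whose first
/// key is <= `key`.
fn find_block(first_keys: &[Vec<u8>], key: &[u8]) -> Option<usize> {
    let n = first_keys.partition_point(|first| first.as_slice() <= key);
    n.checked_sub(1) // None: `key` sorts before every block
}

fn main() {
    let first_keys = vec![b"apple".to_vec(), b"mango".to_vec(), b"peach".to_vec()];
    assert_eq!(find_block(&first_keys, b"banana"), Some(0));
    assert_eq!(find_block(&first_keys, b"mango"), Some(1));
    assert_eq!(find_block(&first_keys, b"aardvark"), None);
}
```

Only that one candidate block then needs to be read (or fetched from cache) and searched for the exact key.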

What’s Missing

Storage Engine

  • Components not integrated
  • No flush from MemTable to SSTable
  • No read path implementation (a sketch of what it could look like follows this list)
  • No manifest/version tracking
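None of this exists yet, but to show where the integration is headed, here is a hedged sketch of the classic LSM read path: check the MemTable first, then SSTables from newest to oldest. Every name in it is hypothetical, standing in for the real types.

```rust
use std::collections::HashMap;

// Hypothetical interface standing in for the real MemTable and
// SSTable reader types.
trait KvSource {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

// A HashMap-backed stand-in, just so the sketch is self-contained.
impl KvSource for HashMap<Vec<u8>, Vec<u8>> {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        HashMap::get(self, key).cloned()
    }
}

/// The classic LSM read path: MemTable first, then SSTables from
/// newest to oldest, so recent versions shadow older ones.
fn lsm_get(
    memtable: &dyn KvSource,
    sstables_newest_first: &[&dyn KvSource],
    key: &[u8],
) -> Option<Vec<u8>> {
    memtable
        .get(key)
        .or_else(|| sstables_newest_first.iter().find_map(|sst| sst.get(key)))
}
```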

Compaction

  • No background threads
  • No merging of SSTables
  • No garbage collection
  • No level management

Networking & API

  • No Server - Can’t accept network connections
  • No Client - Can’t connect to a database
  • No API - No way to get/put data
  • No Configuration - Hardcoded values only
# Current implementation (as of Day 5)
Total Rust code: 11,306 lines
Total tests: 217
# Component breakdown (including tests)
WAL: ~2,500 lines (with headers, metrics, BytesMutExt)
MemTable: ~450 lines
SSTable Writer: ~650 lines
SSTable Reader: ~400 lines
Core Types: ~200 lines
Tutorials: ~2,000 lines
Benchmarks: ~500 lines
# Clone the repo
git clone https://github.com/ferrisdb/ferrisdb
cd ferrisdb
# Run component tests
cargo test -p ferrisdb-storage
# Run benchmarks
cargo bench -p ferrisdb-storage
# See the metrics in action
cargo test wal_metrics -- --nocapture
# Specific component tests
cargo test wal:: # WAL tests
cargo test memtable # MemTable tests
cargo test sstable # SSTable tests
# These don't exist:
ferrisdb-server # No server binary
ferrisdb-client connect # No client
ferrisdb bench # No benchmarks
ferrisdb --config # No configuration

Instead of the distributed system described in our vision, here’s what actually exists:

What We Have:
┌─────────────────────────────────────────┐
│         Individual Components           │
│  ┌─────┐  ┌────────┐  ┌─────────────┐   │
│  │ WAL │  │MemTable│  │ SSTable W/R │   │
│  └─────┘  └────────┘  └─────────────┘   │
│      (not connected together yet)       │
└─────────────────────────────────────────┘
What We Don't Have:
- Storage Engine connecting them
- Network layer
- Transaction system
- Distribution/replication
- Query processing
What’s solid:

  • Clean, well-tested component implementations
  • Good foundation for learning
  • Real concurrent data structures
  • Proper error handling

What’s lacking:

  • Components aren’t integrated
  • Can’t actually store/retrieve data
  • No persistence beyond WAL
  • Long way from a “database”

This is perfect if you want to:

  • Understand how databases work internally
  • See real concurrent Rust code
  • Follow a database being built from scratch
  • Learn alongside us

This is not for you if you need:

  • A working database
  • Production storage
  • Benchmarkable performance
  • Distributed features

Watch our journey as we:

  1. Connect the components into a storage engine
  2. Implement basic get/put operations
  3. Expose them through a simple API
  4. Build from there!

Follow our blog to see each step of the process.


Last updated: January 31, 2025