Understanding concurrent programming through FerrisDB’s MemTable implementation.
Difficulty: 📕 Advanced • Reading time: 35 minutes
Imagine your web application suddenly goes viral. Thousands of users are simultaneously creating accounts, updating profiles, and posting content. Your database needs to handle all these writes at the same time without making users wait in line.
This is the concurrency challenge every modern database faces. In traditional systems, when multiple users try to write data at the same time, their writes contend for shared structures and end up waiting on one another.
The traditional solution uses locks (mutexes) to ensure only one thread modifies data at a time:
```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Naive approach - everyone waits in line
struct SlowDatabase {
    data: Mutex<HashMap<String, String>>,
}

// Every operation locks EVERYTHING
db.data.lock().unwrap().insert(key, value); // All other threads wait!
```
This is like having a single cashier at a busy store - no matter how many customers arrive, they all wait in one line. FerrisDB’s lock-free skip list solves this by allowing multiple “cashiers” to work simultaneously.
Skip lists are like a subway system for your data:
Regular linked list (local train):

```
Station1 → Station2 → Station3 → Station4 → Station5
```

Skip list (express system):

```
Express: Station1 -----------→ Station3 -----------→ Station5
Local:   Station1 → Station2 → Station3 → Station4 → Station5
```
To find Station4: take the express line to Station3 (skipping Station2), then switch to the local line for one stop.
This makes finding data much faster - O(log n) instead of O(n).
```
Level 3: HEAD ------------------> 30 -------------------------> NULL
Level 2: HEAD ------> 10 -------> 30 -------> 50 -------------> NULL
Level 1: HEAD -> 5 -> 10 -> 20 -> 30 -> 40 -> 50 -> 60 -------> NULL
Level 0: HEAD -> 5 -> 10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70 -> NULL
          ↑                       ↑                        ↑
      Start here            Found quickly!          Without skipping
```
Key principles:

- Level 0 is a plain sorted linked list containing every node
- Each higher level skips more nodes, acting as an express lane
- A node's height is chosen at random when it is inserted, which keeps the levels balanced on average (see the sketch below)
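That last point deserves a sketch. The insert code later in this article calls a `random_height` helper; here is a minimal version of what it could look like, assuming the `rng: Mutex<StdRng>` field from the struct that follows (the 1/2 branching factor and the height cap of 12 are assumptions, not FerrisDB's actual constants):

```rust
use rand::{rngs::StdRng, Rng};

/// Assumed cap on tower height; real implementations pick a
/// constant large enough for the expected number of keys.
const MAX_HEIGHT: usize = 12;

impl SkipList {
    /// Flip a fair coin until it comes up tails: ~1/2 of nodes
    /// reach level 1, ~1/4 reach level 2, and so on. That geometric
    /// distribution is what keeps searches at O(log n) on average.
    fn random_height(&self) -> usize {
        let mut rng = self.rng.lock().unwrap();
        let mut height = 1;
        while height < MAX_HEIGHT && rng.gen_bool(0.5) {
            height += 1;
        }
        height
    }
}
```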
Let’s examine FerrisDB’s concurrent skip list implementation:
```rust
pub struct SkipList {
    /// The head node of the skip list (sentinel)
    head: Arc<Node>,
    /// Maximum height of any node in the list
    max_height: AtomicUsize,
    /// Thread-safe random number generator
    rng: Mutex<StdRng>,
}

struct Node {
    /// The actual key-value data
    key: Option<Vec<u8>>,
    value: Option<Vec<u8>>,
    /// Forward pointers for each level
    next: Vec<Atomic<Node>>,
}
```
The beauty of skip lists is that searches never need locks:
```rust
pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
    let mut current = self.head.clone();
    let mut level = self.max_height.load(Ordering::Relaxed);

    // Start from the highest level and work down
    while level > 0 {
        level -= 1;

        // Move forward on the current level
        loop {
            let next = current.next[level].load(Ordering::Acquire);

            match next {
                // Keep moving right while the next key is still smaller
                Some(next_node) if next_node.key.as_deref() < Some(key) => {
                    current = next_node;
                }
                _ => break, // Go down a level
            }
        }
    }

    // Check if we found the key
    let next = current.next[0].load(Ordering::Acquire);
    match next {
        Some(node) if node.key.as_deref() == Some(key) => node.value.clone(),
        _ => None,
    }
}
```
Why This Works

- No locks needed - just atomic reads
- Multiple threads can search simultaneously (illustrated below)
- Progress guaranteed - no thread blocks another
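A quick illustration of the second point: readers and a writer can all make progress at once, with no mutex anywhere. This is a usage sketch against the `get`/`insert` API shown above, assuming the list is shared via `Arc`:

```rust
use std::sync::Arc;
use std::thread;

fn concurrent_readers(list: Arc<SkipList>) {
    // One writer inserting keys...
    let writer = {
        let list = list.clone();
        thread::spawn(move || {
            for i in 0..1_000u32 {
                list.insert(i.to_be_bytes().to_vec(), b"v".to_vec());
            }
        })
    };

    // ...and several readers searching at the same time.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let list = list.clone();
            thread::spawn(move || {
                for i in 0..1_000u32 {
                    let _ = list.get(&i.to_be_bytes()); // never blocks
                }
            })
        })
        .collect();

    writer.join().unwrap();
    for r in readers {
        r.join().unwrap();
    }
}
```

Every reader thread runs the same lock-free search shown earlier; none of them can stall the others by being descheduled mid-operation, because there is no lock to hold.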
Performance Impact

Because readers never take a lock, concurrent searches scale with the number of cores instead of queueing behind a mutex; the benchmark numbers later in this article make the difference concrete.
Insertion is more complex but still achieves excellent concurrency:
```rust
pub fn insert(&self, key: Vec<u8>, value: Vec<u8>) {
    let height = self.random_height();
    let mut update: Vec<Arc<Node>> = vec![self.head.clone(); height];

    // Phase 1: Find the insertion position at each level
    let mut current = self.head.clone();
    for level in (0..height).rev() {
        loop {
            let next = current.next[level].load(Ordering::Acquire);

            match next {
                Some(next_node) if next_node.key.as_deref() < Some(key.as_slice()) => {
                    current = next_node;
                }
                _ => {
                    update[level] = current.clone();
                    break;
                }
            }
        }
    }

    // Phase 2: Create the new node
    let new_node = Arc::new(Node {
        key: Some(key),
        value: Some(value),
        next: vec![Atomic::null(); height],
    });

    // Phase 3: Link in from bottom to top
    for level in 0..height {
        loop {
            let next = update[level].next[level].load(Ordering::Acquire);
            new_node.next[level].store(next.clone(), Ordering::Release);

            // Compare-and-swap to atomically insert
            match update[level].next[level].compare_exchange(
                next,
                Some(new_node.clone()),
                Ordering::Release,
                Ordering::Acquire,
            ) {
                Ok(_) => break, // Success!
                Err(_) => {
                    // Another thread modified this pointer - retry
                    continue;
                }
            }
        }
    }
}
```
Understanding memory ordering is crucial for lock-free programming:
Relaxed: atomicity only, with no cross-thread ordering guarantees

```rust
height.load(Ordering::Relaxed) // Just need the value
```

Acquire: a load that observes a Release store also sees every write made before that store

```rust
next.load(Ordering::Acquire) // Need to see the node's data
```

Release: makes all writes before this store visible to any thread whose Acquire load observes it

```rust
next.store(new_node, Ordering::Release) // Publish our writes
```

AcqRel: both Acquire and Release semantics, used for read-modify-write operations

```rust
compare_exchange(..., Ordering::AcqRel, ...) // Full synchronization
```
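To see an Acquire/Release pair in isolation from the skip list, here is a small self-contained example using only the standard library (the variable names are illustrative): the Release store publishes the payload, and once the Acquire load observes it, the payload write is guaranteed to be visible.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (data.clone(), ready.clone());
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);    // write the payload first
            ready.store(true, Ordering::Release); // then publish it
        })
    };

    let consumer = {
        let (data, ready) = (data.clone(), ready.clone());
        thread::spawn(move || {
            // Spin until the Acquire load observes the Release store...
            while !ready.load(Ordering::Acquire) {}
            // ...at which point the payload is guaranteed visible.
            assert_eq!(data.load(Ordering::Relaxed), 42);
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap();
}
```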
| Operation   | Average Case | Worst Case | Space    |
| ----------- | ------------ | ---------- | -------- |
| Search      | O(log n)     | O(n)       | -        |
| Insert      | O(log n)     | O(n)       | O(log n) |
| Delete      | O(log n)     | O(n)       | -        |
| Range Query | O(log n + k) | O(n)       | -        |
Single-threaded vs. multi-threaded (8 cores) results:

```
// From FerrisDB benchmarks
test bench_skiplist_insert_single   ... bench: 180 ns/iter
test bench_skiplist_insert_parallel ... bench:  45 ns/iter (4 threads)
test bench_btreemap_insert_locked   ... bench: 220 ns/iter
test bench_btreemap_insert_parallel ... bench: 850 ns/iter (4 threads)
```
The ABA problem occurs in lock-free programming when:

1. Thread 1 reads value A from a shared location
2. Thread 2 changes the value to B, then back to A (possibly freeing and reusing memory along the way)
3. Thread 1's compare-and-swap succeeds because it sees A again, even though the state changed underneath it
Solution: Hazard pointers or epoch-based reclamation
```rust
// FerrisDB uses epoch-based memory reclamation
struct EpochGuard {
    epoch: Arc<AtomicU64>,
}

impl Drop for EpochGuard {
    fn drop(&mut self) {
        // Signal we're done with this epoch
        self.epoch.fetch_add(1, Ordering::Release);
    }
}
```
Lock-free structures can’t immediately free memory:
```rust
// Problem: When can we safely delete this node?
let old = node.next.swap(new_next, Ordering::AcqRel);
// drop(old); // UNSAFE! Another thread might be reading it

// Solution: Defer deletion
garbage_collector.defer_drop(old);
```
```rust
#[test]
fn test_concurrent_insert() {
    let list = Arc::new(SkipList::new());
    let mut handles = vec![];

    // Spawn multiple threads
    for i in 0..10 {
        let list_clone = list.clone();
        handles.push(thread::spawn(move || {
            for j in 0..1000 {
                list_clone.insert(
                    format!("key_{}_{}", i, j).into_bytes(),
                    format!("value_{}_{}", i, j).into_bytes(),
                );
            }
        }));
    }

    // Wait for all threads
    for handle in handles {
        handle.join().unwrap();
    }

    // Verify all insertions succeeded
    assert_eq!(list.len(), 10_000);
}
```
```rust
#[cfg(loom)]
#[test]
fn loom_concurrent_test() {
    loom::model(|| {
        let list = Arc::new(SkipList::new());

        // Loom will explore all possible interleavings
        let t1 = loom::thread::spawn({
            let list = list.clone();
            move || list.insert(b"key1".to_vec(), b"value1".to_vec())
        });

        let t2 = loom::thread::spawn({
            let list = list.clone();
            move || list.insert(b"key2".to_vec(), b"value2".to_vec())
        });

        t1.join().unwrap();
        t2.join().unwrap();

        assert_eq!(list.len(), 2);
    });
}
```
Perfect For

- Write-heavy workloads with many concurrent threads (exactly the MemTable case)
- Ordered data, where sorted iteration and range queries matter
- Systems that need guaranteed progress rather than lock-based waiting

Avoid When

- The workload is single-threaded - a plain BTreeMap is simpler with no atomics to reason about
- Memory is tight - the extra forward pointers add per-node overhead
- You need strict worst-case bounds - skip list guarantees are probabilistic, with an O(n) worst case
Try implementing a simple concurrent counter:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct ConcurrentCounter {
    count: AtomicU64,
}

impl ConcurrentCounter {
    fn increment(&self) -> u64 {
        // Your code here:
        // 1. Load current value
        // 2. Add 1
        // 3. Compare-and-swap
        // 4. Retry if failed
        todo!()
    }
}
```
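For comparison, one possible solution (spoiler ahead): a classic compare-and-swap retry loop. In real code a single `fetch_add(1, Ordering::Relaxed)` does the job in one instruction, but writing the loop out is the point of the exercise.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct ConcurrentCounter {
    count: AtomicU64,
}

impl ConcurrentCounter {
    fn increment(&self) -> u64 {
        loop {
            // 1. Load the current value
            let current = self.count.load(Ordering::Relaxed);

            // 2. + 3. Try to swap in current + 1 atomically
            match self.count.compare_exchange(
                current,
                current + 1,
                Ordering::AcqRel,  // ordering on success
                Ordering::Relaxed, // ordering on failure
            ) {
                Ok(_) => return current + 1, // we won the race
                Err(_) => continue,          // 4. Someone beat us - retry
            }
        }
    }
}
```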
Part of the FerrisDB Learning Journey. Built with ❤️ by a human and an AI.