Skip to content

Lock-Free Skip Lists

Understanding concurrent programming through FerrisDB’s MemTable implementation.

Difficulty: 📕 Advanced • Reading time: 35 minutes

Imagine your web application suddenly goes viral. Thousands of users are simultaneously creating accounts, updating profiles, and posting content. Your database needs to handle all these writes at the same time without making users wait in line.

This is the concurrency challenge every modern database faces. In traditional systems, when multiple users try to write data:

The traditional solution uses locks (mutexes) to ensure only one thread modifies data at a time:

// Naive approach - everyone waits in line
struct SlowDatabase {
data: Mutex<HashMap<String, String>>,
}
// Every operation locks EVERYTHING
db.data.lock().unwrap().insert(key, value); // All other threads wait!

This is like having a single cashier at a busy store - no matter how many customers arrive, they all wait in one line. FerrisDB’s lock-free skip list solves this by allowing multiple “cashiers” to work simultaneously.

Skip lists are like a subway system for your data:

Regular linked list (local train):

Station1 → Station2 → Station3 → Station4 → Station5

Skip list (express system):

Express: Station1 -----------→ Station3 -----------→ Station5
Local: Station1 → Station2 → Station3 → Station4 → Station5

To find Station4:

  1. Take express to Station3 (skip Station2)
  2. Switch to local for one stop

This makes finding data much faster - O(log n) instead of O(n).

Level 3: HEAD ------------------> 30 -------------------------> NULL
Level 2: HEAD ------> 10 -------> 30 -------> 50 -------------> NULL
Level 1: HEAD -> 5 -> 10 -> 20 -> 30 -> 40 -> 50 -> 60 -------> NULL
Level 0: HEAD -> 5 -> 10 -> 20 -> 30 -> 40 -> 50 -> 60 -> 70 -> NULL
↑ ↑ ↑
Start here Found quickly! Without skipping

Key principles:

  1. Multiple levels: Express lanes for faster traversal
  2. Probabilistic structure: Randomly decide how many levels each node gets
  3. Lock-free reads: Multiple threads can search simultaneously

Let’s examine FerrisDB’s concurrent skip list implementation:

ferrisdb-storage/src/memtable/skip_list.rs
pub struct SkipList {
/// The head node of the skip list (sentinel)
head: Arc<Node>,
/// Maximum height of any node in the list
max_height: AtomicUsize,
/// Thread-safe random number generator
rng: Mutex<StdRng>,
}
struct Node {
/// The actual key-value data
key: Option<Vec<u8>>,
value: Option<Vec<u8>>,
/// Forward pointers for each level
next: Vec<Atomic<Node>>,
}

The beauty of skip lists is that searches never need locks:

pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
let mut current = self.head.clone();
let mut level = self.max_height.load(Ordering::Relaxed);
// Start from highest level and work down
while level > 0 {
level -= 1;
// Move forward on current level
loop {
let next = current.next[level].load(Ordering::Acquire);
match next {
Some(next_node) if next_node.key < key => {
current = next_node;
}
_ => break, // Go down a level
}
}
}
// Check if we found the key
let next = current.next[0].load(Ordering::Acquire);
match next {
Some(node) if node.key == key => Some(node.value.clone()),
_ => None,
}
}

Why This Works

  • No locks needed - just atomic reads - Multiple threads can search simultaneously - Progress guaranteed - no thread blocks another

Performance Impact

  • Read throughput scales with CPU cores - No cache line bouncing between cores - Predictable latency (no lock waiting)

Insertion is more complex but still achieves excellent concurrency:

pub fn insert(&self, key: Vec<u8>, value: Vec<u8>) {
let height = self.random_height();
let mut update: Vec<Arc<Node>> = vec![self.head.clone(); height];
// Phase 1: Find insertion position at each level
let mut current = self.head.clone();
for level in (0..height).rev() {
loop {
let next = current.next[level].load(Ordering::Acquire);
match next {
Some(next_node) if next_node.key < key => {
current = next_node;
}
_ => {
update[level] = current.clone();
break;
}
}
}
}
// Phase 2: Create new node
let new_node = Arc::new(Node {
key: Some(key),
value: Some(value),
next: vec![Atomic::null(); height],
});
// Phase 3: Link in from bottom to top
for level in 0..height {
loop {
let next = update[level].next[level].load(Ordering::Acquire);
new_node.next[level].store(next.clone(), Ordering::Release);
// Compare-and-swap to atomically insert
match update[level].next[level].compare_exchange(
next,
Some(new_node.clone()),
Ordering::Release,
Ordering::Acquire,
) {
Ok(_) => break, // Success!
Err(_) => {
// Another thread modified - retry
continue;
}
}
}
}
}

Understanding memory ordering is crucial for lock-free programming:

  1. Relaxed: No synchronization, just atomicity

    height.load(Ordering::Relaxed) // Just need the value
  2. Acquire: All subsequent reads see writes before the Release

    next.load(Ordering::Acquire) // Need to see node data
  3. Release: All previous writes visible to Acquire loads

    next.store(new_node, Ordering::Release) // Publish our writes
  4. AcqRel: Both Acquire and Release semantics

    compare_exchange(..., Ordering::AcqRel, ...) // Full synchronization
OperationAverage CaseWorst CaseSpace
SearchO(log n)O(n)-
InsertO(log n)O(n)O(log n)
DeleteO(log n)O(n)-
Range QueryO(log n + k)O(n)-

Single-threaded

  • 1M ops/sec for searches - 500K ops/sec for insertions - Comparable to RB-tree

Multi-threaded (8 cores)

  • 7M ops/sec for searches (7x scaling!) - 2M ops/sec for insertions - Far exceeds locked structures
// From FerrisDB benchmarks
test bench_skiplist_insert_single ... bench: 180 ns/iter
test bench_skiplist_insert_parallel ... bench: 45 ns/iter (4 threads)
test bench_btreemap_insert_locked ... bench: 220 ns/iter
test bench_btreemap_insert_parallel ... bench: 850 ns/iter (4 threads)

The ABA problem occurs in lock-free programming when:

  1. Thread 1 reads value A
  2. Thread 2 changes A → B → A
  3. Thread 1’s CAS succeeds but misses the intermediate change

Solution: Hazard pointers or epoch-based reclamation

// FerrisDB uses epoch-based memory reclamation
struct EpochGuard {
epoch: Arc<AtomicU64>,
}
impl Drop for EpochGuard {
fn drop(&mut self) {
// Signal we're done with this epoch
self.epoch.fetch_add(1, Ordering::Release);
}
}

Lock-free structures can’t immediately free memory:

// Problem: When can we safely delete this node?
let old = node.next.swap(new_next, Ordering::AcqRel);
// drop(old); // UNSAFE! Another thread might be reading it
// Solution: Defer deletion
garbage_collector.defer_drop(old);
#[test]
fn test_concurrent_insert() {
let list = Arc::new(SkipList::new());
let mut handles = vec![];
// Spawn multiple threads
for i in 0..10 {
let list_clone = list.clone();
handles.push(thread::spawn(move || {
for j in 0..1000 {
list_clone.insert(
format!("key_{}_{}", i, j).into_bytes(),
format!("value_{}_{}", i, j).into_bytes(),
);
}
}));
}
// Wait for all threads
for handle in handles {
handle.join().unwrap();
}
// Verify all insertions succeeded
assert_eq!(list.len(), 10_000);
}
#[cfg(loom)]
#[test]
fn loom_concurrent_test() {
loom::model(|| {
let list = Arc::new(SkipList::new());
// Loom will explore all possible interleavings
let t1 = loom::thread::spawn({
let list = list.clone();
move || list.insert(b"key1", b"value1")
});
let t2 = loom::thread::spawn({
let list = list.clone();
move || list.insert(b"key2", b"value2")
});
t1.join().unwrap();
t2.join().unwrap();
assert_eq!(list.len(), 2);
});
}

Perfect For

  • High-concurrency workloads - Read-heavy with some writes - In-memory databases - Real-time systems

Avoid When

  • Need deterministic structure - Cache efficiency critical - Simple single-threaded use - Persistent storage needed
  • Redis: Uses skip lists for sorted sets
  • LevelDB/RocksDB: MemTable implementation
  • Apache Cassandra: Secondary indexes
  • MemSQL: In-memory tables

Try implementing a simple concurrent counter:

struct ConcurrentCounter {
count: AtomicU64,
}
impl ConcurrentCounter {
fn increment(&self) -> u64 {
// Your code here:
// 1. Load current value
// 2. Add 1
// 3. Compare-and-swap
// 4. Retry if failed
}
}
  1. Skip lists enable lock-free concurrency: Multiple threads can work simultaneously
  2. Probabilistic structure: Simple to implement, good average performance
  3. Memory ordering matters: Wrong ordering = data races
  4. Testing is crucial: Use tools like Loom for verification

Part of the FerrisDB Learning Journey. Built with ❤️ by a human and an AI.