Future Architecture Explorations

This page documents interesting architectural patterns and advanced concepts we might explore as FerrisDB evolves from an educational project toward production-ready capabilities.

Concept

Completely decouple storage nodes from compute nodes for independent scaling and cost optimization.

  • Storage layer: Pure data persistence (database-aware object storage)
  • Compute layer: Query processing, transactions, caching
  • Benefits: Independent scaling, cost optimization, multi-tenant isolation
  • Examples: Snowflake, Amazon Aurora, CockroachDB Serverless

Learning value: Understanding modern cloud-native database architecture patterns.
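The split described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `StorageLayer`, `InMemoryStorage`, and `ComputeNode` are hypothetical names, not existing FerrisDB types, and an in-process `Mutex<HashMap>` stands in for a remote storage service.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical storage-layer interface: pure persistence, no query logic.
pub trait StorageLayer: Send + Sync {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&self, key: &str, value: &str);
}

/// In-memory stand-in for a remote, shared storage service.
#[derive(Default)]
pub struct InMemoryStorage {
    data: Mutex<HashMap<String, String>>,
}

impl StorageLayer for InMemoryStorage {
    fn get(&self, key: &str) -> Option<String> {
        self.data.lock().unwrap().get(key).cloned()
    }

    fn put(&self, key: &str, value: &str) {
        self.data
            .lock()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }
}

/// Hypothetical compute node: holds only a local cache, so it can be
/// added or removed without moving any durable data.
pub struct ComputeNode {
    storage: Arc<dyn StorageLayer>,
    cache: HashMap<String, String>,
}

impl ComputeNode {
    pub fn new(storage: Arc<dyn StorageLayer>) -> Self {
        Self { storage, cache: HashMap::new() }
    }

    /// Writes through to shared storage, then updates the local cache.
    pub fn write(&mut self, key: &str, value: &str) {
        self.storage.put(key, value);
        self.cache.insert(key.to_string(), value.to_string());
    }

    /// Serves from the local cache when possible, otherwise from storage.
    /// Note: this sketch has no cache invalidation, so a node may return
    /// a stale value after another node overwrites the key.
    pub fn read(&mut self, key: &str) -> Option<String> {
        if let Some(hit) = self.cache.get(key) {
            return Some(hit.clone());
        }
        let value = self.storage.get(key)?;
        self.cache.insert(key.to_string(), value.clone());
        Some(value)
    }
}
```

Any number of `ComputeNode` values can share one `Arc<dyn StorageLayer>`, and dropping a node loses only its cache. A real design would also need cache invalidation between nodes, which this sketch deliberately leaves out.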

For comparison, in a shared-nothing architecture:

  • Each node owns data shards
  • Data locality for performance
  • Simpler consistency model
  • Traditional distributed database approach

Log as Database

All operations are immutable log entries

Materialized Views

Derive tables and indexes from the log

Time Travel

Query any point in history naturally

Benefits

Perfect audit trail, simplified backup/restore, event sourcing

Research areas:

  • Log compaction strategies
  • Efficient materialized view maintenance
  • Query optimization over log structures
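As a minimal sketch of the log-as-database idea, the following Rust code keeps every operation as an immutable entry, derives a key-value view by replaying the log, and supports time travel by replaying only a prefix. The names `Op`, `Log`, and `view_at` are illustrative assumptions, not FerrisDB APIs, and replaying from the start on every read is intentionally naive.

```rust
use std::collections::BTreeMap;

/// One immutable log entry (operation names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Put { key: String, value: String },
    Delete { key: String },
}

/// Append-only log: entries are never modified once written.
#[derive(Debug, Default)]
pub struct Log {
    entries: Vec<Op>,
}

impl Log {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends an operation and returns its 1-based sequence number.
    pub fn append(&mut self, op: Op) -> u64 {
        self.entries.push(op);
        self.entries.len() as u64
    }

    /// Materializes the key-value view as of `seq` by replaying the
    /// first `seq` entries. `seq == 0` yields the empty view.
    pub fn view_at(&self, seq: u64) -> BTreeMap<String, String> {
        let mut view = BTreeMap::new();
        for op in self.entries.iter().take(seq as usize) {
            match op {
                Op::Put { key, value } => {
                    view.insert(key.clone(), value.clone());
                }
                Op::Delete { key } => {
                    view.remove(key);
                }
            }
        }
        view
    }

    /// The current view is simply the view at the latest sequence number.
    pub fn current_view(&self) -> BTreeMap<String, String> {
        self.view_at(self.entries.len() as u64)
    }
}
```

Because the view is a pure function of a log prefix, the cost of reads grows with log length. That is why the research areas above (compaction, incremental view maintenance) matter in practice.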

📊 HTAP (Hybrid Transactional/Analytical Processing)

Goal: A single system handles both OLTP and OLAP workloads efficiently.

  • Columnar storage: For analytical queries
  • Row storage: For transactional workloads
  • Automatic routing: Query optimizer chooses optimal storage format
  • Real-time analytics: Fresh data available immediately
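One way to picture the dual-format idea is a table that maintains a row store and a column store side by side and routes each query to the better fit. The sketch below assumes a hypothetical two-column schema and hypothetical names (`HybridTable`, `Query`, `Format`). Real HTAP systems typically replicate between the two formats asynchronously and use a cost-based optimizer rather than a fixed rule.

```rust
use std::collections::HashMap;

/// A two-column record; the schema is hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u64,
    pub amount: i64,
}

/// The two query shapes this sketch distinguishes.
pub enum Query {
    /// OLTP-style point lookup by primary key.
    GetById(u64),
    /// OLAP-style aggregate over a single column.
    SumAmount,
}

#[derive(Debug, PartialEq)]
pub enum Answer {
    Row(Option<Row>),
    Sum(i64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Format {
    RowStore,
    ColumnStore,
}

#[derive(Default)]
pub struct HybridTable {
    /// Row store: whole records keyed by id, good for point reads.
    rows: HashMap<u64, Row>,
    /// Column store: one contiguous vector per column, good for scans.
    amounts: Vec<i64>,
}

impl HybridTable {
    /// Inserts into both formats in one step so analytical reads see
    /// fresh data. Returns false (and changes nothing) on a duplicate id.
    pub fn insert(&mut self, row: Row) -> bool {
        if self.rows.contains_key(&row.id) {
            return false;
        }
        self.amounts.push(row.amount);
        self.rows.insert(row.id, row);
        true
    }

    /// Fixed routing rule standing in for a query optimizer.
    pub fn route(query: &Query) -> Format {
        match query {
            Query::GetById(_) => Format::RowStore,
            Query::SumAmount => Format::ColumnStore,
        }
    }

    /// Executes the query against the format chosen by `route`.
    pub fn execute(&self, query: &Query) -> Answer {
        match query {
            Query::GetById(id) => Answer::Row(self.rows.get(id).cloned()),
            Query::SumAmount => Answer::Sum(self.amounts.iter().sum()),
        }
    }
}
```

The aggregate touches only the `amounts` vector and never loads full rows, which is the core reason columnar layouts suit analytical scans.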

Progressive approach: Support multiple data models while learning optimal integration patterns.

  1. Phase 1: Layered Implementation

    Document API → LSM Storage Engine
    Graph API → LSM Storage Engine
    TimeSeries API → LSM Storage Engine
  2. Phase 2: Hybrid Integration

    • Native JSON document support in storage format
    • Specialized indexing for different models
    • Cross-model query capabilities
  3. Phase 3: Unified Multi-Model

    • Storage engine natively understands multiple data types
    • Atomic transactions across all models
    • Optimized storage layouts per data type
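Phase 1 can be illustrated with a document API that is nothing more than a key-encoding convention on top of an ordered key-value store. In this sketch a `BTreeMap` stands in for the LSM storage engine, and `KvEngine`, `put_document`, `get_document`, and the `doc/<collection>/<id>/<field>` key layout are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Stand-in for the LSM storage engine: any ordered key-value map
/// with prefix scans is enough for this sketch.
#[derive(Default)]
pub struct KvEngine {
    data: BTreeMap<String, String>,
}

impl KvEngine {
    pub fn put(&mut self, key: String, value: String) {
        self.data.insert(key, value);
    }

    /// Returns all entries whose key starts with `prefix`, in key order.
    pub fn scan_prefix(&self, prefix: &str) -> Vec<(String, String)> {
        self.data
            .range(prefix.to_string()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Document layer: stores each field under `doc/<collection>/<id>/<field>`.
/// Assumes collection names and ids contain no '/' characters.
pub fn put_document(
    engine: &mut KvEngine,
    collection: &str,
    id: &str,
    fields: &[(&str, &str)],
) {
    for (field, value) in fields {
        let key = format!("doc/{}/{}/{}", collection, id, field);
        engine.put(key, value.to_string());
    }
}

/// Reassembles a document with a single prefix scan over its fields.
pub fn get_document(
    engine: &KvEngine,
    collection: &str,
    id: &str,
) -> BTreeMap<String, String> {
    let prefix = format!("doc/{}/{}/", collection, id);
    engine
        .scan_prefix(&prefix)
        .into_iter()
        .map(|(k, v)| (k[prefix.len()..].to_string(), v))
        .collect()
}
```

The trailing `/` in the scan prefix keeps document `1` from matching document `10`. Graph and time-series layers could follow the same pattern with their own key prefixes, leaving the engine unaware of any data model until Phase 2.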

Models to explore:

Document Store

JSON/BSON with rich querying

Graph Database

Relationships and graph traversals

Time Series

Optimized for metrics and IoT data

Search Engine

Full-text search and indexing

Beyond Raft: Explore alternative coordination mechanisms for better performance.

  • CRDTs: Conflict-free replicated data types for eventual consistency
  • Calvin-style: Deterministic transaction scheduling
  • Clock synchronization: Spanner-style global ordering
  • Hybrid approaches: Combine techniques based on workload characteristics
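To make the CRDT option concrete, here is a grow-only counter (G-Counter), one of the simplest CRDTs. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge without a consensus round. The type and method names are illustrative, not part of FerrisDB.

```rust
use std::collections::HashMap;

/// Grow-only counter: one monotonically increasing slot per replica.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct GCounter {
    counts: HashMap<String, u64>,
}

impl GCounter {
    /// Records `by` increments on behalf of replica `node`.
    /// A replica must only ever increment its own slot.
    pub fn increment(&mut self, node: &str, by: u64) {
        *self.counts.entry(node.to_string()).or_insert(0) += by;
    }

    /// The counter's value is the sum over all replica slots.
    pub fn value(&self) -> u64 {
        self.counts.values().sum()
    }

    /// Merges another replica's state by taking the per-slot maximum.
    /// The operation is commutative, associative, and idempotent, which
    /// is what lets replicas converge regardless of delivery order.
    pub fn merge(&mut self, other: &GCounter) {
        for (node, &count) in &other.counts {
            let slot = self.counts.entry(node.clone()).or_insert(0);
            *slot = (*slot).max(count);
        }
    }
}
```

Merging the same state twice changes nothing, so duplicated or reordered replication messages are harmless. The trade-off is that the counter can never decrease; supporting decrements needs a richer type such as a PN-Counter.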

Real-time data processing: Built-in stream processing capabilities.

Change Streams

Real-time data change notifications

Materialized Views

Continuously updated query results

Event Sourcing

Store events, compute state on demand

Stream Integration

Native Kafka/Pulsar compatibility

Use cases:

  • Real-time analytics and dashboards
  • Event-driven microservices integration
  • Live data synchronization between systems
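A change stream can be sketched with nothing but standard-library channels: every successful write publishes an event to each subscriber. The `Store`, `ChangeEvent`, and `subscribe` names below are hypothetical, and an in-process `mpsc` channel stands in for whatever transport (for example a Kafka-compatible one) a real deployment would use.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Describes one committed change; `None` means "absent".
#[derive(Debug, Clone, PartialEq)]
pub struct ChangeEvent {
    pub key: String,
    pub old: Option<String>,
    pub new: Option<String>,
}

/// Key-value store that notifies subscribers of every change.
#[derive(Default)]
pub struct Store {
    data: HashMap<String, String>,
    subscribers: Vec<Sender<ChangeEvent>>,
}

impl Store {
    /// Registers a subscriber. It receives changes made after this call.
    pub fn subscribe(&mut self) -> Receiver<ChangeEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    pub fn put(&mut self, key: &str, value: &str) {
        let old = self.data.insert(key.to_string(), value.to_string());
        self.publish(ChangeEvent {
            key: key.to_string(),
            old,
            new: Some(value.to_string()),
        });
    }

    /// Deleting a missing key is a no-op and emits no event.
    pub fn delete(&mut self, key: &str) {
        if let Some(old) = self.data.remove(key) {
            self.publish(ChangeEvent {
                key: key.to_string(),
                old: Some(old),
                new: None,
            });
        }
    }

    /// Sends the event to every subscriber and drops those whose
    /// receiving end has been closed.
    fn publish(&mut self, event: ChangeEvent) {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
    }
}
```

A consumer can drain pending events with `rx.try_iter()` or block on `rx.recv()`. Applying those events to a derived structure is the starting point for a continuously updated materialized view.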

Global distribution: Advanced topology management for worldwide deployments.

  • Region-aware partitioning: Data gravity and compliance requirements
  • Cross-region transactions: Global consistency with performance optimization
  • Cloud portability: Seamless operation across AWS/GCP/Azure
  • Edge caching: Bringing data closer to users

Machine learning integration: Systems that optimize themselves based on workload patterns.

  • Auto-compaction: ML-driven compaction strategies
  • Query optimization: Learn from historical query patterns
  • Resource allocation: Dynamic memory/CPU allocation
  • Anomaly detection: Automatic performance issue detection

Pay-per-query model: True serverless database with instant scaling.

Instant Startup

Cold start in milliseconds

Auto-scaling

Scale to zero, scale to millions

Function Integration

Native serverless function support

Cost Model

Pay only for storage and compute used

Technical challenges:

  • Warm/cold state management
  • Connection pooling and management
  • Resource scheduling and allocation
  • Billing and metering accuracy

Ranked by educational value:

  1. Log-structured everything - Fundamental paradigm shift
  2. HTAP architecture - Combines multiple database concepts
  3. Multi-model architecture - Progressive complexity building
  4. Separation of storage/compute - Modern cloud patterns
  5. Consensus-free coordination - Cutting-edge distributed systems

LSM-Tree Paper

“The Log-Structured Merge-Tree (LSM-Tree)” - O’Neil et al.

Spanner Paper

“Spanner: Google’s Globally-Distributed Database” - Corbett et al.

Calvin Paper

“Calvin: Fast Distributed Transactions for Partitioned Database Systems” - Thomson et al.

  • FoundationDB: Multi-model with ACID guarantees
  • YugabyteDB: Multi-model with PostgreSQL compatibility