
TimescamDB


A distributed time-series database built in Rust, designed for high-throughput analytics and streaming data workloads.

Features

Core Database

  • High-Throughput Writes: 825K+ events/sec on a single node (disk I/O bound)
  • Distributed Consensus: Built on openraft for strong consistency and fault tolerance
  • Streaming Snapshots: Memory-efficient snapshot transfers with zero-copy RocksDB checkpoints
  • Schemaless Storage: Flexible rkyv zero-copy encoded payloads for dynamic data structures
  • Time-Ordered Keys: Efficient range scans using UUID v7 and lexicographic ordering
  • Batched Writes: High-performance batch operations for bulk inserts (30K-event batches from the default Rust client)
  • gRPC Interface: Modern, efficient client-server communication with Tonic
  • Query DSL: Pipeline-style query language with Rust-based parser (Logos + LALRPOP)
  • Query Engine: Aggregations (sum, avg, count, min, max, percentiles) and tag-based filtering
  • Time Bucketing: TimescaleDB-inspired time-series bucketing with parallel aggregation and custom origin support
  • Aligned Time Buckets: Custom bucket alignment for business hours, shifts, fiscal periods, and timezone boundaries
  • Window Functions: Moving averages (SMA/EMA), cumulative sums, rate/delta calculations, lag/lead, and row_number
  • Client Library: High-level Rust API with ergonomic query support
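The time-bucketing items above (custom origins, business-aligned buckets) reduce to one alignment rule: shift by the origin, round down to a bucket boundary, shift back. A minimal sketch assuming microsecond timestamps; the helper name is hypothetical, not TimescamDB's actual API:

```rust
// Align a timestamp to the start of its bucket, relative to a custom origin.
// Hypothetical helper illustrating time_bucket-with-origin semantics.
fn time_bucket(timestamp_us: i64, bucket_us: i64, origin_us: i64) -> i64 {
    let offset = timestamp_us - origin_us;
    // div_euclid keeps timestamps before the origin aligned correctly too.
    origin_us + offset.div_euclid(bucket_us) * bucket_us
}

fn main() {
    let hour = 3_600_000_000_i64; // 1 hour in microseconds
    let origin = 9 * hour; // e.g. business day starting at 09:00
    let ts = 10 * hour + 30 * 60_000_000; // 10:30
    // 10:30 falls in the hourly bucket that starts at 10:00.
    assert_eq!(time_bucket(ts, hour, origin), 10 * hour);
}
```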

Performance Optimizations

  • Zero-Copy Queries: rkyv serialization eliminates deserialization overhead
  • Query Caching: Two-tier LRU cache (L1 memory + L2 RocksDB) with >95% hit rates
  • Concurrent Compaction Throttling: Semaphore-based limits prevent memory exhaustion
  • Memory-Bounded Operations: O(1) approximate percentiles, query size limits
  • Optimal RocksDB Tuning: 512MB block cache, readahead for range scans, bloom filters
  • Disk I/O Bound: <1% CPU overhead, 91% time waiting for disk (hardware-limited performance)
  • Efficient Async Runtime: tokio spawn_blocking for I/O operations, minimal async overhead

Columnar Storage

  • Hybrid Two-Tier Architecture (Three-Tier planned): Hot (rkyv) → Warm (Parquet) → Cold (S3, planned)
  • Background Compaction: Dedicated workers compact RocksDB data and perform age-based conversion from row to columnar format (zero write overhead)
  • 70-90% Storage Reduction: Parquet compression vs rkyv
  • 10-100x+ Faster Analytics: Columnar queries on historical data
  • Unified Query Layer: Transparent querying across hot and warm storage tiers
  • Multi-Node Replication: Parquet metadata via Raft, files via gRPC streaming
  • Production Monitoring: 20+ Prometheus metrics for observability
  • rkyv Zero-Copy Reads: Eliminates deserialization overhead for pre-computed results

Continuous Aggregates

  • Real-Time Materialized Views: Sub-millisecond incremental refresh on 1.3M events
  • 100x+ Query Speedup: Pre-computed aggregates vs raw event scans (0.36ms vs 123ms)
  • Inline Computation: Aggregates update during writes (156K events/sec with CAs enabled)
  • Time-Bucketed Storage: Efficient updates using time_bucket alignment with custom origin support
  • Aligned Bucket Support: Business-aligned aggregations (business hours, shifts, fiscal periods)
  • Zero Lag Refresh: Aggregates available immediately after write batches
  • rkyv Zero-Copy Storage: No deserialization overhead for pre-computed results
  • ClickHouse-Style Performance: Production-ready incremental materialized views
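The "only updates affected buckets" property above is what makes refresh cost independent of total history: each incoming event touches a single bucket's running state. A toy sketch with invented names, not the database's implementation:

```rust
use std::collections::HashMap;

// Toy incremental aggregate: each event updates only its own hourly bucket,
// so refresh work does not grow with the number of stored events.
struct HourlyAvg {
    buckets: HashMap<i64, (f64, u64)>, // bucket_start_us -> (sum, count)
}

impl HourlyAvg {
    fn ingest(&mut self, timestamp_us: i64, value: f64) {
        const HOUR: i64 = 3_600_000_000;
        let bucket = (timestamp_us / HOUR) * HOUR;
        let entry = self.buckets.entry(bucket).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }

    fn avg(&self, bucket_start_us: i64) -> Option<f64> {
        self.buckets.get(&bucket_start_us).map(|(s, c)| s / *c as f64)
    }
}

fn main() {
    let mut agg = HourlyAvg { buckets: HashMap::new() };
    agg.ingest(3_600_000_000, 20.0);
    agg.ingest(3_600_000_001, 22.0);
    assert_eq!(agg.avg(3_600_000_000), Some(21.0));
}
```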

See docs/storage/COLUMNAR_STORAGE.md for columnar storage details and docs/storage/CONTINUOUS_AGGREGATES.md for continuous aggregate architecture.

Architecture

Storage Tiers (Per Shard/Single Node)

   ┌─────────────────────────────────────────────────────┐
   │              TIER 1: HOT STORAGE                    │
   │  rkyv zero-copy in RocksDB (7 days)                 │
   │  • Fast writes, immediately queryable               │
   │  • Raft replicated                                  │
   └──────────────────┬──────────────────────────────────┘
                      │ Background compaction
                      ↓
   ┌─────────────────────────────────────────────────────┐
   │              TIER 2: WARM STORAGE                   │
   │  Parquet files (7+ days, indefinite)                │
   │  • 70-90% storage reduction                         │
   │  • 10-100x faster analytics                         │
   │  • Metadata replicated via Raft                     │
   └──────────────────┬──────────────────────────────────┘
                      │ Age-based archival (not implemented)
                      ↓
   ┌─────────────────────────────────────────────────────┐
   │              TIER 3: COLD STORAGE (Planned)         │
   │  S3/Object Storage (not implemented)                │
   │  • Cheap long-term storage                          │
   │  • On-demand retrieval                              │
   └─────────────────────────────────────────────────────┘

Cluster Architecture (Multi-Node)

                ┌───────────────────────────────────────────┐
                │              Client (gRPC)                │
                └─────────────────────┬─────────────────────┘
                                      │
                ┌─────────────────────▼─────────────────────┐
                │            Router / Coordinator           │
                │   (Range-based routing, scatter-gather)   │
                └─────────────────────┬─────────────────────┘
                                      │
          ┌───────────────────────────┼───────────────────────────┐
          │                           │                           │
   ┌──────▼──────┐             ┌──────▼──────┐             ┌──────▼──────┐
   │   Shard 0   │             │   Shard 1   │             │   Shard 2   │
   │  (Raft 0)   │◄───────────►│  (Raft 1)   │◄───────────►│  (Raft 2)   │
   │             │             │             │             │             │
   │  ┌───────┐  │             │  ┌───────┐  │             │  ┌───────┐  │
   │  │RocksDB│  │             │  │RocksDB│  │             │  │RocksDB│  │
   │  └───────┘  │             │  └───────┘  │             │  └───────┘  │
   └─────────────┘             └─────────────┘             └─────────────┘

Current Status

TimescamDB is under active development with 654 tests passing across the workspace.

Core Database:

  • ✅ Phase 1-5 Complete: Foundation → Query Engine with client library
  • ✅ Distributed Raft consensus with streaming snapshots
  • ✅ Range-based sharding and multi-shard queries
  • ✅ gRPC API with aggregations and tag filtering
  • ✅ RemoteShardClient for multi-node inter-shard communication

Columnar Storage:

  • ✅ Phase 1-5 Complete: Schema discovery, Parquet compaction, unified queries, Raft integration, monitoring & optimization
  • ✅ Background compaction covering hot-tier compaction, warm entity discovery, and file distribution between nodes
  • ✅ Concurrent compaction throttling and memory-bounded operations
  • ✅ Multi-node support with gRPC file streaming
  • ✅ Prometheus metrics with 20+ observability metrics
  • ✅ Production testing with 81 Parquet compaction tests (data shifting, stress, failures, GC, recovery)
  • ✅ Query caching with two-tier LRU (L1 memory + L2 RocksDB)
  • ✅ CLI configuration/commands with 18 flags and environment variable support
  • ✅ Warm storage queries with transparent tier selection and result merging
  • ✅ Performance optimizations (batching, readahead, bloom filters)
  • ✅ Documentation for architecture and usage
  • 📋 Future Work: S3 cold storage. SQL support is intentionally out of scope for this project.

See PROGRESS.md for detailed implementation status (note: it may be outdated and will be removed soon), ARCHITECTURE.md for system design, docs/storage/COLUMNAR_STORAGE.md for columnar storage details, and docs/query/TIME_BUCKETING.md for time-series analytics features.

Quick Start

Prerequisites

  • Rust 1.82+ or Nightly (2024+ edition)
  • RocksDB development libraries (e.g., librocksdb-dev on Ubuntu for faster compilation)

Installation

git clone https://github.com/SeedyROM/timescam.git
cd timescam
cargo build --release

Running Tests

# Run all tests across workspace
cargo test

# Run tests for specific crate
cargo test -p timescam
cargo test -p timescam-client
cargo test -p timescamd

Usage Examples

Basic Usage with Client Library

Add to your Cargo.toml:

[dependencies]
timescam-client = "0.1"

Query DSL (Recommended)

TimescamDB includes a powerful pipeline-style query language for flexible, ad-hoc queries:

use timescam_client::TimescamClient;

let client = TimescamClient::connect("http://localhost:5001").await?;

// Simple time-bucketed aggregation with filtering
let query = r#"
    query "sensor:temp_*"
      | range -24h to now
      | filter tags.location == "warehouse"
      | bucket 5m
      | aggregate {
          avg_temp: mean(data.temperature),
          max_temp: max(data.temperature),
          event_count: count()
        }
"#;

let results = client.query_dsl(query).await?;

// Window functions for moving averages
let query = r#"
    query "sensor:temp_01"
      | range -1h to now
      | bucket 1m
      | aggregate { temp: mean(data.temperature) }
      | window {
          moving_avg: moving_average(temp, 5),
          rate_of_change: rate(temp)
        }
"#;

let results = client.query_dsl(query).await?;

// Query multiple entities with pattern matching
let query = r#"
    query "sensor:*"
      | range -1h to now
      | filter data.temperature > 25
      | aggregate {
          hot_sensors: count(),
          avg_temp: mean(data.temperature)
        }
"#;

let results = client.query_dsl(query).await?;

Query DSL Features:

  • Pipeline-style syntax - Intuitive left-to-right data flow
  • Entity patterns - Match multiple entities with wildcards
  • Time ranges - Relative (-24h, -7d) or absolute timestamps
  • Filtering - Tag and data field comparisons
  • Time bucketing - Group events into fixed intervals
  • Aggregations - count(), sum(), mean(), min(), max(), percentile()
  • Window functions - moving_average(), rate(), lag(), lead()
  • EXPLAIN plans - Query optimization analysis
  • Query caching - >95% cache hit rates in production
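The window functions listed above have straightforward semantics over an ordered sequence of bucket values. A hedged sketch of moving_average() and rate(); these helpers are illustrative, not the query engine's code:

```rust
// moving_average(x, n): mean of the last n values; None until n values exist.
fn moving_average(values: &[f64], window: usize) -> Vec<Option<f64>> {
    (0..values.len())
        .map(|i| {
            if i + 1 < window {
                None
            } else {
                Some(values[i + 1 - window..=i].iter().sum::<f64>() / window as f64)
            }
        })
        .collect()
}

// rate(x): delta between consecutive values; the first has no predecessor.
fn rate(values: &[f64]) -> Vec<Option<f64>> {
    std::iter::once(None)
        .chain(values.windows(2).map(|pair| Some(pair[1] - pair[0])))
        .collect()
}

fn main() {
    let temps = [20.0, 21.0, 23.0, 22.0];
    assert_eq!(
        moving_average(&temps, 2),
        vec![None, Some(20.5), Some(22.0), Some(22.5)]
    );
    assert_eq!(rate(&temps), vec![None, Some(1.0), Some(2.0), Some(-1.0)]);
}
```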

Continuous Aggregates with DSL Commands

Create materialized views for 342x faster queries on pre-computed aggregations:

// Create hourly sensor metrics (automatic pre-computation)
let create_cmd = r#"
    create continuous aggregate hourly_sensor_metrics
    from "sensor:*"
    bucket 1h
    aggregate {
        avg_temp: avg(temperature),
        min_temp: min(temperature),
        max_temp: max(temperature),
        readings: count()
    }
    refresh every 5m
"#;

client.execute_command(create_cmd).await?;

// Query the continuous aggregate results (342x faster!)
let query = r#"
    query aggregate "hourly_sensor_metrics:*"
      | range -24h to now
"#;

let mut stream = client.query_dsl(query).await?;  // 0.36ms vs 123ms
while let Some(result) = stream.next().await {
    // Process pre-computed hourly metrics
}

// List all continuous aggregates
let list_result = client.execute_command("list continuous aggregate").await?;
println!("Aggregates: {:?}", list_result.aggregates);

// Manually refresh an aggregate
let refresh_cmd = "refresh continuous aggregate hourly_sensor_metrics";
client.execute_command(refresh_cmd).await?;

// Drop an aggregate
let drop_cmd = "drop continuous aggregate hourly_sensor_metrics";
client.execute_command(drop_cmd).await?;

Continuous Aggregate Features:

  • 342x query speedup - Pre-computed vs raw event scans
  • Sub-millisecond refresh - Updates in <1ms on 1.3M events
  • Automatic matching - Queries automatically use CAs when available
  • Scheduled refresh - Keep aggregates up-to-date automatically
  • Pattern matching - Single CA handles multiple entities (sensor:*)

See docs/storage/CONTINUOUS_AGGREGATES.md for architecture details.

See docs/QUERY_LANGUAGE.md for complete DSL documentation and dsl_query_guide.rs for working examples.

Rust Builder API

For programmatic queries with type safety and compile-time validation, use the Builder API:

use timescam_client::{TimescamClient, payload};

// Connect to a TimescamDB cluster
let client = TimescamClient::connect("http://localhost:5001").await?;

// Write events with tags
client.write(
    "sensor:temp_01",
    payload!({
        "temperature": 22.5,
        "unit": "celsius"
    })
)
.with_tag("location", "living_room")
.with_tag("device_type", "temperature")
.send()
.await?;

// Query events by time range
let events = client.query("sensor:temp_01")
    .time_range(start_time, end_time)
    .execute()
    .await?;

// Filter by tags
let events = client.query("sensor:temp_01")
    .with_tag_filter("location", "living_room")
    .time_range(start_time, end_time)
    .execute()
    .await?;

// Aggregate data
let response = client.aggregate("sensor:temp_01")
    .avg("temperature")
    .time_range(start_time, end_time)
    .execute()
    .await?;

// Time-bucketed aggregation (hourly averages for last 24 hours)
let results = client.time_bucket("sensor:temp_01")
    .interval_hours(1)
    .last_hours(24)
    .avg("temperature")
    .min("temperature")
    .max("temperature")
    .execute()
    .await?;

for bucket in results {
    println!("Hour: {}, Avg: {:.2}°C", bucket.bucket_start, bucket.avg.unwrap());
}

When to use the Builder API:

  • Programmatic query construction
  • Type-safe compile-time validation
  • Maximum performance optimization
  • Integration with Rust applications

Low-level Storage API (Server-side)

use timescamd::{StorageEngine, Event, EventKey};
use timescam::payload;

// Create a storage engine
let storage = StorageEngine::open("./data")?;

// Write an event
let key = EventKey::with_new_id("user:123", 1704067200000000);
let event = Event::new(payload!({
    "action": "click",
    "page": "/home",
    "timestamp": 1704067200
}));

storage.put_event(&key, &event)?;

// Scan events for an entity
let events = storage.scan_entity("user:123")?;

// Query by time range
let events = storage.scan_time_range(
    "user:123",
    1704067200000000,  // start (microseconds)
    1704067300000000   // end (microseconds)
)?;

See crates/timescam-client/examples/ for complete working examples.

Run examples with:

# Start the server first (preferably in single-node mode, with release build)
cargo run --bin timescamd --release -- run 1 127.0.0.1:5001

# Then run examples (in a new terminal)
cargo run -p timescam-client --example dsl_query_guide
cargo run -p timescam-client --example quickstart
cargo run -p timescam-client --example iot_monitoring

Key Design Decisions

Storage Format

Key: Binary encoded with structure [version:u8][entity_len:u16][entity_id:bytes][timestamp:i64][uuid:16 bytes]

  • Binary format (v1): 32 bytes smaller (40% reduction vs string format)
  • Faster parsing: No string conversion or separator parsing overhead
  • Lexicographic ordering: Big-endian encoding ensures correct time-ordered iteration
  • Efficient range scans: Entity prefix + timestamp enables targeted queries
  • UUID v7: Time-ordered event IDs for additional sorting and uniqueness
  • Backward compatibility: Supports legacy string format (entity_id:timestamp:uuid) for migration
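The layout above can be sketched in a few lines; big-endian integer encoding is what makes byte-wise key comparison agree with chronological order. A hypothetical encoder, not the actual EventKey implementation:

```rust
// Sketch of the v1 binary key layout:
// [version:u8][entity_len:u16][entity_id:bytes][timestamp:i64][uuid:16 bytes].
// Big-endian integers preserve lexicographic == chronological ordering.
fn encode_key(entity_id: &str, timestamp_us: i64, uuid: [u8; 16]) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + 2 + entity_id.len() + 8 + 16);
    key.push(1u8); // version byte
    key.extend_from_slice(&(entity_id.len() as u16).to_be_bytes());
    key.extend_from_slice(entity_id.as_bytes());
    key.extend_from_slice(&timestamp_us.to_be_bytes());
    key.extend_from_slice(&uuid);
    key
}

fn main() {
    let a = encode_key("user:123", 1_704_067_200_000_000, [0u8; 16]);
    let b = encode_key("user:123", 1_704_067_300_000_000, [0u8; 16]);
    // The later timestamp sorts after the earlier one byte-wise,
    // so a RocksDB range scan iterates events in time order.
    assert!(a < b);
    assert_eq!(a.len(), 1 + 2 + 8 + 8 + 16);
}
```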

Value: Adaptive binary encoding with tiered formats

TimescamDB uses intelligent format selection based on payload characteristics for optimal storage efficiency:

Encoding Tiers (automatic selection via version byte):

  • Version 0x01 - Inline Format: Primitives (null, bool, int, float, short strings ≤255 bytes)

    • 62-96% smaller than rkyv
    • 2-3x faster encoding/decoding
    • No compression overhead
    • Optimized for simple metrics and counters (1-64 bytes)
  • Version 0x00 - Uncompressed rkyv: Small or incompressible payloads (<1KB)

    • Zero-copy deserialization with 8-byte aligned headers
    • Schema-free for dynamic data structures
    • Instant access without decompression
    • Best for frequently accessed data
  • Version 0x02 - LZ4 Compression: Medium payloads (1-10KB)

    • 40-60% compression ratio
    • ~5μs decompression overhead
    • Fast compression for balanced performance
    • Ideal for structured event data
  • Version 0x03 - ZSTD Compression: Large payloads (>10KB)

    • 60-80% compression ratio
    • ~20μs decompression overhead
    • Maximum compression for bulk data
    • Best for large nested objects or arrays

Performance Features:

  • Adaptive selection: Format chosen automatically based on payload size and compressibility
  • Zero-copy access: Uncompressed rkyv supports direct memory access without deserialization
  • Lazy evaluation: LazyEvent wrapper defers full deserialization - middleware only decodes data when actually needed (e.g., tag filtering skips payload decoding)
  • Fallback logic: Only compresses if space savings justify overhead (incompressible data stored uncompressed)
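The tier selection above can be summarized as a size-and-shape decision. The thresholds below come from the tiers as listed; the enum and function are invented for illustration and omit the compressibility fallback check:

```rust
// Rough sketch of adaptive format selection by payload characteristics.
#[derive(Debug, PartialEq)]
enum Format {
    Inline,    // 0x01: primitives and short strings (<=255 bytes)
    RkyvPlain, // 0x00: small or incompressible payloads (<1KB)
    Lz4,       // 0x02: medium payloads (1-10KB)
    Zstd,      // 0x03: large payloads (>10KB)
}

fn select_format(payload_len: usize, is_primitive: bool) -> Format {
    if is_primitive && payload_len <= 255 {
        Format::Inline
    } else if payload_len < 1024 {
        Format::RkyvPlain
    } else if payload_len <= 10 * 1024 {
        Format::Lz4
    } else {
        Format::Zstd
    }
}

fn main() {
    assert_eq!(select_format(8, true), Format::Inline); // simple counter
    assert_eq!(select_format(512, false), Format::RkyvPlain);
    assert_eq!(select_format(4096, false), Format::Lz4);
    assert_eq!(select_format(64 * 1024, false), Format::Zstd);
}
```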

Performance Features

Streaming Snapshots

  • Zero-copy checkpoints: RocksDB creates snapshots using hard links (instant, no data copy)
  • Incremental serialization: Data streamed field-by-field to avoid memory spikes
  • 64KB chunks: Streamed over gRPC with bounded memory usage
  • Batch installation: 10-20x faster snapshot restoration using batched writes
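The bounded-memory chunking can be sketched with a plain Read loop: one 64KB buffer is reused for every chunk, so memory stays constant regardless of snapshot size. Illustrative only, not the actual gRPC streaming code:

```rust
use std::io::Read;

// Stream a snapshot source in 64KB chunks using a single reused buffer.
fn stream_chunks<R: Read>(mut src: R, mut send: impl FnMut(&[u8])) -> std::io::Result<()> {
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // end of snapshot
        }
        send(&buf[..n]); // in the real system, each chunk goes over gRPC
    }
    Ok(())
}

fn main() {
    let data = vec![7u8; 200_000];
    let mut sizes = Vec::new();
    stream_chunks(&data[..], |chunk| sizes.push(chunk.len())).unwrap();
    // 200,000 bytes = three full 64KB chunks plus a 3,392-byte tail.
    assert_eq!(sizes, vec![65_536, 65_536, 65_536, 3_392]);
}
```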

Count Tracking

  • Lazy recomputation: Counts computed on-demand to avoid read-modify-write bottleneck on writes
  • Performance trade-off: Removed from hot write path (was causing throughput degradation)
  • Sharding limitation: Cached counts don't aggregate correctly across shards - use COUNT(*) aggregation queries instead for accurate distributed counts
  • See docs/LIMITATIONS.md for details on multi-shard aggregation behavior

Technical Highlights

Core Database

  • 300K+ Events/Second: Single-node write throughput (4.5x faster than TimescaleDB on comparable hardware)
  • LSM Tree Storage: RocksDB-backed engine with sequential writes and minimal hot-path overhead
  • Schema-Flexible Design: Hot path optimized for speed (schema tracking disabled for writes)
  • Lazy Evaluation: LazyEvent pattern defers deserialization until needed (tag filtering, aggregations)
  • Streaming Snapshots: Zero-copy RocksDB checkpoints with memory-efficient chunk streaming (handles multi-GB snapshots)
  • Raft Consensus: Built on openraft with strong consistency and automatic failover
  • Range-based Sharding: Horizontal scaling with O(log n) entity-to-shard routing
  • Query DSL: Pipeline-style query language with span-based errors, EXPLAIN plans, and query caching
  • Query Engine: Aggregations (sum, avg, count, min, max, percentiles) with multi-shard coordination
  • Time Bucketing: Fixed-interval time-series aggregations with parallel processing (10K event chunks)
  • Window Functions: Moving averages, cumulative sums, lag/lead operations, and rate calculations
  • Tag Filtering: Efficient tag-based event filtering with scatter-gather queries; lazy event decoding means only tags are decoded during filtering
  • Client Library: Ergonomic Rust API with high-level abstractions for common operations
  • Batch Operations: High-performance bulk writes grouped by single node or shard
  • Raft Log Compaction: Automatic RocksDB compaction for hot storage and Raft logs
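The O(log n) entity-to-shard routing mentioned above can be illustrated with a binary search over sorted shard range boundaries. The keyspace split here is a made-up example, not the actual router:

```rust
// Route an entity to the shard owning the contiguous key range it falls in.
// range_starts holds the (sorted) inclusive start key of each shard's range.
fn route(entity_id: &str, range_starts: &[&str]) -> usize {
    // partition_point performs a binary search (O(log n)) and returns the
    // number of boundaries <= entity_id, i.e. one past the owning shard.
    range_starts
        .partition_point(|start| *start <= entity_id)
        .saturating_sub(1)
}

fn main() {
    // Three shards covering ["", "h"), ["h", "p"), ["p", ..).
    let starts = ["", "h", "p"];
    assert_eq!(route("device:42", &starts), 0);
    assert_eq!(route("metric:cpu", &starts), 1);
    assert_eq!(route("sensor:temp_01", &starts), 2);
}
```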

Columnar Storage

  • Hybrid Architecture: Hot (rkyv zero-copy) for recent data, warm (Parquet) for historical analytics
  • Schema-Aware Compaction: Schema tracking during Parquet compaction (cold path only)
  • Background Compaction: Asynchronous age-based conversion to columnar format (zero write impact)
  • Unified Query Layer: Transparent tier selection and result merging across hot/warm storage
  • Compression: 70-90% storage reduction with ZSTD-compressed Parquet files
  • Performance: 10-100x faster aggregations on historical data vs row-oriented storage
  • Multi-Node Replication: Parquet metadata via Raft, files transferred with gRPC streaming

Continuous Aggregates

  • Real-Time Materialized Views: Incremental computation during write batches
  • 200x Query Performance: Pre-computed time-bucketed aggregates (~0.36ms vs ~123ms)
  • Sub-Millisecond Refresh: Aggregate updates complete in <1ms on 1M+ events
  • rkyv Zero-Copy Storage: Eliminates deserialization overhead for pre-computed results
  • No Write Penalty: 150K+ events/sec with continuous aggregates enabled
  • Time-Bucket Aligned: Only updates affected buckets, not full re-aggregation
  • Flexible Aggregations: Supports sum, avg, count, min, max, percentiles

See ARCHITECTURE.md for detailed technical design, docs/storage/COLUMNAR_STORAGE.md for columnar storage architecture, docs/storage/CONTINUOUS_AGGREGATES.md for continuous aggregate details, docs/QUERY_LANGUAGE.md for query DSL documentation, and docs/query/TIME_BUCKETING.md for time-series analytics capabilities.

Monitoring & Observability

TimescamDB includes comprehensive Prometheus metrics for production monitoring. Metrics are exposed via HTTP on a configurable endpoint (default: http://localhost:9000/metrics).

Metric Categories

  • Storage & Write Metrics - Write/read/scan latency, throughput, errors, data volume
  • Query Metrics - Query duration by tier (hot/cold/aggregate/dsl), events scanned, errors
  • Compaction Metrics - Compaction duration, compression ratios, lag, Parquet file sizes
  • Parquet & Partition Pruning - File counts, pruning efficiency, scan statistics
  • Schema Discovery - Schema extraction operations and errors
  • Raft Cluster Metrics - Term, leadership, proposal latency, elections, cluster size
  • RPC Metrics - RPC duration, requests, and errors by method
  • Query Execution Limits - Circuit breaker state, limit violations, memory usage

50+ metrics total including histograms, counters, and gauges for comprehensive observability.

Quick Access

# View all metrics (default port)
curl http://localhost:9000/metrics

# Custom metrics port
cargo run --bin timescamd --release -- run 1 127.0.0.1:5001 --metrics-port 9091

# Prometheus configuration
# Add to prometheus.yml:
scrape_configs:
  - job_name: 'timescamdb'
    static_configs:
      - targets: ['localhost:9091']

See docs/METRICS.md for complete metric reference, Prometheus queries, alerting recommendations, and dashboard suggestions.

Deployment Modes

TimescamDB supports two deployment modes:

Single-Node Mode (Default)

For development, testing, or single-server deployments. Raft consensus is automatically disabled when no cluster members are specified.

Characteristics:

  • ✅ Simplified deployment (no cluster coordination)
  • ✅ Lower latency (no consensus overhead)
  • ✅ Direct storage writes (no Raft log)
  • ✅ All features available (queries, aggregations, columnar storage)
  • ❌ No high availability or replication
  • ❌ No distributed consensus

Usage:

# Single-node with node ID 1 listening on 127.0.0.1:5001
cargo run --bin timescamd --release -- run 1 127.0.0.1:5001

# With custom data directory
cargo run --bin timescamd --release -- run 1 127.0.0.1:5001 ./my-data

# With metrics on custom port
cargo run --bin timescamd --release -- run 1 127.0.0.1:5001 --metrics-port 9092

What happens: The server detects no --cluster parameter and automatically runs in single-node mode:

  • No Raft consensus layer
  • Writes go directly to RocksDB storage
  • Raft transport and admin services are not started
  • Lower resource usage and simpler operation

Cluster Mode (Multi-Node)

For production deployments requiring high availability and fault tolerance. Raft consensus is automatically enabled when cluster members are specified.

Characteristics:

  • ✅ High availability and automatic failover
  • ✅ Strong consistency via Raft consensus
  • ✅ Data replication across nodes
  • ✅ Leader election and log replication
  • ⚠️ Higher latency (consensus overhead)
  • ⚠️ More complex deployment

Usage:

# 3-node cluster (run on separate machines or ports)

# Node 1 (bootstrap node)
cargo run --bin timescamd --release -- run 1 127.0.0.1:5001 \
  --cluster "1=http://127.0.0.1:5001,2=http://127.0.0.1:5002,3=http://127.0.0.1:5003"

# Node 2
cargo run --bin timescamd --release -- run 2 127.0.0.1:5002 \
  --cluster "1=http://127.0.0.1:5001,2=http://127.0.0.1:5002,3=http://127.0.0.1:5003"

# Node 3
cargo run --bin timescamd --release -- run 3 127.0.0.1:5003 \
  --cluster "1=http://127.0.0.1:5001,2=http://127.0.0.1:5002,3=http://127.0.0.1:5003"

What happens: The server detects cluster members and automatically enables Raft:

  • First node (ID 1) bootstraps the cluster
  • Other nodes join as learners and are promoted to voters
  • Leader election occurs automatically
  • All writes go through Raft consensus
  • Raft transport and admin services are started

Cluster Scripts:

# Start a 3-node cluster locally (for testing)
./testbed/start-cluster.sh

# Stop the cluster
./testbed/stop-cluster.sh

Choosing a Deployment Mode

Use Single-Node Mode when:

  • Development or testing
  • Single-server deployment acceptable
  • Low-latency requirements
  • No need for high availability
  • Simple operational requirements

Use Cluster Mode when:

  • Production deployments
  • High availability required
  • Data durability critical
  • Can tolerate consensus latency
  • Multi-datacenter deployment

Migration Path

You can start with single-node mode and migrate to cluster mode later:

  1. Start single-node: Deploy without --cluster flag
  2. Add nodes: When ready, deploy additional nodes with --cluster flag
  3. Bootstrap cluster: Use the admin API to convert to cluster mode
  4. Data migration: Use backup/restore or streaming replication (coming soon)

Development

Getting Started

# Clone the repository
git clone https://github.com/SeedyROM/timescam.git
cd timescam

# Run tests
cargo test

# Run the server (single-node mode)
cargo run --bin timescamd --release -- run 1 127.0.0.1:5001

# Run client examples
cargo run --release --example quickstart

# Run benchmarks
cargo bench

Pre-commit Hooks

This project uses pre-commit hooks to ensure code quality and consistency. The hooks run automatically before each commit and push to:

  • Format Rust code with cargo fmt
  • Lint with cargo clippy (warnings treated as errors)
  • Check YAML and TOML files
  • Remove trailing whitespace and fix end-of-file formatting
  • Prevent large files and merge conflicts
  • Run tests before pushing (pre-push hook)

Setup pre-commit hooks:

# Install pre-commit (if not already installed)
pip install pre-commit
# or
brew install pre-commit

# Install the git hooks (both pre-commit and pre-push)
pre-commit install
pre-commit install --hook-type pre-push

# (Optional) Run hooks manually on all files
pre-commit run --all-files

Hook configuration:

The hooks are configured in .pre-commit-config.yaml:

  • General hooks: trailing whitespace, end-of-file fixer, YAML/TOML validation, large file detection
  • Rust hooks: cargo fmt, cargo clippy --fix, cargo check
  • Test hooks: cargo test runs on pre-push (not pre-commit to keep commits fast)

If you need to skip hooks temporarily (not recommended):

git commit --no-verify -m "message"  # Skip pre-commit hooks
git push --no-verify                 # Skip pre-push hooks

Contributing

Contributions are welcome! Please see our CONTRIBUTING.md for detailed guidelines and community standards.

Dependencies

See workspace Cargo.toml and individual crate manifests for full dependency lists.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built with openraft - excellent Raft implementation
  • Zero-copy serialization with rkyv
  • RocksDB storage engine via rust-rocksdb
  • Apache Arrow and Parquet support with arrow-rs
  • Inspired by time-series databases like InfluxDB and TimescaleDB
  • Thanks to the Rust community for amazing libraries and tooling, and allowing me to play with crazy systems programming legos.

Status: "Production-ready" for single-node and experimental in multi-node deployments. Production validation needed - this is pre-1.0 software, please use with caution and report any issues.

Built with 🦀 in Rust!
