
💫 S4 - Modern S3-Compatible Object Storage

S4 is a high-performance, S3-compatible object storage server written in Rust. It solves the inode exhaustion problem common with traditional file-based storage systems and provides advanced features like atomic directory operations and content-addressable deduplication.

Demo

Demo Console: s4console · Login: root / password12345 · Resets every 10 min

Demo API: s4core · Access Key ID / Secret Access Key: my-secret-key_id / my-secret-access-key · Resets every 10 min

Features

  • S3 API Compatible: Full compatibility with AWS S3 API (AWS CLI, boto3, etc.)
  • Inode Problem Solved: Append-only log storage eliminates inode exhaustion
  • Content Deduplication: Automatic deduplication saves 30-50% storage space
  • Object Versioning: S3-compatible versioning with delete markers
  • Lifecycle Policies: Automatic object expiration and cleanup of old versions
  • Atomic Operations: Rename directories with millions of files in milliseconds
  • Strict Consistency: Data is guaranteed to be written before returning success
  • IAM & Admin API: Role-based access control (Reader, Writer, SuperUser) with JWT authentication
  • S3 Select SQL: Query CSV/JSON/Parquet objects with full SQL (powered by Apache DataFusion)
  • Multi-Object SQL: Extended S3 Select with glob patterns for querying across multiple objects
  • Federation: Leaderless quorum replication for high availability (N=3, W=2, R=2)
  • High Performance: Optimized for single-node and distributed deployments
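
The content-deduplication idea can be sketched in a few lines: blobs are keyed by their content hash, so identical payloads are stored once. This is an illustrative model only (the class and field names are invented here), not S4's actual on-disk format.

```python
# Sketch of content-addressable deduplication: blobs keyed by SHA-256 digest,
# so two objects with identical bytes share a single stored copy.
import hashlib

class DedupStore:
    def __init__(self):
        self.blobs = {}    # digest -> bytes (stored once)
        self.objects = {}  # object key -> digest

    def put(self, key: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # no-op if blob already present
        self.objects[key] = digest
        return digest

    def get(self, key: str) -> bytes:
        return self.blobs[self.objects[key]]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

store = DedupStore()
store.put("a/report.pdf", b"same payload")
store.put("b/copy.pdf", b"same payload")   # deduplicated
assert store.get("b/copy.pdf") == b"same payload"
assert store.stored_bytes() == len(b"same payload")  # one physical copy
```

Two logically distinct keys point at one physical blob; redundant uploads are where the claimed 30-50% savings comes from.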

Architecture

S4 uses a Bitcask-style storage approach:

  • All objects: Stored in append-only volume files (~1GB each)
  • Metadata: Stored in fjall (LSM-tree, MVCC, LZ4 compression) with separate keyspaces

This approach ensures:

  • Minimal inode usage (1 billion objects = ~1000 files)
  • Maximum write performance (sequential writes)
  • Atomic metadata operations (fjall cross-keyspace batches)
  • Fast recovery (metadata in ACID database + crash-safe journal)
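
A Bitcask-style volume can be pictured as a byte log of self-verifying records. The record layout below ([crc32][key_len][val_len][key][value]) is invented for illustration; S4's real volume format may differ.

```python
# Minimal sketch of an append-only volume record. Writes always go to the
# tail (sequential I/O); readers verify the CRC on fetch (bit rot detection).
import struct, zlib

def encode_record(key: bytes, value: bytes) -> bytes:
    body = struct.pack(">II", len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body

def append(log: bytearray, key: bytes, value: bytes) -> int:
    offset = len(log)              # sequential write: always at the tail
    log.extend(encode_record(key, value))
    return offset                  # metadata maps key -> (volume, offset)

def read_at(log: bytes, offset: int) -> bytes:
    crc, klen, vlen = struct.unpack_from(">III", log, offset)
    body = log[offset + 4 : offset + 12 + klen + vlen]
    assert zlib.crc32(body) == crc, "corrupt record"
    return body[8 + klen :]

log = bytearray()
off = append(log, b"mybucket/file.txt", b"hello")
assert read_at(bytes(log), off) == b"hello"
```

Because many objects share one ~1GB volume file, filesystem inode usage stays tiny regardless of object count.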

Quick Start

Prerequisites

  • Rust 1.70 or later
  • Linux (recommended) or macOS

Building from Source

# Clone the repository
git clone https://github.com/org/s4.git
cd s4

# Build the project
cargo build --release

# Run the server
./target/release/s4-server

Docker

S4 provides official Docker images for easy deployment in two editions:

| Image Tag | Edition | Description |
| --- | --- | --- |
| s4core/s4core:latest | CE | Community Edition (recommended default) |
| s4core/s4core:v0.0.8 | CE | CE with version tag |
| s4core/s4core:ce | CE | Explicit CE alias (same as latest) |
| s4core/s4core:ce-v0.0.8 | CE | CE with version tag (explicit) |
| s4core/s4core:ee | EE | Enterprise Edition (requires license key) |
| s4core/s4core:ee-v0.0.8 | EE | EE with version tag |

CE is fully functional: single node or up to 3-node cluster with quorum replication. EE unlocks unlimited pools/nodes and operational features — see ee/README.md.

Using docker run

# Run S4 server (basic — Community Edition)
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  s4core/s4core:latest

# Run with custom credentials
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  -e S4_ACCESS_KEY_ID=myaccesskey \
  -e S4_SECRET_ACCESS_KEY=mysecretkey \
  s4core/s4core:latest

# Run with IAM enabled
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  -e S4_ROOT_PASSWORD=password12345 \
  s4core/s4core:latest

# Run Enterprise Edition (with license key)
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  -e S4_LICENSE_KEY=your-license-key-here \
  s4core/s4core:ee

# Build the image locally
docker build -t s4-server .                              # CE
docker build --build-arg EDITION=ee -t s4-server-ee .    # EE

Using Docker Compose

The project includes a docker-compose.yml that runs S4 server together with the web admin console.

# Run full stack (server + web console)
docker compose up --build

# Run in background
docker compose up -d --build

# Run only the server
docker compose up s4-server --build

# With custom environment variables
S4_ROOT_PASSWORD=password12345 docker compose up --build

After startup, the S3 API is available on port 9000 and the web console on port 3000.

docker-compose.yml overview:

services:
  s4core:
    build: .
    ports:
      - "9000:9000"
    volumes:
      - s4-data:/data
    environment:
      - S4_BIND=0.0.0.0:9000
      - S4_ROOT_PASSWORD=${S4_ROOT_PASSWORD:-}
      - S4_ACCESS_KEY_ID=${S4_ACCESS_KEY_ID:-}
      - S4_SECRET_ACCESS_KEY=${S4_SECRET_ACCESS_KEY:-}

  s4-console:
    image: s4core/s4console:latest
    ports:
      - "3000:3000"
    environment:
      - S4_BACKEND_URL=http://s4-server:9000
    depends_on:
      - s4core

For web console-only development, see frontend/README.md.

Environment Variables

S4 is configured through environment variables:

| Variable | Description | Default | Example |
| --- | --- | --- | --- |
| S4_BIND | Bind address (host:port) | 127.0.0.1:9000 | 0.0.0.0:9000 |
| S4_ROOT_USERNAME | Root admin username | root | admin |
| S4_ROOT_PASSWORD | Root admin password (enables IAM) | None (IAM disabled) | password12345 |
| S4_JWT_SECRET | Secret key for signing JWT tokens | Auto-generated at startup (dev mode only) | 256-bit-crypto-random-string-like-this-1234567890ABCDEF |
| S4_ACCESS_KEY_ID | Access key for S3 authentication | Auto-generated dev key | myaccesskey |
| S4_SECRET_ACCESS_KEY | Secret key for S3 authentication | Auto-generated dev key | mysecretkey |
| S4_DATA_DIR | Base directory for storage | System temp dir | /var/lib/s4 |
| S4_MAX_UPLOAD_SIZE | Maximum upload size per request | 5GB | 10GB, 100MB, 1024KB |
| S4_TLS_CERT | Path to TLS certificate (PEM format) | None (HTTP mode) | /etc/ssl/certs/s4.pem |
| S4_TLS_KEY | Path to TLS private key (PEM format) | None (HTTP mode) | /etc/ssl/private/s4-key.pem |
| S4_LIFECYCLE_ENABLED | Enable lifecycle policy worker | true | true, false, 1, 0 |
| S4_LIFECYCLE_INTERVAL_HOURS | Lifecycle evaluation interval (hours) | 24 | 1, 6, 24, 168 |
| S4_LIFECYCLE_DRY_RUN | Dry-run mode (log without deleting) | false | true, false, 1, 0 |
| S4_COMPACTION_ENABLED | Enable volume compaction worker | true | true, false, 1, 0 |
| S4_COMPACTION_INTERVAL_HOURS | Compaction check interval (hours) | 1 | 1, 6, 12, 24 |
| S4_COMPACTION_THRESHOLD | Min fragmentation ratio to compact | 0.3 | 0.1-0.9 |
| S4_COMPACTION_DRY_RUN | Analyze without compacting | false | true, false, 1, 0 |
| S4_COMPACTION_FULL_TIME | Daily full compaction time (HH:MM, local time) | 02:00 | 03:30, "" (disable) |
| S4_MULTIPART_UPLOAD_TTL_HOURS | TTL for abandoned multipart uploads (hours) | 24 | 1, 48 |
| S4_COMPACTION_MULTIPART_TTL_SECS | Dev/testing only: overrides multipart TTL for the compactor (seconds) | None | 1, 60 |
| S4_METRICS_ENABLED | Enable Prometheus metrics | true | false |
| S4_SELECT_ENABLED | Enable/disable S3 Select SQL engine | true | false |
| S4_SELECT_MAX_MEMORY | Per-query memory limit for SQL engine | 256MB | 512MB, 1GB |
| S4_SELECT_TIMEOUT | SQL query timeout (seconds) | 60 | 120 |

Federation variables:

| Variable | Description | Default | Example |
| --- | --- | --- | --- |
| S4_MODE | Operating mode: single, cluster, gateway | single | cluster |
| S4_CLUSTER_NAME | Cluster name for network isolation | default | production |
| S4_NODE_ID | Human-readable node name | Auto | node-1 |
| S4_NODE_GRPC_ADDR | gRPC address for inter-node communication | None | 10.0.1.1:9100 |
| S4_NODE_HTTP_ADDR | HTTP address advertised to cluster | None | 10.0.1.1:9000 |
| S4_SEEDS | Comma-separated seed gRPC addresses | None | 10.0.1.1:9100,10.0.1.2:9100 |
| S4_POOL_NAME | Pool this node belongs to | None | pool-1 |
| S4_POOL_NODES | Pool members (id:addr,...) | None | node-1:10.0.1.1:9100,... |
| S4_REPLICATION_FACTOR | Replication factor (N) | 3 | 3 |
| S4_WRITE_QUORUM | Write quorum (W) | 2 | 2 |
| S4_READ_QUORUM | Read quorum (R) | 2 | 2 |
| S4_GC_GRACE_DAYS | Tombstone GC grace period (days) | 7 | 7 |
| S4_MAX_REJOIN_DOWNTIME_DAYS | Max offline time before full bootstrap | 3 | 3 |
| S4_ANTI_ENTROPY_INTERVAL_SECS | Merkle tree exchange interval (seconds) | 600 | 600 |
| S4_SCRUBBER_FULL_SCAN_DAYS | CRC32 scrub cycle (days) | 30 | 30 |
| S4_HINT_TTL_HOURS | Hinted handoff TTL (hours) | 3 | 3 |

Size format: Supports GB/G, MB/M, KB/K, or bytes (no suffix).
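
The size format above is easy to mirror in code. A minimal parser sketch follows; it assumes binary multiples (1 KB = 1024 bytes), which is a guess to verify against S4's actual parser.

```python
# Illustrative parser for the documented size format: GB/G, MB/M, KB/K,
# or a bare number of bytes. Longer suffixes are checked before their
# one-letter aliases so "10GB" is not misread as "10G" + "B".
def parse_size(s: str) -> int:
    s = s.strip().upper()
    for suffix, factor in (("GB", 1024**3), ("G", 1024**3),
                           ("MB", 1024**2), ("M", 1024**2),
                           ("KB", 1024), ("K", 1024)):
        if s.endswith(suffix):
            return int(s[: -len(suffix)]) * factor
    return int(s)  # no suffix: plain bytes

assert parse_size("10GB") == 10 * 1024**3
assert parse_size("100MB") == 100 * 1024**2
assert parse_size("1024KB") == 1024 * 1024
assert parse_size("4096") == 4096
```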

Example (HTTP):

export S4_ACCESS_KEY_ID=myaccesskey
export S4_SECRET_ACCESS_KEY=mysecretkey
export S4_DATA_DIR=/var/lib/s4
export S4_MAX_UPLOAD_SIZE=10GB

./target/release/s4-server

Using with AWS CLI

Configure AWS CLI to use S4:

aws configure set aws_access_key_id myaccesskey
aws configure set aws_secret_access_key mysecretkey

Basic operations:

# Create a bucket
aws --endpoint-url http://localhost:9000 s3 mb s3://mybucket

# Upload a file
aws --endpoint-url http://localhost:9000 s3 cp file.txt s3://mybucket/file.txt

# List objects
aws --endpoint-url http://localhost:9000 s3 ls s3://mybucket

# Download a file
aws --endpoint-url http://localhost:9000 s3 cp s3://mybucket/file.txt downloaded.txt

# Delete a file
aws --endpoint-url http://localhost:9000 s3 rm s3://mybucket/file.txt

# Delete a bucket
aws --endpoint-url http://localhost:9000 s3 rb s3://mybucket

Versioning

S4 supports S3-compatible object versioning to preserve, retrieve, and restore every version of every object.

# Enable versioning on bucket
aws s3api put-bucket-versioning \
  --bucket mybucket \
  --versioning-configuration Status=Enabled \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Upload file (version 1)
echo "version 1" > file.txt
aws s3api put-object \
  --bucket mybucket \
  --key file.txt \
  --body file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Upload again (version 2)
echo "version 2" > file.txt
aws s3api put-object \
  --bucket mybucket \
  --key file.txt \
  --body file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# List all versions
aws s3api list-object-versions \
  --bucket mybucket \
  --prefix file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Get specific version
aws s3api get-object \
  --bucket mybucket \
  --key file.txt \
  --version-id "ff495d34-c292-4af4-9d10-e186272010ed" \
  first_version.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Delete object (creates delete marker)
aws s3api delete-object \
  --bucket mybucket \
  --key file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl
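
The semantics the commands above rely on can be summarized in a toy model (not S4's implementation): every PUT appends a new version, DELETE appends a delete marker rather than erasing data, and a plain GET returns the newest version unless that is a marker.

```python
# Toy model of S3-style object versioning with delete markers.
import uuid

class VersionedBucket:
    def __init__(self):
        self.versions = {}  # key -> list of (version_id, body or None)

    def put(self, key, body):
        vid = str(uuid.uuid4())
        self.versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key):
        # A delete marker is just a version with no body.
        self.versions.setdefault(key, []).append((str(uuid.uuid4()), None))

    def get(self, key, version_id=None):
        for vid, body in reversed(self.versions.get(key, [])):
            if version_id in (None, vid):
                if body is None:
                    raise KeyError("NoSuchKey (delete marker)")
                return body
        raise KeyError("NoSuchKey")

b = VersionedBucket()
v1 = b.put("file.txt", b"version 1")
b.put("file.txt", b"version 2")
assert b.get("file.txt") == b"version 2"       # latest version wins
assert b.get("file.txt", v1) == b"version 1"   # old version still retrievable
b.delete("file.txt")                           # marker, not erasure
try:
    b.get("file.txt")
    raise AssertionError("expected NoSuchKey")
except KeyError:
    pass
assert b.get("file.txt", v1) == b"version 1"   # data survives the delete
```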

Lifecycle Policies

S4 supports automatic object expiration and cleanup based on lifecycle rules.

# Create lifecycle configuration file
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-logs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Expiration": {
        "Days": 30
      }
    },
    {
      "ID": "cleanup-old-versions",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    }
  ]
}
EOF

# Set lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
  --bucket mybucket \
  --lifecycle-configuration file://lifecycle.json \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Get lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
  --bucket mybucket \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Delete lifecycle configuration
aws s3api delete-bucket-lifecycle \
  --bucket mybucket \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

Lifecycle worker configuration:

# Enable/disable lifecycle worker (default: enabled)
export S4_LIFECYCLE_ENABLED=true

# Set evaluation interval in hours (default: 24)
export S4_LIFECYCLE_INTERVAL_HOURS=24

# Enable dry-run mode to test without deleting (default: false)
export S4_LIFECYCLE_DRY_RUN=true
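
A lifecycle worker's core loop amounts to matching each object against the enabled rules. The sketch below evaluates the `expire-logs` rule from the JSON above (field names follow the S3 lifecycle schema); the loop itself is illustrative, not S4's actual worker.

```python
# Evaluate prefix-scoped Expiration rules against a set of objects.
from datetime import datetime, timedelta, timezone

rules = [
    {"ID": "expire-logs", "Status": "Enabled",
     "Filter": {"Prefix": "logs/"}, "Expiration": {"Days": 30}},
]

def expired_keys(objects, rules, now):
    """objects: dict of key -> last-modified datetime."""
    doomed = set()
    for rule in rules:
        if rule["Status"] != "Enabled" or "Expiration" not in rule:
            continue
        cutoff = now - timedelta(days=rule["Expiration"]["Days"])
        prefix = rule["Filter"].get("Prefix", "")
        doomed |= {k for k, mtime in objects.items()
                   if k.startswith(prefix) and mtime <= cutoff}
    return doomed

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
objects = {
    "logs/old.log": now - timedelta(days=45),  # past the 30-day expiry
    "logs/new.log": now - timedelta(days=5),   # still fresh
    "data/old.bin": now - timedelta(days=45),  # old, but wrong prefix
}
assert expired_keys(objects, rules, now) == {"logs/old.log"}
```

In dry-run mode (`S4_LIFECYCLE_DRY_RUN=true`) the matching keys would only be logged, not deleted.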

IAM & Admin API

S4 includes a built-in IAM system with role-based access control. IAM is enabled when S4_ROOT_PASSWORD is set.

Roles:

  • Reader -- can list buckets/objects and download objects
  • Writer -- Reader permissions plus create/delete buckets and objects
  • SuperUser -- full admin access including user management
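
The role hierarchy above (each role includes the previous one's permissions) can be sketched as a permission table. Permission names here are invented for illustration; S4's internal model may differ.

```python
# Role-based access check: Reader < Writer < SuperUser.
ROLE_PERMS = {
    "Reader":    {"list", "get"},
    "Writer":    {"list", "get", "put", "delete"},
    "SuperUser": {"list", "get", "put", "delete", "admin"},
}

def allowed(role: str, action: str) -> bool:
    return action in ROLE_PERMS.get(role, set())

assert allowed("Reader", "get")
assert not allowed("Reader", "put")        # Reader cannot write
assert allowed("Writer", "delete")
assert not allowed("Writer", "admin")      # only SuperUser manages users
assert allowed("SuperUser", "admin")
```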

Starting with IAM enabled:

export S4_ROOT_PASSWORD=password12345
./target/release/s4-server

Admin API usage (curl):

# Login (get JWT token)
TOKEN=$(curl -s -k -X POST https://localhost:9000/api/admin/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"root","password":"password12345"}' | jq -r '.token')

# List users
curl -s -k https://localhost:9000/api/admin/users \
  -H "Authorization: Bearer $TOKEN"

# Create a user
curl -s -k -X POST https://localhost:9000/api/admin/users \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"username":"alice","password":"alice123","role":"Writer"}'

# Generate S3 credentials for a user
curl -s -k -X POST https://localhost:9000/api/admin/users/<user-id>/credentials \
  -H "Authorization: Bearer $TOKEN"

# Update user (change role, password, or active status)
curl -s -k -X PUT https://localhost:9000/api/admin/users/<user-id> \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"role":"Reader"}'

# Delete S3 credentials
curl -s -k -X DELETE https://localhost:9000/api/admin/users/<user-id>/credentials \
  -H "Authorization: Bearer $TOKEN"

# Delete user
curl -s -k -X DELETE https://localhost:9000/api/admin/users/<user-id> \
  -H "Authorization: Bearer $TOKEN"

Using S3 with IAM credentials:

After generating S3 credentials via the Admin API, use them with AWS CLI:

aws configure set aws_access_key_id S4AK_xxxxxxxx
aws configure set aws_secret_access_key xxxxxxxx
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 ls

Legacy S4_ACCESS_KEY_ID / S4_SECRET_ACCESS_KEY environment credentials continue to work as a fallback with full (SuperUser) access.

S3 Select SQL

S4 includes a built-in SQL query engine powered by Apache DataFusion. Query your stored objects directly — no need to download them first.

Single-Object Query (S3 Select API):

# Upload a CSV file
aws --endpoint-url http://localhost:9000 s3 cp employees.csv s3://mybucket/employees.csv

# Query it with SQL (via curl — returns binary event stream)
curl -X POST "http://localhost:9000/mybucket/employees.csv?select&select-type=2" \
  -H "Content-Type: application/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest>
    <Expression>SELECT name, salary FROM s3object WHERE CAST(salary AS INT) > 100000</Expression>
    <ExpressionType>SQL</ExpressionType>
    <InputSerialization>
        <CSV><FileHeaderInfo>USE</FileHeaderInfo></CSV>
    </InputSerialization>
    <OutputSerialization>
        <CSV/>
    </OutputSerialization>
</SelectObjectContentRequest>'

Supported input formats: CSV, JSON (Lines/Document), Parquet. Output formats: CSV, JSON.

Multi-Object SQL Query (S4 Extended):

S4 extends S3 Select with multi-object queries using glob patterns:

# Upload multiple CSV files
aws --endpoint-url http://localhost:9000 s3 cp data1.csv s3://mybucket/logs/data1.csv
aws --endpoint-url http://localhost:9000 s3 cp data2.csv s3://mybucket/logs/data2.csv

# Query across all matching objects (JSON output)
curl -X POST "http://localhost:9000/mybucket?sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM '\''logs/*.csv'\'' WHERE status = '\''ERROR'\''", "format": "csv", "output": "json"}'

# Aggregation across files (CSV output)
curl -X POST "http://localhost:9000/mybucket?sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT COUNT(*) as total, AVG(CAST(value AS DOUBLE)) as avg_val FROM '\''logs/*.csv'\''", "format": "csv", "output": "csv"}'

Full SQL support includes WHERE, GROUP BY, ORDER BY, LIMIT, JOIN, window functions, CTEs, and aggregate functions.
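
Key selection for a multi-object query reduces to glob matching over object keys. In the sketch below `*` stays within one path segment, so `logs/*.csv` does not descend into `logs/2024/`; whether S4 scopes `*` per segment or across the full key is an assumption to check against its docs.

```python
# Translate a glob pattern into a regex and select matching object keys.
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")   # assumption: '*' stays in one segment
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")

keys = ["logs/data1.csv", "logs/data2.csv", "logs/2024/deep.csv", "img/a.png"]
rx = glob_to_regex("logs/*.csv")
assert [k for k in keys if rx.match(k)] == ["logs/data1.csv", "logs/data2.csv"]
```

The selected objects would then be read with the same input serialization and fed to the SQL engine as one logical table.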

Federation (Distributed Mode)

S4 supports leaderless quorum replication for high availability. A cluster is composed of server pools — immutable groups of nodes that replicate data among themselves.

Key properties:

  • Any node can serve any S3 request (no single leader)
  • Default quorum: N=3, W=2, R=2 — tolerates 1 node failure
  • SWIM gossip for failure detection, gRPC for data replication
  • Buckets are pinned to pools; horizontal scaling = adding new pools
  • HLC + LWW conflict resolution (deterministic, no coordination)
  • Anti-entropy via Merkle trees (background repair every 10 min)
  • Distributed tombstone GC with zombie resurrection protection
  • Bit rot detection and auto-healing from replicas
  • Rolling upgrades, graceful shutdown, admin API for cluster ops
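
Why N=3, W=2, R=2 tolerates one failed node: any write quorum and any read quorum must overlap (W + R > N), so every read contacts at least one replica holding the latest write, and with one node down the two survivors still satisfy both W and R. Conflicts between replica versions are then settled last-writer-wins on an (HLC timestamp, node id) key; the tuple encoding below is illustrative.

```python
# Quorum arithmetic and an LWW tiebreak, matching the properties above.
N, W, R = 3, 2, 2

assert W + R > N      # read and write quorums always intersect
assert N - 1 >= W     # writes still reach quorum with one node down
assert N - 1 >= R     # reads still reach quorum with one node down

# LWW resolution between two replicas of the same object. HLC timestamps
# compare as (physical_time, logical_counter); node id breaks exact ties.
replica_a = {"hlc": (1718000000, 4), "node": "node-1", "value": b"old"}
replica_b = {"hlc": (1718000000, 7), "node": "node-2", "value": b"new"}
winner = max(replica_a, replica_b, key=lambda r: (r["hlc"], r["node"]))
assert winner["value"] == b"new"   # higher logical counter wins
```

The comparison is deterministic on every node, which is why no coordination is needed to converge.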

Quick start (3-node cluster):

# Node 1
S4_MODE=cluster \
S4_NODE_ID=node-1 \
S4_NODE_GRPC_ADDR=10.0.1.1:9100 \
S4_NODE_HTTP_ADDR=10.0.1.1:9000 \
S4_SEEDS=10.0.1.1:9100,10.0.1.2:9100,10.0.1.3:9100 \
S4_POOL_NAME=pool-1 \
S4_POOL_NODES=node-1:10.0.1.1:9100,node-2:10.0.1.2:9100,node-3:10.0.1.3:9100 \
S4_ACCESS_KEY_ID=myaccesskey \
S4_SECRET_ACCESS_KEY=mysecretkey \
./s4-server

# Node 2 and 3: same config, different S4_NODE_ID and S4_NODE_*_ADDR

For detailed architecture, Docker Compose examples, and configuration reference, see docs/04-features/federation.md.

CORS Configuration

S4 supports S3-compatible CORS (Cross-Origin Resource Sharing) for browser-based access.

# Set CORS configuration
curl -X PUT "http://localhost:9000/mybucket?cors" \
  -H "Content-Type: application/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>https://example.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
    <MaxAgeSeconds>3600</MaxAgeSeconds>
  </CORSRule>
</CORSConfiguration>'

# Get CORS configuration
curl "http://localhost:9000/mybucket?cors"

# Delete CORS configuration
curl -X DELETE "http://localhost:9000/mybucket?cors"
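
Server-side, a preflight check walks the stored rules looking for one whose origins and methods admit the request. The sketch below evaluates the rule configured above; the matching logic is illustrative, not S4's implementation.

```python
# Match a (origin, method) pair against S3-style CORS rules.
rules = [{
    "AllowedOrigins": ["https://example.com"],
    "AllowedMethods": ["GET", "PUT"],
    "AllowedHeaders": ["*"],
    "MaxAgeSeconds": 3600,
}]

def cors_allowed(rules, origin, method):
    for rule in rules:
        origin_ok = any(o == "*" or o == origin
                        for o in rule["AllowedOrigins"])
        if origin_ok and method in rule["AllowedMethods"]:
            return rule  # first matching rule supplies the response headers
    return None

assert cors_allowed(rules, "https://example.com", "GET") is not None
assert cors_allowed(rules, "https://example.com", "DELETE") is None
assert cors_allowed(rules, "https://evil.com", "GET") is None
```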

TLS/HTTPS Configuration

S4 supports TLS for encrypted connections. TLS is disabled by default and enabled automatically when both certificate and key paths are provided.

Generating self-signed certificates (for development):

# Generate self-signed certificate and key
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes \
  -subj "/CN=localhost"

Running with TLS:

export S4_TLS_CERT=/path/to/cert.pem
export S4_TLS_KEY=/path/to/key.pem
./target/release/s4-server

Using with AWS CLI (HTTPS):

# For self-signed certificates, use --no-verify-ssl
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 ls

# For production with valid certificates
aws --endpoint-url https://s4.example.com:9000 s3 ls

Certificate requirements:

  • PEM-encoded X.509 certificate
  • PEM-encoded private key (RSA, ECDSA, or Ed25519)
  • Certificate chain is supported (include intermediate certs in cert.pem)

Configuration File (Optional)

You can also use a config.toml file:

[server]
bind = "0.0.0.0:9000"

[storage]
data_path = "/var/lib/s4/volumes"
metadata_path = "/var/lib/s4/metadata_db"

[tuning]
volume_size_mb = 1024    # 1GB
strict_sync = true

Documentation

Development

See CONTRIBUTING.md for development setup and guidelines.

License

  • Community Edition (CE): Apache License 2.0 — all code outside ee/ directory
  • Enterprise Edition (EE): Elastic License 2.0 — code inside ee/ directory

EE source code is available for audit. Using EE features in production requires a valid license key. See ee/README.md for details.

Status

🚧 Early Development - S4 is currently in active development. Not ready for production use.
