
💫 S4 - Modern S3-Compatible Object Storage

S4 is a high-performance, S3-compatible object storage server written in Rust. It solves the inode exhaustion problem common with traditional file-based storage systems and provides advanced features like atomic directory operations and content-addressable deduplication.

Demo

Demo Console: s4console · Login: root / password12345 · Resets every 10 min

Demo API: s4core · Access Key ID / Secret Access Key: my-secret-key_id / my-secret-access-key · Resets every 10 min

Features

  • S3 API Compatible: Full compatibility with AWS S3 API (AWS CLI, boto3, etc.)
  • Inode Problem Solved: Append-only log storage eliminates inode exhaustion
  • Content Deduplication: Automatic deduplication saves 30-50% storage space
  • Object Versioning: S3-compatible versioning with delete markers
  • Lifecycle Policies: Automatic object expiration and cleanup of old versions
  • Atomic Operations: Rename directories with millions of files in milliseconds
  • Strict Consistency: Data is guaranteed to be written before returning success
  • IAM & Admin API: Role-based access control (Reader, Writer, SuperUser) with JWT authentication
  • S3 Select SQL: Query CSV/JSON/Parquet objects with full SQL (powered by Apache DataFusion)
  • Multi-Object SQL: Extended S3 Select with glob patterns for querying across multiple objects
  • Federation: Leaderless quorum replication for high availability (N=3, W=2, R=2)
  • High Performance: Optimized for single-node and distributed deployments
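
The content-deduplication idea can be sketched in a few lines: blobs are keyed by their content hash, so identical payloads are stored once. This is an illustrative model only (the class and field names are invented here), not S4's actual on-disk format.

```python
# Sketch of content-addressable deduplication: blobs keyed by SHA-256 digest,
# so two objects with identical bytes share a single stored copy.
import hashlib

class DedupStore:
    def __init__(self):
        self.blobs = {}    # digest -> bytes (stored once)
        self.objects = {}  # object key -> digest

    def put(self, key: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # no-op if blob already present
        self.objects[key] = digest
        return digest

    def get(self, key: str) -> bytes:
        return self.blobs[self.objects[key]]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

store = DedupStore()
store.put("a/report.pdf", b"same payload")
store.put("b/copy.pdf", b"same payload")   # deduplicated
assert store.get("b/copy.pdf") == b"same payload"
assert store.stored_bytes() == len(b"same payload")  # one physical copy
```

Two logically distinct keys point at one physical blob; redundant uploads are where the claimed 30-50% savings comes from.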

Architecture

S4 uses a Bitcask-style storage approach:

  • All objects: Stored in append-only volume files (~1GB each)
  • Metadata: Stored in fjall (LSM-tree, MVCC, LZ4 compression) with separate keyspaces

This approach ensures:

  • Minimal inode usage (1 billion objects = ~1000 files)
  • Maximum write performance (sequential writes)
  • Atomic metadata operations (fjall cross-keyspace batches)
  • Fast recovery (metadata in ACID database + crash-safe journal)
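
A Bitcask-style volume can be pictured as a byte log of self-verifying records. The record layout below ([crc32][key_len][val_len][key][value]) is invented for illustration; S4's real volume format may differ.

```python
# Minimal sketch of an append-only volume record. Writes always go to the
# tail (sequential I/O); readers verify the CRC on fetch (bit rot detection).
import struct, zlib

def encode_record(key: bytes, value: bytes) -> bytes:
    body = struct.pack(">II", len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body

def append(log: bytearray, key: bytes, value: bytes) -> int:
    offset = len(log)              # sequential write: always at the tail
    log.extend(encode_record(key, value))
    return offset                  # metadata maps key -> (volume, offset)

def read_at(log: bytes, offset: int) -> bytes:
    crc, klen, vlen = struct.unpack_from(">III", log, offset)
    body = log[offset + 4 : offset + 12 + klen + vlen]
    assert zlib.crc32(body) == crc, "corrupt record"
    return body[8 + klen :]

log = bytearray()
off = append(log, b"mybucket/file.txt", b"hello")
assert read_at(bytes(log), off) == b"hello"
```

Because many objects share one ~1GB volume file, filesystem inode usage stays tiny regardless of object count.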

Quick Start

Prerequisites

  • Rust 1.70 or later
  • Linux (recommended) or macOS

Building from Source

# Clone the repository
git clone https://github.com/org/s4.git
cd s4

# Build the project
cargo build --release

# Run the server
./target/release/s4-server

Docker

S4 provides official Docker images for easy deployment in two editions:

| Image Tag | Edition | Description |
| --- | --- | --- |
| s4core/s4core:latest | CE | Community Edition (recommended default) |
| s4core/s4core:v0.0.8 | CE | CE with version tag |
| s4core/s4core:ce | CE | Explicit CE alias (same as latest) |
| s4core/s4core:ce-v0.0.8 | CE | CE with version tag (explicit) |
| s4core/s4core:ee | EE | Enterprise Edition (requires license key) |
| s4core/s4core:ee-v0.0.8 | EE | EE with version tag |

CE is fully functional: single node or up to 3-node cluster with quorum replication. EE unlocks unlimited pools/nodes and operational features — see ee/README.md.

Using docker run

# Run S4 server (basic — Community Edition)
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  s4core/s4core:latest

# Run with custom credentials
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  -e S4_ACCESS_KEY_ID=myaccesskey \
  -e S4_SECRET_ACCESS_KEY=mysecretkey \
  s4core/s4core:latest

# Run with IAM enabled
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  -e S4_ROOT_PASSWORD=password12345 \
  s4core/s4core:latest

# Run Enterprise Edition (with license key)
docker run -d \
  --name s4core \
  -p 9000:9000 \
  -v s4-data:/data \
  -e S4_BIND=0.0.0.0:9000 \
  -e S4_LICENSE_KEY=your-license-key-here \
  s4core/s4core:ee

# Build the image locally
docker build -t s4-server .                              # CE
docker build --build-arg EDITION=ee -t s4-server-ee .    # EE

Using Docker Compose

The project includes a docker-compose.yml that runs S4 server together with the web admin console.

# Run full stack (server + web console)
docker compose up --build

# Run in background
docker compose up -d --build

# Run only the server
docker compose up s4-server --build

# With custom environment variables
S4_ROOT_PASSWORD=password12345 docker compose up --build

After startup, the S3 API is available on port 9000 and the web console on port 3000.

docker-compose.yml overview:

services:
  s4core:
    build: .
    ports:
      - "9000:9000"
    volumes:
      - s4-data:/data
    environment:
      - S4_BIND=0.0.0.0:9000
      - S4_ROOT_PASSWORD=${S4_ROOT_PASSWORD:-}
      - S4_ACCESS_KEY_ID=${S4_ACCESS_KEY_ID:-}
      - S4_SECRET_ACCESS_KEY=${S4_SECRET_ACCESS_KEY:-}

  s4-console:
    image: s4core/s4console:latest
    ports:
      - "3000:3000"
    environment:
      - S4_BACKEND_URL=http://s4-server:9000
    depends_on:
      - s4core

For web console-only development, see frontend/README.md.

Environment Variables

S4 is configured through environment variables:

| Variable | Description | Default | Example |
| --- | --- | --- | --- |
| S4_BIND | Bind address (host:port) | 127.0.0.1:9000 | 0.0.0.0:9000 |
| S4_ROOT_USERNAME | Root admin username | root | admin |
| S4_ROOT_PASSWORD | Root admin password (enables IAM) | None (IAM disabled) | password12345 |
| S4_JWT_SECRET | Secret key for signing JWT tokens | Auto-generated at startup (dev mode only) | 256-bit-crypto-random-string-like-this-1234567890ABCDEF |
| S4_ACCESS_KEY_ID | Access key for S3 authentication | Auto-generated dev key | myaccesskey |
| S4_SECRET_ACCESS_KEY | Secret key for S3 authentication | Auto-generated dev key | mysecretkey |
| S4_DATA_DIR | Base directory for storage | System temp dir | /var/lib/s4 |
| S4_MAX_UPLOAD_SIZE | Maximum upload size per request | 5GB | 10GB, 100MB, 1024KB |
| S4_TLS_CERT | Path to TLS certificate (PEM format) | None (HTTP mode) | /etc/ssl/certs/s4.pem |
| S4_TLS_KEY | Path to TLS private key (PEM format) | None (HTTP mode) | /etc/ssl/private/s4-key.pem |
| S4_LIFECYCLE_ENABLED | Enable lifecycle policy worker | true | true, false, 1, 0 |
| S4_LIFECYCLE_INTERVAL_HOURS | Lifecycle evaluation interval (hours) | 24 | 1, 6, 24, 168 |
| S4_LIFECYCLE_DRY_RUN | Dry-run mode (log without deleting) | false | true, false, 1, 0 |
| S4_COMPACTION_ENABLED | Enable volume compaction worker | true | true, false, 1, 0 |
| S4_COMPACTION_INTERVAL_HOURS | Compaction check interval (hours) | 1 | 1, 6, 12, 24 |
| S4_COMPACTION_THRESHOLD | Min fragmentation ratio to compact | 0.3 | 0.1-0.9 |
| S4_COMPACTION_DRY_RUN | Analyze without compacting | false | true, false, 1, 0 |
| S4_COMPACTION_FULL_TIME | Daily full compaction time (HH:MM, local time) | 02:00 | 03:30, "" (disable) |
| S4_MULTIPART_UPLOAD_TTL_HOURS | TTL for abandoned multipart uploads (hours) | 24 | 1, 48 |
| S4_COMPACTION_MULTIPART_TTL_SECS | Dev/testing only: overrides multipart TTL for the compactor (seconds) | None | 1, 60 |
| S4_METRICS_ENABLED | Enable Prometheus metrics | true | false |
| S4_SELECT_ENABLED | Enable/disable S3 Select SQL engine | true | false |
| S4_SELECT_MAX_MEMORY | Per-query memory limit for SQL engine | 256MB | 512MB, 1GB |
| S4_SELECT_TIMEOUT | SQL query timeout (seconds) | 60 | 120 |

Federation variables:

| Variable | Description | Default | Example |
| --- | --- | --- | --- |
| S4_MODE | Operating mode: single, cluster, gateway | single | cluster |
| S4_CLUSTER_NAME | Cluster name for network isolation | default | production |
| S4_NODE_ID | Human-readable node name | Auto | node-1 |
| S4_NODE_GRPC_ADDR | gRPC address for inter-node communication | None | 10.0.1.1:9100 |
| S4_NODE_HTTP_ADDR | HTTP address advertised to cluster | None | 10.0.1.1:9000 |
| S4_SEEDS | Comma-separated seed gRPC addresses | None | 10.0.1.1:9100,10.0.1.2:9100 |
| S4_POOL_NAME | Pool this node belongs to | None | pool-1 |
| S4_POOL_NODES | Pool members (id:addr,...) | None | node-1:10.0.1.1:9100,... |
| S4_REPLICATION_FACTOR | Replication factor (N) | 3 | 3 |
| S4_WRITE_QUORUM | Write quorum (W) | 2 | 2 |
| S4_READ_QUORUM | Read quorum (R) | 2 | 2 |
| S4_GC_GRACE_DAYS | Tombstone GC grace period (days) | 7 | 7 |
| S4_MAX_REJOIN_DOWNTIME_DAYS | Max offline time before full bootstrap | 3 | 3 |
| S4_ANTI_ENTROPY_INTERVAL_SECS | Merkle tree exchange interval (seconds) | 600 | 600 |
| S4_SCRUBBER_FULL_SCAN_DAYS | CRC32 scrub cycle (days) | 30 | 30 |
| S4_HINT_TTL_HOURS | Hinted handoff TTL (hours) | 3 | 3 |

Size format: Supports GB/G, MB/M, KB/K, or bytes (no suffix).
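
The size format above is easy to mirror in code. A minimal parser sketch follows; it assumes binary multiples (1 KB = 1024 bytes), which is a guess to verify against S4's actual parser.

```python
# Illustrative parser for the documented size format: GB/G, MB/M, KB/K,
# or a bare number of bytes. Longer suffixes are checked before their
# one-letter aliases so "10GB" is not misread as "10G" + "B".
def parse_size(s: str) -> int:
    s = s.strip().upper()
    for suffix, factor in (("GB", 1024**3), ("G", 1024**3),
                           ("MB", 1024**2), ("M", 1024**2),
                           ("KB", 1024), ("K", 1024)):
        if s.endswith(suffix):
            return int(s[: -len(suffix)]) * factor
    return int(s)  # no suffix: plain bytes

assert parse_size("10GB") == 10 * 1024**3
assert parse_size("100MB") == 100 * 1024**2
assert parse_size("1024KB") == 1024 * 1024
assert parse_size("4096") == 4096
```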

Example (HTTP):

export S4_ACCESS_KEY_ID=myaccesskey
export S4_SECRET_ACCESS_KEY=mysecretkey
export S4_DATA_DIR=/var/lib/s4
export S4_MAX_UPLOAD_SIZE=10GB

./target/release/s4-server

Using with AWS CLI

Configure AWS CLI to use S4:

aws configure set aws_access_key_id myaccesskey
aws configure set aws_secret_access_key mysecretkey

Basic operations:

# Create a bucket
aws --endpoint-url http://localhost:9000 s3 mb s3://mybucket

# Upload a file
aws --endpoint-url http://localhost:9000 s3 cp file.txt s3://mybucket/file.txt

# List objects
aws --endpoint-url http://localhost:9000 s3 ls s3://mybucket

# Download a file
aws --endpoint-url http://localhost:9000 s3 cp s3://mybucket/file.txt downloaded.txt

# Delete a file
aws --endpoint-url http://localhost:9000 s3 rm s3://mybucket/file.txt

# Delete a bucket
aws --endpoint-url http://localhost:9000 s3 rb s3://mybucket

Versioning

S4 supports S3-compatible object versioning to preserve, retrieve, and restore every version of every object.

# Enable versioning on bucket
aws s3api put-bucket-versioning \
  --bucket mybucket \
  --versioning-configuration Status=Enabled \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Upload file (version 1)
echo "version 1" > file.txt
aws s3api put-object \
  --bucket mybucket \
  --key file.txt \
  --body file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Upload again (version 2)
echo "version 2" > file.txt
aws s3api put-object \
  --bucket mybucket \
  --key file.txt \
  --body file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# List all versions
aws s3api list-object-versions \
  --bucket mybucket \
  --prefix file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Get specific version
aws s3api get-object \
  --bucket mybucket \
  --key file.txt \
  --version-id "ff495d34-c292-4af4-9d10-e186272010ed" \
  first_version.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Delete object (creates delete marker)
aws s3api delete-object \
  --bucket mybucket \
  --key file.txt \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl
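
The semantics the commands above rely on can be summarized in a toy model (not S4's implementation): every PUT appends a new version, DELETE appends a delete marker rather than erasing data, and a plain GET returns the newest version unless that is a marker.

```python
# Toy model of S3-style object versioning with delete markers.
import uuid

class VersionedBucket:
    def __init__(self):
        self.versions = {}  # key -> list of (version_id, body or None)

    def put(self, key, body):
        vid = str(uuid.uuid4())
        self.versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key):
        # A delete marker is just a version with no body.
        self.versions.setdefault(key, []).append((str(uuid.uuid4()), None))

    def get(self, key, version_id=None):
        for vid, body in reversed(self.versions.get(key, [])):
            if version_id in (None, vid):
                if body is None:
                    raise KeyError("NoSuchKey (delete marker)")
                return body
        raise KeyError("NoSuchKey")

b = VersionedBucket()
v1 = b.put("file.txt", b"version 1")
b.put("file.txt", b"version 2")
assert b.get("file.txt") == b"version 2"       # latest version wins
assert b.get("file.txt", v1) == b"version 1"   # old version still retrievable
b.delete("file.txt")                           # marker, not erasure
try:
    b.get("file.txt")
    raise AssertionError("expected NoSuchKey")
except KeyError:
    pass
assert b.get("file.txt", v1) == b"version 1"   # data survives the delete
```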

Lifecycle Policies

S4 supports automatic object expiration and cleanup based on lifecycle rules.

# Create lifecycle configuration file
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-logs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Expiration": {
        "Days": 30
      }
    },
    {
      "ID": "cleanup-old-versions",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    }
  ]
}
EOF

# Set lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
  --bucket mybucket \
  --lifecycle-configuration file://lifecycle.json \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Get lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
  --bucket mybucket \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

# Delete lifecycle configuration
aws s3api delete-bucket-lifecycle \
  --bucket mybucket \
  --endpoint-url https://127.0.0.1:9000 \
  --no-verify-ssl

Lifecycle worker configuration:

# Enable/disable lifecycle worker (default: enabled)
export S4_LIFECYCLE_ENABLED=true

# Set evaluation interval in hours (default: 24)
export S4_LIFECYCLE_INTERVAL_HOURS=24

# Enable dry-run mode to test without deleting (default: false)
export S4_LIFECYCLE_DRY_RUN=true
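
A lifecycle worker's core loop amounts to matching each object against the enabled rules. The sketch below evaluates the `expire-logs` rule from the JSON above (field names follow the S3 lifecycle schema); the loop itself is illustrative, not S4's actual worker.

```python
# Evaluate prefix-scoped Expiration rules against a set of objects.
from datetime import datetime, timedelta, timezone

rules = [
    {"ID": "expire-logs", "Status": "Enabled",
     "Filter": {"Prefix": "logs/"}, "Expiration": {"Days": 30}},
]

def expired_keys(objects, rules, now):
    """objects: dict of key -> last-modified datetime."""
    doomed = set()
    for rule in rules:
        if rule["Status"] != "Enabled" or "Expiration" not in rule:
            continue
        cutoff = now - timedelta(days=rule["Expiration"]["Days"])
        prefix = rule["Filter"].get("Prefix", "")
        doomed |= {k for k, mtime in objects.items()
                   if k.startswith(prefix) and mtime <= cutoff}
    return doomed

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
objects = {
    "logs/old.log": now - timedelta(days=45),  # past the 30-day expiry
    "logs/new.log": now - timedelta(days=5),   # still fresh
    "data/old.bin": now - timedelta(days=45),  # old, but wrong prefix
}
assert expired_keys(objects, rules, now) == {"logs/old.log"}
```

In dry-run mode (`S4_LIFECYCLE_DRY_RUN=true`) the matching keys would only be logged, not deleted.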

IAM & Admin API

S4 includes a built-in IAM system with role-based access control. IAM is enabled when S4_ROOT_PASSWORD is set.

Roles:

  • Reader -- can list buckets/objects and download objects
  • Writer -- Reader permissions plus create/delete buckets and objects
  • SuperUser -- full admin access including user management
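
The role hierarchy above (each role includes the previous one's permissions) can be sketched as a permission table. Permission names here are invented for illustration; S4's internal model may differ.

```python
# Role-based access check: Reader < Writer < SuperUser.
ROLE_PERMS = {
    "Reader":    {"list", "get"},
    "Writer":    {"list", "get", "put", "delete"},
    "SuperUser": {"list", "get", "put", "delete", "admin"},
}

def allowed(role: str, action: str) -> bool:
    return action in ROLE_PERMS.get(role, set())

assert allowed("Reader", "get")
assert not allowed("Reader", "put")        # Reader cannot write
assert allowed("Writer", "delete")
assert not allowed("Writer", "admin")      # only SuperUser manages users
assert allowed("SuperUser", "admin")
```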

Starting with IAM enabled:

export S4_ROOT_PASSWORD=password12345
./target/release/s4-server

Admin API usage (curl):

# Login (get JWT token)
TOKEN=$(curl -s -k -X POST https://localhost:9000/api/admin/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"root","password":"password12345"}' | jq -r '.token')

# List users
curl -s -k https://localhost:9000/api/admin/users \
  -H "Authorization: Bearer $TOKEN"

# Create a user
curl -s -k -X POST https://localhost:9000/api/admin/users \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"username":"alice","password":"alice123","role":"Writer"}'

# Generate S3 credentials for a user
curl -s -k -X POST https://localhost:9000/api/admin/users/<user-id>/credentials \
  -H "Authorization: Bearer $TOKEN"

# Update user (change role, password, or active status)
curl -s -k -X PUT https://localhost:9000/api/admin/users/<user-id> \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"role":"Reader"}'

# Delete S3 credentials
curl -s -k -X DELETE https://localhost:9000/api/admin/users/<user-id>/credentials \
  -H "Authorization: Bearer $TOKEN"

# Delete user
curl -s -k -X DELETE https://localhost:9000/api/admin/users/<user-id> \
  -H "Authorization: Bearer $TOKEN"

Using S3 with IAM credentials:

After generating S3 credentials via the Admin API, use them with AWS CLI:

aws configure set aws_access_key_id S4AK_xxxxxxxx
aws configure set aws_secret_access_key xxxxxxxx
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 ls

Legacy S4_ACCESS_KEY_ID / S4_SECRET_ACCESS_KEY environment credentials continue to work as a fallback with full (SuperUser) access.

S3 Select SQL

S4 includes a built-in SQL query engine powered by Apache DataFusion. Query your stored objects directly — no need to download them first.

Single-Object Query (S3 Select API):

# Upload a CSV file
aws --endpoint-url http://localhost:9000 s3 cp employees.csv s3://mybucket/employees.csv

# Query it with SQL (via curl — returns binary event stream)
curl -X POST "http://localhost:9000/mybucket/employees.csv?select&select-type=2" \
  -H "Content-Type: application/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest>
    <Expression>SELECT name, salary FROM s3object WHERE CAST(salary AS INT) > 100000</Expression>
    <ExpressionType>SQL</ExpressionType>
    <InputSerialization>
        <CSV><FileHeaderInfo>USE</FileHeaderInfo></CSV>
    </InputSerialization>
    <OutputSerialization>
        <CSV/>
    </OutputSerialization>
</SelectObjectContentRequest>'

Supported input formats: CSV, JSON (Lines/Document), Parquet. Output formats: CSV, JSON.

Multi-Object SQL Query (S4 Extended):

S4 extends S3 Select with multi-object queries using glob patterns:

# Upload multiple CSV files
aws --endpoint-url http://localhost:9000 s3 cp data1.csv s3://mybucket/logs/data1.csv
aws --endpoint-url http://localhost:9000 s3 cp data2.csv s3://mybucket/logs/data2.csv

# Query across all matching objects (JSON output)
curl -X POST "http://localhost:9000/mybucket?sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM '\''logs/*.csv'\'' WHERE status = '\''ERROR'\''", "format": "csv", "output": "json"}'

# Aggregation across files (CSV output)
curl -X POST "http://localhost:9000/mybucket?sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT COUNT(*) as total, AVG(CAST(value AS DOUBLE)) as avg_val FROM '\''logs/*.csv'\''", "format": "csv", "output": "csv"}'

Full SQL support includes WHERE, GROUP BY, ORDER BY, LIMIT, JOIN, window functions, CTEs, and aggregate functions.
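
Key selection for a multi-object query reduces to glob matching over object keys. In the sketch below `*` stays within one path segment, so `logs/*.csv` does not descend into `logs/2024/`; whether S4 scopes `*` per segment or across the full key is an assumption to check against its docs.

```python
# Translate a glob pattern into a regex and select matching object keys.
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")   # assumption: '*' stays in one segment
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")

keys = ["logs/data1.csv", "logs/data2.csv", "logs/2024/deep.csv", "img/a.png"]
rx = glob_to_regex("logs/*.csv")
assert [k for k in keys if rx.match(k)] == ["logs/data1.csv", "logs/data2.csv"]
```

The selected objects would then be read with the same input serialization and fed to the SQL engine as one logical table.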

Federation (Distributed Mode)

S4 supports leaderless quorum replication for high availability. A cluster is composed of server pools — immutable groups of nodes that replicate data among themselves.

Key properties:

  • Any node can serve any S3 request (no single leader)
  • Default quorum: N=3, W=2, R=2 — tolerates 1 node failure
  • SWIM gossip for failure detection, gRPC for data replication
  • Buckets are pinned to pools; horizontal scaling = adding new pools
  • HLC + LWW conflict resolution (deterministic, no coordination)
  • Anti-entropy via Merkle trees (background repair every 10 min)
  • Distributed tombstone GC with zombie resurrection protection
  • Bit rot detection and auto-healing from replicas
  • Rolling upgrades, graceful shutdown, admin API for cluster ops
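
Why N=3, W=2, R=2 tolerates one failed node: any write quorum and any read quorum must overlap (W + R > N), so every read contacts at least one replica holding the latest write, and with one node down the two survivors still satisfy both W and R. Conflicts between replica versions are then settled last-writer-wins on an (HLC timestamp, node id) key; the tuple encoding below is illustrative.

```python
# Quorum arithmetic and an LWW tiebreak, matching the properties above.
N, W, R = 3, 2, 2

assert W + R > N      # read and write quorums always intersect
assert N - 1 >= W     # writes still reach quorum with one node down
assert N - 1 >= R     # reads still reach quorum with one node down

# LWW resolution between two replicas of the same object. HLC timestamps
# compare as (physical_time, logical_counter); node id breaks exact ties.
replica_a = {"hlc": (1718000000, 4), "node": "node-1", "value": b"old"}
replica_b = {"hlc": (1718000000, 7), "node": "node-2", "value": b"new"}
winner = max(replica_a, replica_b, key=lambda r: (r["hlc"], r["node"]))
assert winner["value"] == b"new"   # higher logical counter wins
```

The comparison is deterministic on every node, which is why no coordination is needed to converge.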

Quick start (3-node cluster):

# Node 1
S4_MODE=cluster \
S4_NODE_ID=node-1 \
S4_NODE_GRPC_ADDR=10.0.1.1:9100 \
S4_NODE_HTTP_ADDR=10.0.1.1:9000 \
S4_SEEDS=10.0.1.1:9100,10.0.1.2:9100,10.0.1.3:9100 \
S4_POOL_NAME=pool-1 \
S4_POOL_NODES=node-1:10.0.1.1:9100,node-2:10.0.1.2:9100,node-3:10.0.1.3:9100 \
S4_ACCESS_KEY_ID=myaccesskey \
S4_SECRET_ACCESS_KEY=mysecretkey \
./s4-server

# Node 2 and 3: same config, different S4_NODE_ID and S4_NODE_*_ADDR

For detailed architecture, Docker Compose examples, and configuration reference, see docs/04-features/federation.md.

CORS Configuration

S4 supports S3-compatible CORS (Cross-Origin Resource Sharing) for browser-based access.

# Set CORS configuration
curl -X PUT "http://localhost:9000/mybucket?cors" \
  -H "Content-Type: application/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>https://example.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
    <MaxAgeSeconds>3600</MaxAgeSeconds>
  </CORSRule>
</CORSConfiguration>'

# Get CORS configuration
curl "http://localhost:9000/mybucket?cors"

# Delete CORS configuration
curl -X DELETE "http://localhost:9000/mybucket?cors"
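
Server-side, a preflight check walks the stored rules looking for one whose origins and methods admit the request. The sketch below evaluates the rule configured above; the matching logic is illustrative, not S4's implementation.

```python
# Match a (origin, method) pair against S3-style CORS rules.
rules = [{
    "AllowedOrigins": ["https://example.com"],
    "AllowedMethods": ["GET", "PUT"],
    "AllowedHeaders": ["*"],
    "MaxAgeSeconds": 3600,
}]

def cors_allowed(rules, origin, method):
    for rule in rules:
        origin_ok = any(o == "*" or o == origin
                        for o in rule["AllowedOrigins"])
        if origin_ok and method in rule["AllowedMethods"]:
            return rule  # first matching rule supplies the response headers
    return None

assert cors_allowed(rules, "https://example.com", "GET") is not None
assert cors_allowed(rules, "https://example.com", "DELETE") is None
assert cors_allowed(rules, "https://evil.com", "GET") is None
```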

TLS/HTTPS Configuration

S4 supports TLS for encrypted connections. TLS is disabled by default and enabled automatically when both certificate and key paths are provided.

Generating self-signed certificates (for development):

# Generate self-signed certificate and key
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes \
  -subj "/CN=localhost"

Running with TLS:

export S4_TLS_CERT=/path/to/cert.pem
export S4_TLS_KEY=/path/to/key.pem
./target/release/s4-server

Using with AWS CLI (HTTPS):

# For self-signed certificates, use --no-verify-ssl
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 ls

# For production with valid certificates
aws --endpoint-url https://s4.example.com:9000 s3 ls

Certificate requirements:

  • PEM-encoded X.509 certificate
  • PEM-encoded private key (RSA, ECDSA, or Ed25519)
  • Certificate chain is supported (include intermediate certs in cert.pem)

Configuration File (Optional)

You can also use a config.toml file:

[server]
bind = "0.0.0.0:9000"

[storage]
data_path = "/var/lib/s4/volumes"
metadata_path = "/var/lib/s4/metadata_db"

[tuning]
volume_size_mb = 1024    # 1GB
strict_sync = true

Documentation

Development

See CONTRIBUTING.md for development setup and guidelines.

License

  • Community Edition (CE): Apache License 2.0 — all code outside ee/ directory
  • Enterprise Edition (EE): Elastic License 2.0 — code inside ee/ directory

EE source code is available for audit. Using EE features in production requires a valid license key. See ee/README.md for details.

Status

🚧 Early Development - S4 is currently in active development. Not ready for production use.
