S4 is a high-performance, S3-compatible object storage server written in Rust. It solves the inode exhaustion problem common with traditional file-based storage systems and provides advanced features like atomic directory operations and content-addressable deduplication.
Demo Console: s4console · Login: root / password12345 · Resets every 10 min
Demo API: s4core · Access Key ID / Secret Access Key: my-secret-key_id / my-secret-access-key · Resets every 10 min
- S3 API Compatible: Full compatibility with AWS S3 API (AWS CLI, boto3, etc.)
- Inode Problem Solved: Append-only log storage eliminates inode exhaustion
- Content Deduplication: Automatic deduplication saves 30-50% storage space
- Object Versioning: S3-compatible versioning with delete markers
- Lifecycle Policies: Automatic object expiration and cleanup of old versions
- Atomic Operations: Rename directories with millions of files in milliseconds
- Strict Consistency: Data is guaranteed to be written before returning success
- IAM & Admin API: Role-based access control (Reader, Writer, SuperUser) with JWT authentication
- S3 Select SQL: Query CSV/JSON/Parquet objects with full SQL (powered by Apache DataFusion)
- Multi-Object SQL: Extended S3 Select with glob patterns for querying across multiple objects
- Federation: Leaderless quorum replication for high availability (N=3, W=2, R=2)
- High Performance: Optimized for single-node and distributed deployments
S4 uses a Bitcask-style storage approach:
- All objects: Stored in append-only volume files (~1GB each)
- Metadata: Stored in fjall (LSM-tree, MVCC, LZ4 compression) with separate keyspaces
This approach ensures:
- Minimal inode usage (1 billion objects = ~1000 files)
- Maximum write performance (sequential writes)
- Atomic metadata operations (fjall cross-keyspace batches)
- Fast recovery (metadata in ACID database + crash-safe journal)
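The append-only design can be sketched in a few lines of Python. This is an illustrative model only: S4's actual volume format, record framing, and fjall-backed metadata keyspaces are not shown here.

```python
# Illustrative sketch of a Bitcask-style append-only log (hypothetical
# record format, not S4's actual on-disk layout): values are appended
# sequentially and an in-memory index maps each key to (offset, length)
# so reads are a single seek.
import io
import struct

class AppendOnlyLog:
    def __init__(self):
        self.volume = io.BytesIO()   # stands in for a ~1GB volume file
        self.index = {}              # key -> (offset, length); S4 keeps this in fjall

    def put(self, key, value):
        offset = self.volume.seek(0, io.SEEK_END)   # sequential write only
        self.volume.write(struct.pack(">I", len(value)))
        self.volume.write(value)
        self.index[key] = (offset + 4, len(value))  # point past the length header

    def get(self, key):
        offset, length = self.index[key]
        self.volume.seek(offset)
        return self.volume.read(length)

log = AppendOnlyLog()
log.put("mybucket/file.txt", b"hello")
log.put("mybucket/file.txt", b"hello v2")   # old record stays until compaction
print(log.get("mybucket/file.txt"))         # b'hello v2'
```

Because overwrites leave stale records behind, a background compactor (controlled by the S4_COMPACTION_* variables below) reclaims the dead space.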
- Rust 1.70 or later
- Linux (recommended) or macOS
# Clone the repository
git clone https://github.com/org/s4.git
cd s4
# Build the project
cargo build --release
# Run the server
./target/release/s4-server

S4 provides official Docker images for easy deployment in two editions:
| Image Tag | Edition | Description |
|---|---|---|
| s4core/s4core:latest | CE | Community Edition (recommended default) |
| s4core/s4core:v0.0.8 | CE | CE with version tag |
| s4core/s4core:ce | CE | Explicit CE alias (same as latest) |
| s4core/s4core:ce-v0.0.8 | CE | CE with version (explicit) |
| s4core/s4core:ee | EE | Enterprise Edition (requires license key) |
| s4core/s4core:ee-v0.0.8 | EE | EE with version |
CE is fully functional: single node or up to 3-node cluster with quorum replication. EE unlocks unlimited pools/nodes and operational features; see ee/README.md.
# Run S4 server (basic β Community Edition)
docker run -d \
--name s4core \
-p 9000:9000 \
-v s4-data:/data \
-e S4_BIND=0.0.0.0:9000 \
s4core/s4core:latest
# Run with custom credentials
docker run -d \
--name s4core \
-p 9000:9000 \
-v s4-data:/data \
-e S4_BIND=0.0.0.0:9000 \
-e S4_ACCESS_KEY_ID=myaccesskey \
-e S4_SECRET_ACCESS_KEY=mysecretkey \
s4core/s4core:latest
# Run with IAM enabled
docker run -d \
--name s4core \
-p 9000:9000 \
-v s4-data:/data \
-e S4_BIND=0.0.0.0:9000 \
-e S4_ROOT_PASSWORD=password12345 \
s4core/s4core:latest
# Run Enterprise Edition (with license key)
docker run -d \
--name s4core \
-p 9000:9000 \
-v s4-data:/data \
-e S4_BIND=0.0.0.0:9000 \
-e S4_LICENSE_KEY=your-license-key-here \
s4core/s4core:ee
# Build the image locally
docker build -t s4-server . # CE
docker build --build-arg EDITION=ee -t s4-server-ee .  # EE

The project includes a docker-compose.yml that runs S4 server together with the web admin console.
# Run full stack (server + web console)
docker compose up --build
# Run in background
docker compose up -d --build
# Run only the server
docker compose up s4core --build
# With custom environment variables
S4_ROOT_PASSWORD=password12345 docker compose up --build

After startup:
- S4 API: http://localhost:9000
- Web Console: http://localhost:3000 (login with root credentials)
docker-compose.yml overview:
services:
  s4core:
    build: .
    ports:
      - "9000:9000"
    volumes:
      - s4-data:/data
    environment:
      - S4_BIND=0.0.0.0:9000
      - S4_ROOT_PASSWORD=${S4_ROOT_PASSWORD:-}
      - S4_ACCESS_KEY_ID=${S4_ACCESS_KEY_ID:-}
      - S4_SECRET_ACCESS_KEY=${S4_SECRET_ACCESS_KEY:-}
  s4-console:
    image: s4core/s4console:latest
    ports:
      - "3000:3000"
    environment:
      - S4_BACKEND_URL=http://s4core:9000
    depends_on:
      - s4core

For web console-only development, see frontend/README.md.
S4 is configured through environment variables:
| Variable | Description | Default | Example |
|---|---|---|---|
| S4_BIND | Bind address (host:port) | 127.0.0.1:9000 | 0.0.0.0:9000 |
| S4_ROOT_USERNAME | Root admin username | root | admin |
| S4_ROOT_PASSWORD | Root admin password (enables IAM) | None (IAM disabled) | password12345 |
| S4_JWT_SECRET | Secret key for signing JWT tokens | Auto-generated at startup (dev mode only) | 256-bit-crypto-random-string-like-this-1234567890ABCDEF |
| S4_ACCESS_KEY_ID | Access key for S3 authentication | Auto-generated dev key | myaccesskey |
| S4_SECRET_ACCESS_KEY | Secret key for S3 authentication | Auto-generated dev key | mysecretkey |
| S4_DATA_DIR | Base directory for storage | System temp dir | /var/lib/s4 |
| S4_MAX_UPLOAD_SIZE | Maximum upload size per request | 5GB | 10GB, 100MB, 1024KB |
| S4_TLS_CERT | Path to TLS certificate (PEM format) | None (HTTP mode) | /etc/ssl/certs/s4.pem |
| S4_TLS_KEY | Path to TLS private key (PEM format) | None (HTTP mode) | /etc/ssl/private/s4-key.pem |
| S4_LIFECYCLE_ENABLED | Enable lifecycle policy worker | true | true, false, 1, 0 |
| S4_LIFECYCLE_INTERVAL_HOURS | Lifecycle evaluation interval (hours) | 24 | 1, 6, 24, 168 |
| S4_LIFECYCLE_DRY_RUN | Dry-run mode (log without deleting) | false | true, false, 1, 0 |
| S4_COMPACTION_ENABLED | Enable volume compaction worker | true | true, false, 1, 0 |
| S4_COMPACTION_INTERVAL_HOURS | Compaction check interval (hours) | 1 | 1, 6, 12, 24 |
| S4_COMPACTION_THRESHOLD | Min fragmentation ratio to compact | 0.3 | 0.1-0.9 |
| S4_COMPACTION_DRY_RUN | Analyze without compacting | false | true, false, 1, 0 |
| S4_COMPACTION_FULL_TIME | Daily full compaction time (HH:MM, local time) | 02:00 | 03:30, "" (disable) |
| S4_MULTIPART_UPLOAD_TTL_HOURS | TTL for abandoned multipart uploads (hours) | 24 | 1, 48 |
| S4_COMPACTION_MULTIPART_TTL_SECS | Dev/testing only. Overrides multipart TTL for the compactor (seconds) | None | 1, 60 |
| S4_METRICS_ENABLED | Enable Prometheus metrics | true | false |
| S4_SELECT_ENABLED | Enable/disable S3 Select SQL engine | true | false |
| S4_SELECT_MAX_MEMORY | Per-query memory limit for SQL engine | 256MB | 512MB, 1GB |
| S4_SELECT_TIMEOUT | SQL query timeout (seconds) | 60 | 120 |
| Federation | | | |
| S4_MODE | Operating mode: single, cluster, gateway | single | cluster |
| S4_CLUSTER_NAME | Cluster name for network isolation | default | production |
| S4_NODE_ID | Human-readable node name | Auto | node-1 |
| S4_NODE_GRPC_ADDR | gRPC address for inter-node communication | (none) | 10.0.1.1:9100 |
| S4_NODE_HTTP_ADDR | HTTP address advertised to cluster | (none) | 10.0.1.1:9000 |
| S4_SEEDS | Comma-separated seed gRPC addresses | (none) | 10.0.1.1:9100,10.0.1.2:9100 |
| S4_POOL_NAME | Pool this node belongs to | (none) | pool-1 |
| S4_POOL_NODES | Pool members (id:addr,...) | (none) | node-1:10.0.1.1:9100,... |
| S4_REPLICATION_FACTOR | Replication factor (N) | 3 | 3 |
| S4_WRITE_QUORUM | Write quorum (W) | 2 | 2 |
| S4_READ_QUORUM | Read quorum (R) | 2 | 2 |
| S4_GC_GRACE_DAYS | Tombstone GC grace period (days) | 7 | 7 |
| S4_MAX_REJOIN_DOWNTIME_DAYS | Max offline time before full bootstrap | 3 | 3 |
| S4_ANTI_ENTROPY_INTERVAL_SECS | Merkle tree exchange interval (s) | 600 | 600 |
| S4_SCRUBBER_FULL_SCAN_DAYS | CRC32 scrub cycle (days) | 30 | 30 |
| S4_HINT_TTL_HOURS | Hinted handoff TTL (hours) | 3 | 3 |
Size format: Supports GB/G, MB/M, KB/K, or bytes (no suffix).
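A sketch of how such size strings decompose. This assumes binary units (1KB = 1024 bytes); whether S4 uses 1000- or 1024-based multipliers is not documented here, and the helper is illustrative, not S4's parser.

```python
def parse_size(s):
    """Parse a size string per the documented format (GB/G, MB/M, KB/K,
    or plain bytes). Assumes binary (1024-based) units."""
    s = s.strip().upper()
    # Check two-letter suffixes before single-letter ones so "GB" is not
    # misread as a "B" suffix on a "...G" number.
    for suffix, factor in (("GB", 1024**3), ("MB", 1024**2), ("KB", 1024),
                           ("G", 1024**3), ("M", 1024**2), ("K", 1024)):
        if s.endswith(suffix):
            return int(s[:-len(suffix)]) * factor
    return int(s)  # no suffix: plain bytes

print(parse_size("10GB"))   # 10737418240
print(parse_size("1024KB")) # 1048576
```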
Example (HTTP):
export S4_ACCESS_KEY_ID=myaccesskey
export S4_SECRET_ACCESS_KEY=mysecretkey
export S4_DATA_DIR=/var/lib/s4
export S4_MAX_UPLOAD_SIZE=10GB
./target/release/s4-server

Configure AWS CLI to use S4:
aws configure set aws_access_key_id myaccesskey
aws configure set aws_secret_access_key mysecretkey

Basic operations:
# Create a bucket
aws --endpoint-url http://localhost:9000 s3 mb s3://mybucket
# Upload a file
aws --endpoint-url http://localhost:9000 s3 cp file.txt s3://mybucket/file.txt
# List objects
aws --endpoint-url http://localhost:9000 s3 ls s3://mybucket
# Download a file
aws --endpoint-url http://localhost:9000 s3 cp s3://mybucket/file.txt downloaded.txt
# Delete a file
aws --endpoint-url http://localhost:9000 s3 rm s3://mybucket/file.txt
# Delete a bucket
aws --endpoint-url http://localhost:9000 s3 rb s3://mybucket

S4 supports S3-compatible object versioning to preserve, retrieve, and restore every version of every object.
# Enable versioning on bucket
aws s3api put-bucket-versioning \
--bucket mybucket \
--versioning-configuration Status=Enabled \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl
# Upload file (version 1); s3api put-object reads --body from a file, not stdin
echo "version 1" > file.txt
aws s3api put-object \
--bucket mybucket \
--key file.txt \
--body file.txt \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl
# Upload again (version 2)
echo "version 2" > file.txt
aws s3api put-object \
--bucket mybucket \
--key file.txt \
--body file.txt \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl
# List all versions
aws s3api list-object-versions \
--bucket mybucket \
--prefix file.txt \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl
# Get specific version
aws s3api get-object \
--bucket mybucket \
--key file.txt \
--version-id "ff495d34-c292-4af4-9d10-e186272010ed" \
first_version.txt \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl
# Delete object (creates delete marker)
aws s3api delete-object \
--bucket mybucket \
--key file.txt \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl

S4 supports automatic object expiration and cleanup based on lifecycle rules.
# Create lifecycle configuration file
cat > lifecycle.json <<'EOF'
{
"Rules": [
{
"ID": "expire-logs",
"Status": "Enabled",
"Filter": {
"Prefix": "logs/"
},
"Expiration": {
"Days": 30
}
},
{
"ID": "cleanup-old-versions",
"Status": "Enabled",
"Filter": {
"Prefix": ""
},
"NoncurrentVersionExpiration": {
"NoncurrentDays": 90
}
}
]
}
EOF
# Set lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
--bucket mybucket \
--lifecycle-configuration file://lifecycle.json \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl
# Get lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
--bucket mybucket \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl
# Delete lifecycle configuration
aws s3api delete-bucket-lifecycle \
--bucket mybucket \
--endpoint-url https://127.0.0.1:9000 \
--no-verify-ssl

Lifecycle worker configuration:
# Enable/disable lifecycle worker (default: enabled)
export S4_LIFECYCLE_ENABLED=true
# Set evaluation interval in hours (default: 24)
export S4_LIFECYCLE_INTERVAL_HOURS=24
# Enable dry-run mode to test without deleting (default: false)
export S4_LIFECYCLE_DRY_RUN=true

S4 includes a built-in IAM system with role-based access control. IAM is enabled when S4_ROOT_PASSWORD is set.
Roles:
- Reader -- can list buckets/objects and download objects
- Writer -- Reader permissions plus create/delete buckets and objects
- SuperUser -- full admin access including user management
Starting with IAM enabled:
export S4_ROOT_PASSWORD=password12345
./target/release/s4-server

Admin API usage (curl):
# Login (get JWT token)
TOKEN=$(curl -s -k -X POST https://localhost:9000/api/admin/login \
-H 'Content-Type: application/json' \
-d '{"username":"root","password":"password12345"}' | jq -r '.token')
# List users
curl -s -k https://localhost:9000/api/admin/users \
-H "Authorization: Bearer $TOKEN"
# Create a user
curl -s -k -X POST https://localhost:9000/api/admin/users \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"username":"alice","password":"alice123","role":"Writer"}'
# Generate S3 credentials for a user
curl -s -k -X POST https://localhost:9000/api/admin/users/<user-id>/credentials \
-H "Authorization: Bearer $TOKEN"
# Update user (change role, password, or active status)
curl -s -k -X PUT https://localhost:9000/api/admin/users/<user-id> \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"role":"Reader"}'
# Delete S3 credentials
curl -s -k -X DELETE https://localhost:9000/api/admin/users/<user-id>/credentials \
-H "Authorization: Bearer $TOKEN"
# Delete user
curl -s -k -X DELETE https://localhost:9000/api/admin/users/<user-id> \
-H "Authorization: Bearer $TOKEN"

Using S3 with IAM credentials:
After generating S3 credentials via the Admin API, use them with AWS CLI:
aws configure set aws_access_key_id S4AK_xxxxxxxx
aws configure set aws_secret_access_key xxxxxxxx
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 ls

Legacy S4_ACCESS_KEY_ID / S4_SECRET_ACCESS_KEY environment credentials continue to work as a fallback with full (SuperUser) access.
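The curl calls above translate directly to the Python standard library. This sketch only builds and sends the same requests; endpoint, paths, and JSON fields are copied from the examples, and certificate verification is disabled purely for self-signed development setups.

```python
# Minimal Admin API client sketch using only the standard library.
import json
import ssl
import urllib.request

def build_request(url, token=None, payload=None, method="GET"):
    """Build a JSON request mirroring the curl flags above."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")  # like -H "Authorization: ..."
    return req

def send(req):
    """Send a request, skipping cert verification (like curl -k; dev only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)

# Against a running server with IAM enabled:
#   login = send(build_request("https://localhost:9000/api/admin/login",
#                              payload={"username": "root", "password": "password12345"},
#                              method="POST"))
#   users = send(build_request("https://localhost:9000/api/admin/users",
#                              token=login["token"]))
```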
S4 includes a built-in SQL query engine powered by Apache DataFusion. Query your stored objects directly; no need to download them first.
Single-Object Query (S3 Select API):
# Upload a CSV file
aws --endpoint-url http://localhost:9000 s3 cp employees.csv s3://mybucket/employees.csv
# Query it with SQL (via curl β returns binary event stream)
curl -X POST "http://localhost:9000/mybucket/employees.csv?select&select-type=2" \
-H "Content-Type: application/xml" \
-d '<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest>
<Expression>SELECT name, salary FROM s3object WHERE CAST(salary AS INT) > 100000</Expression>
<ExpressionType>SQL</ExpressionType>
<InputSerialization>
<CSV><FileHeaderInfo>USE</FileHeaderInfo></CSV>
</InputSerialization>
<OutputSerialization>
<CSV/>
</OutputSerialization>
</SelectObjectContentRequest>'

Supported input formats: CSV, JSON (Lines/Document), Parquet. Output formats: CSV, JSON.
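The same request can be issued from Python with the standard library. The XML body and query string mirror the curl example above; decoding the binary event-stream response is out of scope for this sketch.

```python
# Sketch of issuing a SelectObjectContent request (same wire format as
# the curl example); the response event stream is returned as raw bytes.
import urllib.request

def build_select_body(expression, header_info="USE"):
    # Same XML shape as the curl example above.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<SelectObjectContentRequest>\n"
        f"  <Expression>{expression}</Expression>\n"
        "  <ExpressionType>SQL</ExpressionType>\n"
        "  <InputSerialization>\n"
        f"    <CSV><FileHeaderInfo>{header_info}</FileHeaderInfo></CSV>\n"
        "  </InputSerialization>\n"
        "  <OutputSerialization><CSV/></OutputSerialization>\n"
        "</SelectObjectContentRequest>"
    )

def select_object(bucket, key, expression, endpoint="http://localhost:9000"):
    url = f"{endpoint}/{bucket}/{key}?select&select-type=2"
    req = urllib.request.Request(url, data=build_select_body(expression).encode(),
                                 method="POST")
    req.add_header("Content-Type", "application/xml")
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # binary event stream; a real client would decode it
```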
Multi-Object SQL Query (S4 Extended):
S4 extends S3 Select with multi-object queries using glob patterns:
# Upload multiple CSV files
aws --endpoint-url http://localhost:9000 s3 cp data1.csv s3://mybucket/logs/data1.csv
aws --endpoint-url http://localhost:9000 s3 cp data2.csv s3://mybucket/logs/data2.csv
# Query across all matching objects (JSON output)
curl -X POST "http://localhost:9000/mybucket?sql" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT * FROM '\''logs/*.csv'\'' WHERE status = '\''ERROR'\''", "format": "csv", "output": "json"}'
# Aggregation across files (CSV output)
curl -X POST "http://localhost:9000/mybucket?sql" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT COUNT(*) as total, AVG(CAST(value AS DOUBLE)) as avg_val FROM '\''logs/*.csv'\''", "format": "csv", "output": "csv"}'

Full SQL support includes WHERE, GROUP BY, ORDER BY, LIMIT, JOIN, window functions, CTEs, and aggregate functions.
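A Python sketch of the same ?sql call, which avoids the shell escaping that glob patterns need in curl. The field names are taken from the curl examples above.

```python
# Sketch of the multi-object SQL endpoint; body fields copied from the
# curl examples ("sql", "format", "output").
import json
import urllib.request

def build_sql_query(sql, fmt="csv", output="json"):
    return {"sql": sql, "format": fmt, "output": output}

def run_sql(bucket, sql, endpoint="http://localhost:9000", fmt="csv", output="json"):
    body = json.dumps(build_sql_query(sql, fmt, output)).encode()
    req = urllib.request.Request(f"{endpoint}/{bucket}?sql", data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# No '\'' escaping needed for the glob pattern:
#   run_sql("mybucket", "SELECT * FROM 'logs/*.csv' WHERE status = 'ERROR'")
```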
S4 supports leaderless quorum replication for high availability. A cluster is composed of server pools: immutable groups of nodes that replicate data among themselves.
Key properties:
- Any node can serve any S3 request (no single leader)
- Default quorum: N=3, W=2, R=2 (tolerates one node failure)
- SWIM gossip for failure detection, gRPC for data replication
- Buckets are pinned to pools; horizontal scaling = adding new pools
- HLC + LWW conflict resolution (deterministic, no coordination)
- Anti-entropy via Merkle trees (background repair every 10 min)
- Distributed tombstone GC with zombie resurrection protection
- Bit rot detection and auto-healing from replicas
- Rolling upgrades, graceful shutdown, admin API for cluster ops
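The quorum arithmetic behind "tolerates one node failure" is easy to verify: every read quorum must overlap every write quorum (W + R > N) so a read always contacts at least one up-to-date replica, and the cluster stays available while at least max(W, R) nodes are up.

```python
# Quorum intersection and failure tolerance for leaderless replication.
def quorum_ok(n, w, r):
    # W + R > N guarantees any read set intersects any write set.
    return w + r > n

def max_failures(n, w, r):
    # Writes need W live nodes, reads need R; tolerance is the smaller slack.
    return min(n - w, n - r)

assert quorum_ok(3, 2, 2)
print(max_failures(3, 2, 2))  # 1
```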
Quick start (3-node cluster):
# Node 1
S4_MODE=cluster \
S4_NODE_ID=node-1 \
S4_NODE_GRPC_ADDR=10.0.1.1:9100 \
S4_NODE_HTTP_ADDR=10.0.1.1:9000 \
S4_SEEDS=10.0.1.1:9100,10.0.1.2:9100,10.0.1.3:9100 \
S4_POOL_NAME=pool-1 \
S4_POOL_NODES=node-1:10.0.1.1:9100,node-2:10.0.1.2:9100,node-3:10.0.1.3:9100 \
S4_ACCESS_KEY_ID=myaccesskey \
S4_SECRET_ACCESS_KEY=mysecretkey \
./s4-server
# Nodes 2 and 3: same config, different S4_NODE_ID and S4_NODE_*_ADDR

For detailed architecture, Docker Compose examples, and configuration reference, see docs/04-features/federation.md.
S4 supports S3-compatible CORS (Cross-Origin Resource Sharing) for browser-based access.
# Set CORS configuration
curl -X PUT "http://localhost:9000/mybucket?cors" \
-H "Content-Type: application/xml" \
-d '<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration>
<CORSRule>
<AllowedOrigin>https://example.com</AllowedOrigin>
<AllowedMethod>GET</AllowedMethod>
<AllowedMethod>PUT</AllowedMethod>
<AllowedHeader>*</AllowedHeader>
<MaxAgeSeconds>3600</MaxAgeSeconds>
</CORSRule>
</CORSConfiguration>'
# Get CORS configuration
curl "http://localhost:9000/mybucket?cors"
# Delete CORS configuration
curl -X DELETE "http://localhost:9000/mybucket?cors"

S4 supports TLS for encrypted connections. TLS is disabled by default and enabled automatically when both certificate and key paths are provided.
Generating self-signed certificates (for development):
# Generate self-signed certificate and key
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes \
-subj "/CN=localhost"

Running with TLS:
export S4_TLS_CERT=/path/to/cert.pem
export S4_TLS_KEY=/path/to/key.pem
./target/release/s4-server

Using with AWS CLI (HTTPS):
# For self-signed certificates, use --no-verify-ssl
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 ls
# For production with valid certificates
aws --endpoint-url https://s4.example.com:9000 s3 ls

Certificate requirements:
- PEM-encoded X.509 certificate
- PEM-encoded private key (RSA, ECDSA, or Ed25519)
- Certificate chain is supported (include intermediate certs in cert.pem)
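Python clients need the equivalent of --no-verify-ssl when talking to a self-signed S4 endpoint. One way with the standard library (development only; never disable verification in production):

```python
# Build an SSL context that skips verification for self-signed dev certs.
import ssl
import urllib.request

ctx = ssl.create_default_context()
ctx.check_hostname = False           # hostname will not match a dev cert
ctx.verify_mode = ssl.CERT_NONE      # skip chain verification entirely

def fetch(url):
    # e.g. fetch("https://localhost:9000/") against a TLS-enabled S4
    return urllib.request.urlopen(url, context=ctx).read()
```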
You can also use a config.toml file:
[server]
bind = "0.0.0.0:9000"
[storage]
data_path = "/var/lib/s4/volumes"
metadata_path = "/var/lib/s4/metadata_db"
[tuning]
volume_size_mb = 1024 # 1GB
strict_sync = true

- Architecture Guide - Detailed architecture documentation
- Contributing Guide - How to contribute to S4
- API Documentation - API reference documentation
See CONTRIBUTING.md for development setup and guidelines.
- Community Edition (CE): Apache License 2.0 -- all code outside the ee/ directory
- Enterprise Edition (EE): Elastic License 2.0 -- code inside the ee/ directory
EE source code is available for audit. Using EE features in production requires a valid license key. See ee/README.md for details.
🚧 Early Development: S4 is in active development and not yet ready for production use.
