A Python client library for DeclareData Fuse Server that provides a PySpark-compatible API. Scale down your Spark clusters and speed up workloads without changing your code.
DeclareData Fuse Server and this library are under active development. This is a pre-release version and may contain bugs or incomplete features. Please review our compatibility development status and contribute where you can.
- Prerequisites
- Components
- Server Setup
- Python Client Installation
- Quick Start Guide
- Other Documentation 🚧 WIP
- Issue Reporting
- Python 3.10 or higher
- 8GB+ available memory
- pip package manager
- Docker
- Ports 8080 (required for gRPC) and 3000 (optional, for the web interface) available
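
The Python version and port requirements above can be checked before starting anything. The following is a minimal sketch using only the standard library; the helper names `python_ok` and `port_is_free` are hypothetical and not part of the Fuse client. It treats a port as "free" when nothing accepts a TCP connection on it, which is an approximation.

```python
import socket
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites list
PORTS = {8080: "gRPC (required)", 3000: "web interface (optional)"}


def python_ok(version=sys.version_info):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version[:2]) >= MIN_PYTHON


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepts the connection
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print(f"Python >= 3.10: {python_ok()}")
    for port, purpose in PORTS.items():
        state = "free" if port_is_free(port) else "in use"
        print(f"Port {port} ({purpose}): {state}")
```

Memory and Docker availability are not covered here and still need to be checked by hand.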
- DeclareData Fuse Server: Blazing fast, low-overhead drop-in alternative to Apache Spark clusters that runs anywhere
- DeclareData Fuse Python: Python client library providing PySpark-compatible APIs
Run the Fuse server using Docker:
docker run -p 8080:8080 -p 3000:3000 ghcr.io/declaredata/fuse:latest

Note: All images are published to our GitHub Package Docker repository, which can be found at github.com/orgs/declaredata/packages/container/package/fuse.
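
After the container starts, it may take a moment before the published port accepts connections. A rough way to wait for that from Python is sketched below. The function `wait_for_port` is a hypothetical helper, not part of the Fuse client, and it only confirms that a TCP connection can be opened; it does not verify that the gRPC service itself is healthy.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=30.0, interval=0.5):
    """Poll until host:port accepts a TCP connection or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` with the defaults checks the gRPC port from the command above; pass `port=3000` to check the optional web interface instead.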
Install from PyPI:
pip install declaredata_fuse

Update to the latest version:
pip install --upgrade declaredata_fuse

from declaredata_fuse.session import FuseSession
# Connect to DeclareData Fuse Server (default: localhost:8080)
fs = FuseSession.builder.getOrCreate()

# Read CSV file
df = fs.read.csv("data.csv")
df.show(10)
# Filter data
df.filter(df.year >= 2000).show(10)
# Sort and select columns
df.sort(
df.population, ascending=False
).select(
df.year, df.state_abbr, df.population
).show(10)
# Group and aggregate (note: first() returns the first value seen in each group, not the maximum)
import declaredata_fuse.functions as F
df.groupBy("year").agg(
F.first("population").alias("highest_population_of_year")
).sort(
df.highest_population_of_year, ascending=False
).show(10)

- Additional API documentation is also available here
- Usage examples can be found in the bench directory
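
To try the Quick Start snippets without a dataset of your own, a small `data.csv` with the columns they reference (`year`, `state_abbr`, `population`) can be generated as sketched below. The rows are placeholder values for illustration only, not real statistics, and `write_sample_csv` is a hypothetical helper. The sketch assumes the Fuse CSV reader picks up the header row, as the column references in the Quick Start imply.

```python
import csv
from pathlib import Path

# Placeholder rows for illustration only; these are not real statistics.
ROWS = [
    {"year": 1990, "state_abbr": "AA", "population": 100},
    {"year": 2000, "state_abbr": "AA", "population": 120},
    {"year": 2000, "state_abbr": "BB", "population": 300},
    {"year": 2010, "state_abbr": "BB", "population": 350},
]


def write_sample_csv(path="data.csv"):
    """Write a small CSV with the columns the Quick Start snippets reference."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["year", "state_abbr", "population"])
        writer.writeheader()
        writer.writerows(ROWS)
    return path
```

Calling `write_sample_csv()` in the working directory creates the `data.csv` file that the Quick Start reads.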
Please report issues via our GitHub Issues page with the following information:
- Problem description
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Python version)
- Error messages or logs
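
The environment details requested above can be gathered with a short script. This is a sketch using only the standard library; `environment_report` is a hypothetical helper, and it assumes the installed distribution is named `declaredata_fuse`, matching the `pip install` command in this README.

```python
import platform
from importlib import metadata


def environment_report(package="declaredata_fuse"):
    """Collect OS, Python, and client package versions as a dict."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        package: pkg_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The printed lines can be pasted into an issue alongside the problem description and logs.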
For security concerns, please email us directly.