declaredata/fuse_python

DeclareData Fuse Client Bindings for Python

A Python client library for DeclareData Fuse Server that provides a PySpark-compatible API. Scale down your Spark clusters and speed up workloads without changing your code.

DeclareData Fuse Server and this library are under active development. This is a pre-release version and may contain bugs or incomplete features. Please review our compatibility development status and contribute where you can.

Prerequisites

  • Python 3.10 or higher
  • 8GB+ available memory
  • pip package manager
  • Docker
  • Port 8080 free (required, used for gRPC) and port 3000 free (optional, used for the web interface)
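
The Python version and port prerequisites above can be checked before starting anything. The following is a minimal sketch using only the standard library; the helper names (`python_ok`, `port_is_free`, `report`) are illustrative and not part of the Fuse client. It does not check available memory or Docker, since the standard library has no portable way to do so.

```python
import socket
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites list
PORTS = {8080: "gRPC (required)", 3000: "web interface (optional)"}


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def port_is_free(port, host="127.0.0.1"):
    """Return True if host:port can be bound, i.e. nothing is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def report():
    """Build a list of human-readable status lines for the checks above."""
    lines = ["Python >= 3.10: " + ("ok" if python_ok() else "too old")]
    for port, purpose in PORTS.items():
        state = "free" if port_is_free(port) else "in use"
        lines.append("Port {} for {}: {}".format(port, purpose, state))
    return lines
```

Calling `print("\n".join(report()))` prints one line per check. Run it before starting the server, because a running Fuse container will itself occupy the ports.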

Components

Server Setup

Run the Fuse server using Docker:

docker run -p 8080:8080 -p 3000:3000 ghcr.io/declaredata/fuse:latest

Note: All images are published to our GitHub Packages container registry at github.com/orgs/declaredata/packages/container/package/fuse.
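
The container can take a moment to start, so a script that launches it may want to wait before connecting. Below is a minimal sketch of such a wait, assuming the server is published on localhost:8080 as in the `docker run` command above. `wait_for_port` is a hypothetical helper, not part of the Fuse client. It only confirms that the TCP port accepts connections, which does not guarantee that the gRPC service behind it is fully ready.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout elapses.

    Returns True on the first successful connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port()` could be called right after starting the container and before creating a session.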

Python Client Installation

Install from PyPI:

pip install declaredata_fuse

Update to the latest version:

pip install --upgrade declaredata_fuse

Quick Start Guide

Initialize a Session

from declaredata_fuse.session import FuseSession

# Connect to DeclareData Fuse Server (default: localhost:8080)
fs = FuseSession.builder.getOrCreate()

Basic Data Operations

# Read CSV file
df = fs.read.csv("data.csv")
df.show(10)

# Filter data
df.filter(df.year >= 2000).show(10)

# Sort and select columns
df.sort(
    df.population, ascending=False
).select(
    df.year, df.state_abbr, df.population
).show(10)

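# Group and aggregate
import declaredata_fuse.functions as F

# Note: assuming PySpark semantics, F.first returns the first value seen in
# each group, so the alias below is only accurate if rows are already
# ordered by population within each year.
df.groupBy("year").agg(
    F.first("population").alias("highest_population_of_year")
).sort(
    df.highest_population_of_year, ascending=False
).show(10)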
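
The examples above expect a `data.csv` file with `year`, `state_abbr`, and `population` columns. To try them without a real dataset, a small placeholder file can be generated as sketched below. The rows are made-up values, not real figures, and `write_sample_csv` is an illustrative helper. This assumes the CSV reader treats the first row as a header, and it leaves open whether the path is resolved by the client or inside the server container, since neither is stated here.

```python
import csv
from pathlib import Path

# Column names come from the quick start examples; the rows are made-up
# placeholder values for experimentation, not real population data.
COLUMNS = ["year", "state_abbr", "population"]
ROWS = [
    (1999, "AA", 1000),
    (2000, "AA", 1100),
    (2000, "BB", 2500),
    (2001, "BB", 2600),
]


def write_sample_csv(path="data.csv"):
    """Write the placeholder rows to a CSV file with a header row."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(ROWS)
    return path
```

Calling `write_sample_csv()` creates `data.csv` in the current working directory.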

Other Documentation 🚧 WIP

  • Additional API documentation is also available here
  • Usage examples can be found in the bench directory

Issue Reporting

Please report issues via our GitHub Issues page with the following information:

  • Problem description
  • Steps to reproduce
  • Expected vs actual behavior
  • Environment details (OS, Python version)
  • Error messages or logs
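
The environment details requested above can be gathered with a few lines of standard-library Python. This is a minimal sketch: `environment_details` and `format_report` are illustrative names, and the distribution name `declaredata_fuse` is taken from the `pip install` command in this README.

```python
import platform
import sys
from importlib import metadata


def environment_details(package="declaredata_fuse"):
    """Collect OS, Python version, and installed package version."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "package": package,
        "package_version": version,
    }


def format_report(details):
    """Render the details as 'key: value' lines for pasting into an issue."""
    return "\n".join("{}: {}".format(key, value) for key, value in details.items())
```

Running `print(format_report(environment_details()))` produces text that can be pasted directly into an issue.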

For security concerns, please email us directly.
