sprckt/pystack

PyDataStack

A modern, open-source data platform built entirely with Python tools, demonstrating a complete end-to-end data pipeline for Star Wars film data.

Technologies

  • Data Ingestion: dlt for extracting data from the Star Wars API
  • Data Warehouse: DuckDB for fast, embedded analytics
  • Data Transformation: dbt for SQL-based data modelling
  • Data Orchestration: Dagster for pipeline management
  • Data Visualization: Streamlit for interactive dashboards

Prerequisites

  • Python 3.8+
  • uv for creating the virtual environment and installing dependencies
  • just command runner (optional)

Installation

  1. Clone the repository:
git clone https://github.com/sprckt/pystack.git
cd pystack
  2. Create the virtual environment and install the dependencies:
uv sync

Quick Start

Run the Streamlit Dashboard

just bi
# Or manually:
cd src && streamlit run visualisation/app.py

Run Dagster Pipeline

just orchestrate
# Or manually:
dagster dev -f src/orchestration/definitions.py

Query DuckDB

just duck
# Or manually:
duckdb src/pystack.duckdb

View dbt Documentation

just dbt-docs

Project Structure

pystack/
├── src/
│   ├── orchestration/      # Dagster pipeline definitions
│   ├── transformation/     # dbt models and configurations
│   └── visualisation/      # Streamlit dashboard
├── justfile                # Command shortcuts
└── README.md

Dashboard Features

  • Financials: View film budgets, box office revenue, and ROI
  • Attributes: Analyze species, characters, planets, and starships per film
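
The two dashboard views above boil down to simple per-film calculations. The sketch below is illustrative only and is not taken from the project's code: it assumes ROI means (box office - budget) / budget, and it assumes each film record carries species, characters, planets, and starships as lists, as SWAPI film records do. The function names, the budget and box_office fields, and the example record are hypothetical.

```python
from typing import Optional


def roi(budget: float, box_office: float) -> Optional[float]:
    """Return ROI as a fraction of budget (assumed definition).

    Returns None when the budget is missing or zero, since the ratio
    is undefined in that case.
    """
    if not budget:
        return None
    return (box_office - budget) / budget


def attribute_counts(film: dict) -> dict:
    """Count the linked resources of each kind on one film record."""
    keys = ("species", "characters", "planets", "starships")
    # A missing key is treated as an empty list rather than an error.
    return {key: len(film.get(key, [])) for key in keys}


# Placeholder record: the figures and URLs are made up for illustration.
example_film = {
    "title": "Example Film",
    "budget": 10_000_000,
    "box_office": 25_000_000,
    "species": ["https://example.invalid/species/1/"],
    "characters": [
        "https://example.invalid/people/1/",
        "https://example.invalid/people/2/",
    ],
    "planets": ["https://example.invalid/planets/1/"],
    "starships": [],
}

if __name__ == "__main__":
    print(roi(example_film["budget"], example_film["box_office"]))
    print(attribute_counts(example_film))
```

In the actual project these figures are presumably computed in the dbt models and read by the Streamlit app from DuckDB; the plain-Python version here only shows the arithmetic.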

License

This project is open source and available under the MIT License.

Acknowledgments

  • Star Wars API (SWAPI) for providing the data
  • PyCon DE and PyData 2025 for inspiration

Built using Python for PyCon DE and PyData 2025
