PyDataStack

A modern, open-source data platform built entirely with Python tools, demonstrating a complete end-to-end data pipeline for Star Wars film data.

Technologies

  • Data Ingestion: dlt for extracting data from the Star Wars API
  • Data Warehouse: DuckDB for fast, embedded analytics
  • Data Transformation: dbt for SQL-based data modelling
  • Data Orchestration: Dagster for pipeline management
  • Data Visualization: Streamlit for interactive dashboards
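The ingestion step amounts to walking a paginated REST API and collecting every record. The sketch below illustrates that idea in plain Python; it is not the project's dlt code. It assumes each response page carries a `results` list and a `next` URL (the shape swapi.dev uses), and the names `iter_records` and `fetch_json` are hypothetical helpers introduced only for this example.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.request import urlopen


def iter_records(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every record from a paginated endpoint by following `next` links.

    Assumes each page is a dict with a `results` list and a `next` URL
    that is None on the last page (hypothetical, modelled on swapi.dev).
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")


def fetch_json(url: str) -> dict:
    """Fetch one page over HTTP and decode it as JSON (standard library only)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


# Possible usage (needs network access; the URL is an assumption):
# films = list(iter_records("https://swapi.dev/api/films/", fetch_json))
```

Passing the fetch function in as an argument keeps the pagination logic testable without network access. In the actual project, dlt handles extraction and loading into DuckDB.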

Prerequisites

  • Python 3.8+
  • uv (used for installation)
  • just command runner (optional)

Installation

  1. Clone the repository:
git clone https://github.com/your-username/pystack.git
cd pystack
  2. Create the virtual environment and install dependencies with uv:
uv sync

Quick Start

Run the Streamlit Dashboard

just bi
# Or manually:
cd src && streamlit run visualisation/app.py

Run Dagster Pipeline

just orchestrate
# Or manually:
dagster dev -f src/orchestration/definitions.py

Query DuckDB

just duck
# Or manually:
duckdb src/pystack.duckdb

View dbt Documentation

just dbt-docs

Project Structure

pystack/
├── src/
│   ├── orchestration/      # Dagster pipeline definitions
│   ├── transformation/     # dbt models and configurations
│   └── visualisation/      # Streamlit dashboard
├── justfile                # Command shortcuts
└── README.md

Dashboard Features

  • Financials: View film budgets, box office revenue, and ROI
  • Attributes: Analyze species, characters, planets, and starships per film
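The two dashboard views boil down to simple per-film calculations. The sketch below shows one plausible version of each, assuming ROI is defined as (revenue - budget) / budget and that a film record holds its related entities as lists under the keys `species`, `characters`, `planets`, and `starships` (as SWAPI film records do). The function names are hypothetical, and the project's dbt models may define these metrics differently.

```python
from typing import Dict, Optional


def roi(budget: Optional[float], revenue: float) -> Optional[float]:
    """Return (revenue - budget) / budget, or None if budget is missing or zero."""
    if not budget:
        return None
    return (revenue - budget) / budget


def attribute_counts(film: dict) -> Dict[str, int]:
    """Count the related entities listed on one film record.

    Assumes each attribute is stored as a list; a missing key counts as 0.
    """
    keys = ("species", "characters", "planets", "starships")
    return {key: len(film.get(key, [])) for key in keys}
```

With made-up numbers, `roi(10.0, 25.0)` returns `1.5`, meaning the film earned back its budget plus 150%. Returning `None` for a missing or zero budget avoids a division error and lets the dashboard show a blank instead of a misleading value.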

License

This project is open source and available under the MIT License.

Acknowledgments

  • Star Wars API (SWAPI) for providing the data
  • PyCon DE and PyData 2025 for inspiration

Built using Python for PyCon DE and PyData 2025