A modern, open-source data platform built entirely with Python tools, demonstrating a complete end-to-end data pipeline for Star Wars film data.
- Data Ingestion: dlt for extracting data from the Star Wars API
- Data Warehouse: DuckDB for fast, embedded analytics
- Data Transformation: dbt for SQL-based data modelling
- Data Orchestration: Dagster for pipeline management
- Data Visualization: Streamlit for interactive dashboards
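The first three stages form an extract-load-transform chain. The sketch below illustrates that chain using only the standard library. It is not the project's code. `sqlite3` stands in for DuckDB, a hard-coded and trimmed SWAPI-style record stands in for the dlt extraction, and a plain SQL query stands in for a dbt model. The helper names `extract_films`, `load_films`, and `characters_per_film` are hypothetical. The nested `characters` list is split into a child table, loosely mimicking how dlt normalises nested lists.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def extract_films() -> Iterator[dict]:
    # Hypothetical stand-in for the dlt extraction step: one trimmed,
    # illustrative SWAPI-style film record (the URL list is shortened).
    yield {
        "title": "A New Hope",
        "episode_id": 4,
        "characters": [
            "https://swapi.dev/api/people/1/",
            "https://swapi.dev/api/people/2/",
        ],
    }


def load_films(conn: sqlite3.Connection, films: Iterable[dict]) -> None:
    # sqlite3 stands in for DuckDB. The nested "characters" list goes into
    # a separate child table keyed by episode_id.
    conn.execute("CREATE TABLE films (episode_id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE films__characters (episode_id INTEGER, url TEXT)")
    for film in films:
        conn.execute(
            "INSERT INTO films VALUES (?, ?)", (film["episode_id"], film["title"])
        )
        conn.executemany(
            "INSERT INTO films__characters VALUES (?, ?)",
            [(film["episode_id"], url) for url in film["characters"]],
        )


def characters_per_film(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    # Plain SQL playing the role of a dbt model: characters counted per film.
    return conn.execute(
        """
        SELECT f.title, COUNT(c.url) AS n_characters
        FROM films AS f
        LEFT JOIN films__characters AS c ON c.episode_id = f.episode_id
        GROUP BY f.episode_id, f.title
        ORDER BY f.episode_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_films(conn, extract_films())
print(characters_per_film(conn))  # [('A New Hope', 2)]
```

In the real stack, Dagster would schedule these steps and Streamlit would read the resulting tables.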
- Python 3.8+
- uv (Python package manager, used to install the dependencies)
- just command runner (optional)
- Clone the repository:
git clone https://github.com/your-username/pystack.git
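cd pystack
- Create a virtual environment, install the dependencies, and activate it:
uv sync
source .venv/bin/activate
- Launch the Streamlit dashboard:
just bi
# Or manually:
cd src && streamlit run visualisation/app.py
- Start the Dagster UI to run and monitor the pipeline:
just orchestrate
# Or manually:
dagster dev -f src/orchestration/definitions.py
- Open the warehouse in the DuckDB CLI:
just duck
# Or manually:
duckdb src/pystack.duckdb
- Generate the dbt documentation:
just dbt-docs

Project structure:
pystack/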
├── src/
│ ├── orchestration/ # Dagster pipeline definitions
│ ├── transformation/ # dbt models and configurations
│ └── visualisation/ # Streamlit dashboard
├── justfile # Command shortcuts
└── README.md
- Financials: View film budgets, box office revenue, and ROI
- Attributes: Analyze species, characters, planets, and starships per film
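The README does not spell out how ROI is computed. The sketch below assumes the common definition, ROI = (box office revenue - budget) / budget. The film names and figures are hypothetical and do not come from the dataset. The `roi` helper is likewise illustrative.

```python
from typing import Dict


def roi(budget: float, box_office: float) -> float:
    """Return on investment as a fraction: (revenue - cost) / cost.

    Assumed definition; the dashboard may compute ROI differently.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (box_office - budget) / budget


# Hypothetical figures in millions of USD, for illustration only.
films: Dict[str, Dict[str, float]] = {
    "Film A": {"budget": 100.0, "box_office": 250.0},
    "Film B": {"budget": 200.0, "box_office": 150.0},
}

for title, f in films.items():
    print(f"{title}: ROI = {roi(f['budget'], f['box_office']):.0%}")
# Film A: ROI = 150%
# Film B: ROI = -25%
```

Expressing ROI as a fraction of the budget lets you compare films with very different budgets on the same scale.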
This project is open source and available under the MIT License.
- Star Wars API (SWAPI) for providing the data
- PyCon DE 2025 for inspiration
Built using Python for PyCon DE and PyData 2025