Data Engineer focused on building reliable, scalable data systems
I’m a Data Engineer with 5+ years of experience working with data systems that are often fragmented, inconsistent, and difficult to trust.
My work focuses on bringing structure to those environments by designing pipelines that are reliable, observable, and maintainable. I’ve worked across healthcare, retail, and enterprise systems where data is business-critical and small inconsistencies have real impact.
I’m particularly interested in how data platforms evolve, from legacy ETL toward modern, cloud-based architectures.
- Designing end-to-end data pipelines from ingestion to reporting
- Structuring data using medallion and dimensional modeling approaches
- Improving data quality through validation, monitoring, and clear transformation logic
- Integrating multiple systems into consistent, analysis-ready datasets
- Building toward scalable data platforms using Azure, Fabric, and Databricks
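The validation work above can be sketched in miniature. This is a toy example, not code from any of the projects below, and the field names (`id`, `amount`) are hypothetical:

```python
# Minimal sketch of a row-level validation step; real pipelines
# would run checks like these at each layer and route failures
# to a quarantine table rather than a plain list.

def validate_record(record: dict) -> list[str]:
    """Return a list of data-quality issues found in one record."""
    issues = []
    if not record.get("id"):
        issues.append("missing id")
    amount = record.get("amount")
    if amount is not None and amount < 0:
        issues.append("negative amount")
    return issues

clean, rejected = [], []
for rec in [{"id": "a1", "amount": 10.0}, {"id": "", "amount": -5.0}]:
    problems = validate_record(rec)
    (rejected if problems else clean).append((rec, problems))
```

Keeping the check as a pure function that returns issues (instead of raising) makes it easy to log why a record was rejected.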
Languages
Python · SQL · PySpark
Data Platforms & Orchestration
Azure Data Factory · Databricks · Apache Airflow
Cloud & Storage
Azure · AWS S3
Data Warehousing & Modeling
Redshift · PostgreSQL · Dimensional Modeling
Analytics & Reporting
Power BI
Other
Git · Docker · API Integration
End-to-end data warehouse built on real GTFS transit data using a medallion architecture.
- Structured raw transit feeds into validated Bronze, Silver, and Gold layers
- Addressed domain-specific challenges such as GTFS time values beyond 24:00, used for trips that run past midnight of the service day
- Built dimensional models to support time-based analysis and reporting
- Embedded data quality checks across pipeline layers
🔗 https://github.com/bashoori/transit_data_warehouse
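The over-24-hour times are worth a concrete illustration: GTFS expresses trips that run past midnight as times like `25:30:00` on the same service day. A minimal normalization sketch (not the project's actual code) might look like:

```python
from datetime import timedelta

def parse_gtfs_time(value: str) -> timedelta:
    """Parse a GTFS HH:MM:SS string; hours may exceed 23 for
    trips that continue past midnight of the service day."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def normalize(value: str) -> tuple[int, str]:
    """Split a GTFS time into (day_offset, wall-clock time),
    which is friendlier for dimensional time-of-day analysis."""
    td = parse_gtfs_time(value)
    day_offset, remainder = divmod(int(td.total_seconds()), 24 * 3600)
    hours, rest = divmod(remainder, 3600)
    minutes, seconds = divmod(rest, 60)
    return day_offset, f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, `normalize("25:30:00")` yields `(1, "01:30:00")`: the trip arrives at 1:30 AM on the day after the service day.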
Designed a lakehouse platform for a multi-region retail scenario.
- Implemented medallion architecture to standardize ingestion and transformation
- Built unified data models for customers, products, and sales
- Focused on creating consistent datasets across regions for scalable reporting
🔗 https://github.com/bashoori/Global-Retail-Lakehouse-on-Microsoft-Fabric
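The cross-region unification step can be pictured as conforming each region's schema to one shared model. The column mappings below are invented for illustration and don't reflect the repository's actual schemas:

```python
# Sketch of conforming region-specific sales records to a unified
# schema; both the source and target column names are hypothetical.

REGION_COLUMN_MAPS = {
    "emea": {"cust_id": "customer_id", "sku": "product_id", "amt": "amount"},
    "apac": {"CustomerID": "customer_id", "ProductCode": "product_id",
             "Total": "amount"},
}

def conform(record: dict, region: str) -> dict:
    """Rename a region's columns to the unified schema and tag
    the row with its origin for downstream reporting."""
    mapping = REGION_COLUMN_MAPS[region]
    unified = {target: record[source] for source, target in mapping.items()}
    unified["region"] = region
    return unified
```

Centralizing the mappings in one table-like structure keeps region drift visible in a single place instead of scattered across transformations.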
Medallion-based pipeline using Delta Lake and Unity Catalog.
- Designed for scalable processing and governed data access
- Structured transformations for clarity, reuse, and maintainability
🔗 https://github.com/bashoori/data-engineering-portfolio/tree/main/databricks-end-to-end
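The medallion flow itself reduces to a simple pattern: raw rows land in Bronze, Silver deduplicates and enforces types, and Gold aggregates for reporting. A toy in-memory version (the real pipeline uses Delta Lake tables, and these fields are hypothetical):

```python
# Bronze: raw records exactly as ingested, duplicates and bad data included.
bronze = [
    {"order_id": "1", "qty": "2", "price": "4.50"},
    {"order_id": "1", "qty": "2", "price": "4.50"},  # duplicate
    {"order_id": "2", "qty": "x", "price": "3.00"},  # unparsable qty
]

def to_silver(rows):
    """Deduplicate by key and type-cast; drop rows that fail casting."""
    seen, silver = set(), []
    for row in rows:
        key = row["order_id"]
        if key in seen:
            continue
        try:
            silver.append({"order_id": key, "qty": int(row["qty"]),
                           "price": float(row["price"])})
            seen.add(key)
        except ValueError:
            continue  # quarantine in a real pipeline; skipped here
    return silver

def to_gold(rows):
    """Aggregate Silver into a reporting-ready metric."""
    return {"revenue": sum(r["qty"] * r["price"] for r in rows)}
```

Each layer consumes only the one before it, which is what makes the stages independently testable and re-runnable.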
Containerized ETL pipeline reflecting production patterns.
- Implemented orchestration, retries, and scheduling
- Focused on reliability and operational behavior of pipelines
🔗 https://github.com/bashoori/airflow-spark-aws-etl-pipeline
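In the project itself the retries are delegated to Airflow's task-level retry settings; stripped of the orchestrator, the underlying idea is just retry-with-backoff, roughly:

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=0.01):
    """Run `task`, retrying with exponential backoff on failure.
    Re-raises the last exception once attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# A task that fails twice before succeeding, simulating a
# transient source-system outage.
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "rows"
```

`run_with_retries(flaky_extract)` succeeds on the third attempt; the point of backoff is to give a struggling upstream system room to recover instead of hammering it.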
- Microsoft Certified: Azure Data Fundamentals (DP-900)
- Preparing for: Microsoft Fabric Data Engineer (DP-700)
- Building hands-on projects focused on cloud-based data platforms
Build systems that remain reliable as complexity grows.


