Data Engineer focused on building reliable, scalable data systems
I’m a Data Engineer with 5+ years of experience working with data systems that are often fragmented, inconsistent, and difficult to trust.
My work focuses on bringing structure to those environments by designing pipelines that are reliable, observable, and maintainable. I’ve worked across healthcare, retail, and enterprise systems where data is business-critical and small inconsistencies have real impact.
I’m particularly interested in how data platforms evolve, from legacy ETL toward modern, cloud-based architectures.
- Designing end-to-end data pipelines from ingestion to reporting
- Structuring data using medallion and dimensional modeling approaches
- Improving data quality through validation, monitoring, and clear transformation logic
- Integrating multiple systems into consistent, analysis-ready datasets
- Building toward scalable data platforms using Azure, Fabric, and Databricks
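The validation work above can be sketched in miniature. This is a toy example, not code from any of the projects below, and the field names (`id`, `amount`) are hypothetical:

```python
# Minimal sketch of a row-level validation step; real pipelines
# would run checks like these at each layer and route failures
# to a quarantine table rather than a plain list.

def validate_record(record: dict) -> list[str]:
    """Return a list of data-quality issues found in one record."""
    issues = []
    if not record.get("id"):
        issues.append("missing id")
    amount = record.get("amount")
    if amount is not None and amount < 0:
        issues.append("negative amount")
    return issues

clean, rejected = [], []
for rec in [{"id": "a1", "amount": 10.0}, {"id": "", "amount": -5.0}]:
    problems = validate_record(rec)
    (rejected if problems else clean).append((rec, problems))
```

Keeping the check as a pure function that returns issues (instead of raising) makes it easy to log why a record was rejected.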
Languages
Python · SQL · PySpark
Data Platforms & Orchestration
Azure Data Factory · Databricks · Apache Airflow
Cloud & Storage
Azure · AWS S3
Data Warehousing & Modeling
Redshift · PostgreSQL · Dimensional Modeling
Analytics & Reporting
Power BI
Other
Git · Docker · API Integration
End-to-end data warehouse built on real GTFS transit data using a medallion architecture.
- Structured raw transit feeds into validated Bronze, Silver, and Gold layers
- Addressed domain-specific challenges such as GTFS time values beyond 24:00, used for trips that run past midnight of the service day
- Built dimensional models to support time-based analysis and reporting
- Embedded data quality checks across pipeline layers
🔗 https://github.com/bashoori/transit_data_warehouse
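The over-24-hour times are worth a concrete illustration: GTFS expresses trips that run past midnight as times like `25:30:00` on the same service day. A minimal normalization sketch (not the project's actual code) might look like:

```python
from datetime import timedelta

def parse_gtfs_time(value: str) -> timedelta:
    """Parse a GTFS HH:MM:SS string; hours may exceed 23 for
    trips that continue past midnight of the service day."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def normalize(value: str) -> tuple[int, str]:
    """Split a GTFS time into (day_offset, wall-clock time),
    which is friendlier for dimensional time-of-day analysis."""
    td = parse_gtfs_time(value)
    day_offset, remainder = divmod(int(td.total_seconds()), 24 * 3600)
    hours, rest = divmod(remainder, 3600)
    minutes, seconds = divmod(rest, 60)
    return day_offset, f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, `normalize("25:30:00")` yields `(1, "01:30:00")`: the trip arrives at 1:30 AM on the day after the service day.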
Designed a lakehouse platform for a multi-region retail scenario.
- Implemented medallion architecture to standardize ingestion and transformation
- Built unified data models for customers, products, and sales
- Focused on creating consistent datasets across regions for scalable reporting
🔗 https://github.com/bashoori/Global-Retail-Lakehouse-on-Microsoft-Fabric
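The cross-region unification step can be pictured as conforming each region's schema to one shared model. The column mappings below are invented for illustration and don't reflect the repository's actual schemas:

```python
# Sketch of conforming region-specific sales records to a unified
# schema; both the source and target column names are hypothetical.

REGION_COLUMN_MAPS = {
    "emea": {"cust_id": "customer_id", "sku": "product_id", "amt": "amount"},
    "apac": {"CustomerID": "customer_id", "ProductCode": "product_id",
             "Total": "amount"},
}

def conform(record: dict, region: str) -> dict:
    """Rename a region's columns to the unified schema and tag
    the row with its origin for downstream reporting."""
    mapping = REGION_COLUMN_MAPS[region]
    unified = {target: record[source] for source, target in mapping.items()}
    unified["region"] = region
    return unified
```

Centralizing the mappings in one table-like structure keeps region drift visible in a single place instead of scattered across transformations.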
Medallion-based pipeline using Delta Lake and Unity Catalog.
- Designed for scalable processing and governed data access
- Structured transformations for clarity, reuse, and maintainability
🔗 https://github.com/bashoori/data-engineering-portfolio/tree/main/databricks-end-to-end
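The medallion flow itself reduces to a simple pattern: raw rows land in Bronze, Silver deduplicates and enforces types, and Gold aggregates for reporting. A toy in-memory version (the real pipeline uses Delta Lake tables, and these fields are hypothetical):

```python
# Bronze: raw records exactly as ingested, duplicates and bad data included.
bronze = [
    {"order_id": "1", "qty": "2", "price": "4.50"},
    {"order_id": "1", "qty": "2", "price": "4.50"},  # duplicate
    {"order_id": "2", "qty": "x", "price": "3.00"},  # unparsable qty
]

def to_silver(rows):
    """Deduplicate by key and type-cast; drop rows that fail casting."""
    seen, silver = set(), []
    for row in rows:
        key = row["order_id"]
        if key in seen:
            continue
        try:
            silver.append({"order_id": key, "qty": int(row["qty"]),
                           "price": float(row["price"])})
            seen.add(key)
        except ValueError:
            continue  # quarantine in a real pipeline; skipped here
    return silver

def to_gold(rows):
    """Aggregate Silver into a reporting-ready metric."""
    return {"revenue": sum(r["qty"] * r["price"] for r in rows)}
```

Each layer consumes only the one before it, which is what makes the stages independently testable and re-runnable.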
Containerized ETL pipeline reflecting production patterns.
- Implemented orchestration, retries, and scheduling
- Focused on reliability and operational behavior of pipelines
🔗 https://github.com/bashoori/airflow-spark-aws-etl-pipeline
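In the project itself the retries are delegated to Airflow's task-level retry settings; stripped of the orchestrator, the underlying idea is just retry-with-backoff, roughly:

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=0.01):
    """Run `task`, retrying with exponential backoff on failure.
    Re-raises the last exception once attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# A task that fails twice before succeeding, simulating a
# transient source-system outage.
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "rows"
```

`run_with_retries(flaky_extract)` succeeds on the third attempt; the point of backoff is to give a struggling upstream system room to recover instead of hammering it.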
- Microsoft Certified: Azure Data Fundamentals (DP-900)
- Preparing for: Microsoft Fabric Data Engineer (DP-700)
- Building hands-on projects focused on cloud-based data platforms
Build systems that remain reliable as complexity grows.


