Skip to content

Data transformation project leveraging Snowflake, dbt, Airflow, and Astronomer to implement Kimball’s star schema for dimensional modeling of British Airways review data. Automates pipelines for transformation and orchestration, with CI/CD powered by GitHub Actions.

License

Notifications You must be signed in to change notification settings

MarkPhamm/british_airways_transformation

Repository files navigation

British Airways Data Transformation

This project implements a modern data transformation pipeline for British Airways, designed to process and model customer review data from Airline Quality. It leverages dbt, Snowflake, and Apache Airflow, orchestrated via Astronomer, to create a scalable, production-ready workflow.

image


🗂 Project Structure

.
├── data/                 # Raw and processed data files
├── data_model/           # Data model diagrams and definitions
│   ├── schema.jpeg       # Visual data model
│   └── schema.txt        # Text-based schema description
├── dbt-dags/             # dbt models and Airflow DAGs
│   ├── dags/             # DAG definitions
│   ├── tests/            # Data quality tests
│   └── .astro/           # Astronomer CLI configuration
├── notebooks/            # Jupyter notebooks for Snowflake analysis
│   ├── snowflake_connection.ipynb
│   └── snowflake_connection.py
└── requirements.txt      # Python package requirements

⚙️ Technology Stack

BritishAirways (1)


🧱 Data Architecture

1. Data Source

Customer reviews are scraped from AirlineQuality.com, capturing structured and unstructured data elements including:

  • Flight Route (e.g., Singapore to Sydney)
  • Aircraft Type (e.g., Boeing 777)
  • Seat Type (e.g., Business Class)
  • Type of Traveller (e.g., Solo Leisure)
  • Date Flown (e.g., March 2025)
  • Star Ratings (Seat Comfort, Cabin Staff, Food & Beverages, Entertainment, Ground Service, Value for Money)
  • Review Text and Submission Date
  • Verification Flag (Trip Verified)
  • Reviewer Info (Name, Country, Number of Reviews)

2. Data Model

Dimension Tables

  • dim_customer: Identity, loyalty, and flight history
  • dim_aircraft: Manufacturer, model, and seating layout
  • dim_location: Airports, cities, and time zones
  • dim_date: Calendar and fiscal date tracking

Fact Table

  • fct_review: One row per customer review per flight
    • Includes metrics (ratings), booleans (verified, recommended), and categorical fields (seat type, travel type)

3. Data Flow

  • Source Layer: Web scraping + staging
  • Transformation Layer: dbt modeling + business logic
  • Orchestration Layer: DAG scheduling and task dependency management via Astronomer
  • Presentation Layer: Clean fact/dim tables for BI/reporting

4. Data Quality Framework

  • Null checks
  • Foreign key validation
  • Freshness and completeness monitoring
  • dbt tests for schema integrity and logic rules

🧩 Project Components

📊 Data Model

Located in data_model/:

  • schema.jpeg: visual schema overview
  • schema.txt: detailed textual schema

🛠 dbt + Airflow with Astronomer

Located in dbt-dags/:

  • dbt model definitions and tests
  • Airflow DAGs orchestrated via Astro CLI
  • Modular structure for local development and deployment to Astronomer Cloud or Docker environments

📈 Analysis & Validation

Notebook resources in notebooks/ for:

  • Establishing a connection to Snowflake
  • Running exploratory queries
  • Testing pipeline output

📦 Key Dependencies

  • dbt-snowflake==1.9.2
  • pandas==2.2.3
  • snowflake-sqlalchemy==1.7.3
  • Managed via requirements.txt

🔄 CI/CD Pipeline

This project uses GitHub Actions for automated data transformation pipeline:

Workflow Triggers

  • Push: Triggers on pushes to main branch
  • Pull Request: Runs on PRs to main branch
  • Schedule: Executes daily at 00:00 UTC
  • Manual: Can be triggered via workflow_dispatch

Pipeline Steps

  1. Environment Setup

    • Python 3.12 setup
    • Dependencies installation
    • dbt package installation
  2. Data Transformation

    • Runs dbt build process
    • Uses Snowflake credentials from secrets
  3. Notifications

    • Sends email notifications on completion
    • Includes run time, trigger info, and status

Workflow Status

BA Transformation


🌐 Data Modeling Approach

Let me know if you'd like a diagram for the Airflow DAG flow or a README.md version with clickable section links and badges.

Data Model Overview

Step 1: Business Processes

Business processes represent real-world events that generate measurable data. For British Airways, the core business process is the collection of customer flight reviews. Each review submitted by a customer reflects their experience on a specific flight and becomes a fact event. These reviews include detailed ratings on various service categories, forming the backbone of our analytical layer.

Step 2: Define the Grain

The grain defines the level of detail stored in the fact table. For this model, the grain is defined as:

One customer review per flight.

Each row in the fct_review table represents a unique review event containing metrics tied to a specific customer’s flight experience. This atomic grain ensures consistency and supports granular performance analysis across multiple service touchpoints.

Step 3: Dimensions for Descriptive Context

Dimension tables provide the who, what, where, when context for interpreting facts.

  • Who: dim_customer — describes the reviewer through customer_name, nationality, and number_of_flights.
  • What: dim_aircraft — provides context on the aircraft via aircraft_model, aircraft_manufacturer, and seat_capacity.
  • Where: dim_location — captures the origin, destination, and transit points, using a combination of city and airport_name.
  • When: dim_date — captures both date_flown and date_submitted and supports calendar and financial date logic (cal_year, fin_quarter, etc.).

Step 4: Facts for Measurement

Facts are the quantitative outputs from the review process, collected per flight review:

  • Ratings:
    seat_comfort, cabin_staff_service, food_and_beverages, inflight_entertainment, ground_service, wifi_and_connectivity, value_for_money

  • Booleans:
    verified, recommended

  • Categorical Descriptions:
    seat_type, type_of_traveller, review_text

These facts represent real customer input and form the foundation for performance dashboards, KPIs, and customer sentiment insights.


⭐ Star Schema Overview

This model follows a classic star schema structure where the fct_review table sits at the center and joins to dimension tables via foreign keys:

Foreign Key in fct_review Dimension Table Description
customer_id dim_customer Links each review to a specific customer
date_submitted_id / date_flown_id dim_date Supports dual-date tracking (when submitted vs when flown)
origin_location_id, destination_location_id, transit_location_id dim_location Connects review to flight locations
aircraft_id dim_aircraft Captures aircraft-related context

schema

This schema supports efficient slicing, filtering, and aggregating reviews by date, location, customer, and aircraft, enabling detailed insights across British Airways’ customer experience.

About

Data transformation project leveraging Snowflake, dbt, Airflow, and Astronomer to implement Kimball’s star schema for dimensional modeling of British Airways review data. Automates pipelines for transformation and orchestration, with CI/CD powered by GitHub Actions.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published