This project implements a modern data transformation pipeline for British Airways, designed to process and model customer review data from Airline Quality. It leverages dbt, Snowflake, and Apache Airflow, orchestrated via Astronomer, to create a scalable, production-ready workflow.
.
├── data/ # Raw and processed data files
├── data_model/ # Data model diagrams and definitions
│ ├── schema.jpeg # Visual data model
│ └── schema.txt # Text-based schema description
├── dbt-dags/ # dbt models and Airflow DAGs
│ ├── dags/ # DAG definitions
│ ├── tests/ # Data quality tests
│ └── .astro/ # Astronomer CLI configuration
├── notebooks/ # Jupyter notebooks for Snowflake analysis
│ ├── snowflake_connection.ipynb
│ └── snowflake_connection.py
└── requirements.txt # Python package requirements
- Data Source: British Airways Reviews on AirlineQuality
- Programming Language: Python 3.12.5
- Data Warehouse: Snowflake
- Transformation Tool: dbt
- Orchestration: Apache Airflow powered by Astronomer
Customer reviews are scraped from AirlineQuality.com, capturing structured and unstructured data elements including:
- Flight Route (e.g., Singapore to Sydney)
- Aircraft Type (e.g., Boeing 777)
- Seat Type (e.g., Business Class)
- Type of Traveller (e.g., Solo Leisure)
- Date Flown (e.g., March 2025)
- Star Ratings (Seat Comfort, Cabin Staff, Food & Beverages, Entertainment, Ground Service, Value for Money)
- Review Text and Submission Date
- Verification Flag (Trip Verified)
- Reviewer Info (Name, Country, Number of Reviews)
dim_customer
: Identity, loyalty, and flight historydim_aircraft
: Manufacturer, model, and seating layoutdim_location
: Airports, cities, and time zonesdim_date
: Calendar and fiscal date tracking
fct_review
: One row per customer review per flight- Includes metrics (ratings), booleans (verified, recommended), and categorical fields (seat type, travel type)
- Source Layer: Web scraping + staging
- Transformation Layer: dbt modeling + business logic
- Orchestration Layer: DAG scheduling and task dependency management via Astronomer
- Presentation Layer: Clean fact/dim tables for BI/reporting
- Null checks
- Foreign key validation
- Freshness and completeness monitoring
- dbt tests for schema integrity and logic rules
Located in data_model/
:
schema.jpeg
: visual schema overviewschema.txt
: detailed textual schema
Located in dbt-dags/
:
- dbt model definitions and tests
- Airflow DAGs orchestrated via Astro CLI
- Modular structure for local development and deployment to Astronomer Cloud or Docker environments
Notebook resources in notebooks/
for:
- Establishing a connection to Snowflake
- Running exploratory queries
- Testing pipeline output
dbt-snowflake==1.9.2
pandas==2.2.3
snowflake-sqlalchemy==1.7.3
- Managed via
requirements.txt
This project uses GitHub Actions for automated data transformation pipeline:
- Push: Triggers on pushes to
main
branch - Pull Request: Runs on PRs to
main
branch - Schedule: Executes daily at 00:00 UTC
- Manual: Can be triggered via workflow_dispatch
-
Environment Setup
- Python 3.12 setup
- Dependencies installation
- dbt package installation
-
Data Transformation
- Runs dbt build process
- Uses Snowflake credentials from secrets
-
Notifications
- Sends email notifications on completion
- Includes run time, trigger info, and status
Let me know if you'd like a diagram for the Airflow DAG flow or a README.md
version with clickable section links and badges.
Business processes represent real-world events that generate measurable data. For British Airways, the core business process is the collection of customer flight reviews. Each review submitted by a customer reflects their experience on a specific flight and becomes a fact event. These reviews include detailed ratings on various service categories, forming the backbone of our analytical layer.
The grain defines the level of detail stored in the fact table. For this model, the grain is defined as:
One customer review per flight.
Each row in the fct_review
table represents a unique review event containing metrics tied to a specific customer’s flight experience. This atomic grain ensures consistency and supports granular performance analysis across multiple service touchpoints.
Dimension tables provide the who, what, where, when context for interpreting facts.
- Who:
dim_customer
— describes the reviewer throughcustomer_name
,nationality
, andnumber_of_flights
. - What:
dim_aircraft
— provides context on the aircraft viaaircraft_model
,aircraft_manufacturer
, andseat_capacity
. - Where:
dim_location
— captures the origin, destination, and transit points, using a combination ofcity
andairport_name
. - When:
dim_date
— captures bothdate_flown
anddate_submitted
and supports calendar and financial date logic (cal_year
,fin_quarter
, etc.).
Facts are the quantitative outputs from the review process, collected per flight review:
-
Ratings:
seat_comfort
,cabin_staff_service
,food_and_beverages
,inflight_entertainment
,ground_service
,wifi_and_connectivity
,value_for_money
-
Booleans:
verified
,recommended
-
Categorical Descriptions:
seat_type
,type_of_traveller
,review_text
These facts represent real customer input and form the foundation for performance dashboards, KPIs, and customer sentiment insights.
This model follows a classic star schema structure where the fct_review
table sits at the center and joins to dimension tables via foreign keys:
Foreign Key in fct_review |
Dimension Table | Description |
---|---|---|
customer_id |
dim_customer |
Links each review to a specific customer |
date_submitted_id / date_flown_id |
dim_date |
Supports dual-date tracking (when submitted vs when flown) |
origin_location_id , destination_location_id , transit_location_id |
dim_location |
Connects review to flight locations |
aircraft_id |
dim_aircraft |
Captures aircraft-related context |
This schema supports efficient slicing, filtering, and aggregating reviews by date, location, customer, and aircraft, enabling detailed insights across British Airways’ customer experience.