This project analyzes e-commerce delivery performance using BigQuery and advanced SQL.
The main objective is to evaluate operational efficiency, delivery SLA performance, and regional differences across markets, using real-world transactional data.
The project follows a production-style data architecture with raw ingestion and analytical views, similar to what is used in large-scale e-commerce operations teams.
The analysis is based on the Brazilian E-Commerce (OLIST) Dataset, a publicly available dataset containing over 100,000 real orders.
Source: Kaggle β Brazilian E-Commerce Public Dataset
The dataset includes:
- Order lifecycle timestamps
- Customer location (city/state)
- Order items, prices, and freight (delivery) costs
A two-layer BigQuery architecture was designed:
Stores source data in its original form:
orders_rawβ order status and delivery timestampscustomers_rawβ customer city and stateorder_items_rawβ item prices and freight (logistics) costs
Contains SQL views with business logic applied:
cleaned_ordersβ SLA flags and delivery delay calculationscleaned_orders_geoβ orders enriched with regional informationorder_value_per_orderβ aggregated order-level value and item counts
Using views instead of physical tables ensures data consistency and avoids duplication.
- On-time delivery rate (SLA)
- Average delivery delay (days)
- 90th percentile (p90) delivery delay
- Lead time (purchase β delivery)
- Regional (state/city) SLA performance
- High-value order delivery performance
All analyses were implemented using BigQuery Standard SQL, including:
- Date and timestamp transformations
- Conditional logic for SLA classification
- Aggregations and joins across multiple tables
- Percentile-based metrics (
APPROX_QUANTILES) - Region-level benchmarking
- Weekly SLA is generally high, but p90 lead time can reach ~30β37 days, indicating tail-risk in certain weeks.
- Average delay is negative in many cities, suggesting conservative estimated delivery dates, while on-time rate varies by city.
- Delivery performance differs significantly by state/city, highlighting region-specific constraints.
- High-value orders do not consistently show worse average delays; regional logistics factors appear more influential.
For a detailed interpretation, see insights/key_findings.md.
sql/
βββ 01_create_cleaned_orders.sql
βββ 02_weekly_sla_kpi.sql
βββ 03_create_cleaned_orders_geo.sql
βββ 04_city_state_sla_performance.sql
βββ 05_create_order_value_per_order.sql
βββ 06_high_value_orders_delay.sql