Data consultant with expertise in designing and implementing scalable data pipelines, cloud infrastructure, and analytics solutions. Experienced in building production-grade ETL/ELT systems using modern data stack technologies across AWS and GCP. Published researcher in quantitative finance with a strong foundation in mathematics, statistics, and computer science.
Current Focus: Building enterprise data platforms, real-time streaming architectures, and ML-driven analytics solutions for cross-functional business teams.
Data Engineering
- ETL/ELT Pipelines
- Data Modeling
- Stream Processing
- Data Quality
- Orchestration
Cloud Architecture
- AWS Services
- Google Cloud Platform
- Infrastructure as Code
- CI/CD Pipelines
- Containerization
Data Science & ML
- Statistical Modeling
- Predictive Analytics
- Feature Engineering
- Model Deployment
Python | SQL | R | Java | JavaScript | C# | C++ | Bash | MATLAB
AWS | GCP | Docker | Terraform | GitHub | Git | Linux
Kafka | Spark | Databricks | dbt | Airbyte | Airflow | Delta Lake
PostgreSQL | MySQL | MongoDB | BigQuery | Snowflake | SQLAlchemy
Pandas | NumPy | Scikit-learn | TensorFlow | Matplotlib | Seaborn | Tableau | Power BI
HTML5 | CSS3 | React | Angular | .NET Core | FastAPI
Modern Data Stack Implementation | GCP + Airbyte + BigQuery + dbt + Airflow
Built an end-to-end pipeline on Google Cloud Platform for digital media analytics, implementing a complete modern data stack.
Architecture Highlights:
- Designed multi-source data ingestion using Airbyte with automated schema validation
- Developed layered data warehouse in BigQuery following medallion architecture
- Implemented SQL transformations using dbt Core with comprehensive testing and lineage tracking
- Orchestrated daily workflows with Apache Airflow for reliable, scheduled execution
- Achieved 99.9% pipeline reliability with automated monitoring and alerting
Tech Stack: GCP Airbyte BigQuery dbt Core Apache Airflow SQL Python
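The layered (medallion) approach mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual dbt or BigQuery code: the field names (`article_id`, `date`, `views`) and the functions `to_silver` and `to_gold` are hypothetical, chosen only to show raw (bronze) rows being cleaned into a silver layer and then aggregated into a gold reporting table.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean raw (bronze) rows: drop incomplete records and normalise types."""
    silver = []
    for row in bronze_rows:
        # Hypothetical rule: a row needs both an article id and a view count.
        if not row.get("article_id") or row.get("views") is None:
            continue
        silver.append({
            "article_id": str(row["article_id"]).strip(),
            "date": row["date"],
            "views": int(row["views"]),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows into a reporting table: total views per article."""
    totals = defaultdict(int)
    for row in silver_rows:
        totals[row["article_id"]] += row["views"]
    return dict(totals)


# Illustrative raw input, including two rows the silver step should reject.
bronze = [
    {"article_id": " a1 ", "date": "2025-01-01", "views": "10"},
    {"article_id": "a1", "date": "2025-01-02", "views": 5},
    {"article_id": None, "date": "2025-01-02", "views": 7},
    {"article_id": "a2", "date": "2025-01-02", "views": None},
]
gold = to_gold(to_silver(bronze))
```

In a warehouse setting each layer would typically be a set of tables or dbt models rather than Python lists; the sketch only shows how responsibilities are split between layers.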
Batch & Streaming Architecture | AWS + Kafka + Databricks + PySpark
Designed and implemented a dual-mode data pipeline supporting both batch and real-time streaming for a Pinterest-style data platform.
Technical Implementation:
- Built Apache Kafka producers for real-time API data ingestion
- Configured AWS Kinesis for stream processing with sub-second latency
- Developed PySpark transformations in Databricks for large-scale data processing
- Implemented Delta Lake tables with ACID transactions and schema enforcement
- Orchestrated complex workflows using Airflow on AWS MWAA
- Designed star schema data models optimized for analytical queries
Tech Stack: AWS (S3, RDS, Kinesis) Apache Kafka Databricks PySpark Delta Lake Airflow Python
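As a rough illustration of the star schema idea listed above, the sketch below splits flat event records into one dimension table and one fact table joined by a surrogate key. The record fields (`username`, `post_id`, `likes`) and the function `build_star_schema` are assumptions made for this example, not the project's real model.

```python
def build_star_schema(events):
    """Split flat event records into a user dimension and a post fact table."""
    dim_user = {}    # natural key (username) -> surrogate integer key
    fact_posts = []  # one row per event, referencing the dimension by key
    for event in events:
        # Assign the next surrogate key the first time a username is seen.
        user_key = dim_user.setdefault(event["username"], len(dim_user) + 1)
        fact_posts.append({
            "user_key": user_key,
            "post_id": event["post_id"],
            "likes": event["likes"],
        })
    return dim_user, fact_posts


# Illustrative input: two events from the same user share one dimension row.
events = [
    {"username": "ana", "post_id": 101, "likes": 3},
    {"username": "ben", "post_id": 102, "likes": 7},
    {"username": "ana", "post_id": 103, "likes": 1},
]
dim_user, fact_posts = build_star_schema(events)
```

Keeping descriptive attributes in the dimension and numeric measures in the fact table is what lets analytical queries aggregate facts and filter by dimension with a single join.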
Cloud-Native Data Integration | Python + AWS RDS + PostgreSQL
Engineered a production-ready ETL pipeline for extracting, transforming, and loading sales data from heterogeneous sources into a cloud data warehouse.
Key Features:
- Developed Python ETL framework handling APIs, PDFs, JSON, and S3 sources
- Implemented robust error handling and data validation mechanisms
- Designed star schema in PostgreSQL on AWS RDS optimized for OLAP workloads
- Utilized SQLAlchemy ORM for database abstraction and connection pooling
- Automated data quality checks and anomaly detection
Tech Stack: Python AWS RDS PostgreSQL SQLAlchemy boto3 pandas tabula-py
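The validation and anomaly-detection steps listed above might look something like the following sketch. It uses only the standard library, and everything specific in it is an assumption for illustration: the required fields (`order_id`, `amount`), the negative-amount rule, and the z-score threshold are not taken from the project.

```python
from statistics import mean, stdev

REQUIRED_FIELDS = ("order_id", "amount")  # hypothetical schema


def validate(rows):
    """Separate rows into valid ones and (row, reason) rejections."""
    valid, rejected = [], []
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if row.get(f) is None]
        if missing:
            rejected.append((row, f"missing: {', '.join(missing)}"))
        elif row["amount"] < 0:
            rejected.append((row, "negative amount"))
        else:
            valid.append(row)
    return valid, rejected


def flag_anomalies(rows, threshold=2.5):
    """Return rows whose amount lies more than `threshold` sample standard
    deviations from the mean (a simple z-score check)."""
    amounts = [row["amount"] for row in rows]
    if len(amounts) < 2:
        return []  # not enough data to estimate spread
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [row for row in rows if abs(row["amount"] - mu) / sigma > threshold]
```

Rejected rows are returned with a reason rather than silently dropped, so they can be logged or routed to a quarantine table. A z-score check is only one possible anomaly rule and is sensitive to small samples.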
Quantitative Finance Research | Published in Peer-Reviewed Journal
Conducted comprehensive comparative analysis of stochastic models for stock price prediction, investigating the impact of historical data duration and volatility regimes on forecasting accuracy.
Research Contributions:
- Implemented and compared three stochastic models: Geometric Brownian Motion, Heston, and Merton Jump Diffusion
- Analyzed model performance across different market volatility conditions
- Examined optimal historical data windows for accurate predictions
- Published findings in Quantitative Finance and Economics, Volume 9, Issue 3, 2025
Tech Stack: Python NumPy SciPy Matplotlib Quantitative Finance
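For readers unfamiliar with the simplest of the three models, here is a minimal sketch of a Geometric Brownian Motion path simulation. It is not the paper's code: the function name `simulate_gbm`, the parameter values, and the 252-trading-day convention are assumptions, and it uses the standard library's `random` module instead of NumPy to stay dependency-free. The Heston and Merton Jump Diffusion models are not shown.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, days, dt=1 / 252, seed=None):
    """Simulate one GBM price path using the exact log-normal update:
    S_{t+dt} = S_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z),
    where Z is a standard normal draw."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(days):
        z = rng.gauss(0.0, 1.0)
        drift = (mu - 0.5 * sigma ** 2) * dt
        shock = sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(drift + shock))
    return path


# Illustrative parameters only: 5% annual drift, 20% annual volatility.
path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, days=252, seed=42)
```

With `sigma=0` the path reduces to deterministic exponential growth at rate `mu`, which is a quick sanity check on the drift term.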
Healthcare Analytics & Machine Learning | Python + ML
Developed machine learning models to predict ICU admission during the COVID-19 pandemic, enabling proactive resource allocation and capacity planning.
Analytics Approach:
- Conducted exploratory data analysis on large-scale ICU admission records
- Engineered clinical and demographic features for predictive modeling
- Implemented ensemble methods using XGBoost and TensorFlow
- Achieved high prediction accuracy for resource demand forecasting
- Validated models using cross-validation and holdout testing
Tech Stack: Python pandas scikit-learn XGBoost TensorFlow matplotlib seaborn
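The cross-validation step listed above can be sketched without any libraries. This is a simplified, unshuffled k-fold splitter written for illustration; the function name `k_fold_indices` is hypothetical, and in practice a library implementation such as scikit-learn's `KFold` would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Each sample appears in exactly one test fold across the k splits.
folds = list(k_fold_indices(n_samples=10, k=3))
```

A separate holdout set would be split off before running these folds, so that the final evaluation uses data never seen during model selection.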
Data Consultant | AiCore | Jun 2025 – Present
- Architect and implement production data pipelines using Python, SQL, Spark, and cloud platforms (AWS/Azure)
- Design and deploy data lakes and warehouses with CI/CD automation
- Build analytics dashboards and reports using Power BI and Tableau for business intelligence
- Apply DevSecOps, MLOps, and data governance best practices across enterprise projects
Mathematics & Computer Science Tutor | Oxford International Education Group | Feb 2025 – Present
- Deliver lectures and tutorials across six foundation-level modules for international students
- Design inclusive curriculum and digital teaching materials for diverse learning backgrounds
- Assess student performance through coursework, coding assignments, and technical presentations
Data Science Placement | NHS England (NHSE) | Jun 2025
- Completed a cancer-focused data science project as part of a work experience placement at NHSE, using synthetic NHS Simulacrum data
- Applied core healthcare data science methods, including exploratory data analysis, feature engineering, and machine learning model development
Trainee Software, Cloud & Data Engineer | AiCore | Dec 2024 – May 2025
- Completed intensive programme in software engineering, data engineering, and cloud architecture
- Built production-scale batch and streaming pipelines using Kafka, Databricks, Airflow, and AWS
- Delivered capstone projects demonstrating full-stack data pipeline design and deployment
Mulualem Kahssay & Shihan Miah (2025)
A Comparative Analysis of Stochastic Models for Stock Price Forecasting: The Influence of Historical Data Duration and Volatility Regimes
Quantitative Finance and Economics, 9(3), 602–630
DOI: 10.3934/QFE.2025021
- AiCore Certificate in Cloud & Data Engineering
- AiCore Certificate in Software Engineering
- BTEC Certificate in Work Skills
- CITB Certificate in Health and Safety
- BCS Certificate in Digital Skills