Mulualem03/README.md

Mulualem Kahssay

Data Consultant | Cloud & Data Engineer | Published Researcher


Professional Summary

Data consultant with expertise in designing and implementing scalable data pipelines, cloud infrastructure, and analytics solutions. Experienced in building production-grade ETL/ELT systems using modern data stack technologies across AWS and GCP. Published researcher in quantitative finance with a strong foundation in mathematics, statistics, and computer science.

Current Focus: Building enterprise data platforms, real-time streaming architectures, and ML-driven analytics solutions for cross-functional business teams.


Technical Expertise

Core Competencies

Data Engineering          Cloud Architecture           Data Science & ML
├─ ETL/ELT Pipelines      ├─ AWS Services              ├─ Statistical Modeling
├─ Data Modeling          ├─ Google Cloud Platform     ├─ Predictive Analytics
├─ Stream Processing      ├─ Infrastructure as Code    ├─ Feature Engineering
├─ Data Quality           ├─ CI/CD Pipelines           └─ Model Deployment
└─ Orchestration          └─ Containerization

Technology Stack

Programming Languages

Python
SQL
R
Java
JavaScript
C#
C++
Bash
MATLAB

Cloud & Infrastructure

AWS
GCP
Docker
Terraform
GitHub
Git
Linux

Data Engineering & Big Data

Kafka
Spark
Databricks
dbt
Airbyte
Airflow
Delta Lake

Databases & Data Warehouses

PostgreSQL
MySQL
MongoDB
BigQuery
Snowflake
SQLAlchemy

Data Science & Analytics

Pandas
NumPy
Scikit-learn
TensorFlow
Matplotlib
Seaborn
Tableau
Power BI

Web Development

HTML5
CSS3
React
Angular
.NET Core
FastAPI

Featured Projects

DataDigest Analytics Pipeline

Modern Data Stack Implementation | GCP + Airbyte + BigQuery + dbt + Airflow

Built an end-to-end analytics pipeline on Google Cloud Platform for digital media analytics, implementing a complete modern data stack architecture.

Architecture Highlights:

  • Designed multi-source data ingestion using Airbyte with automated schema validation
  • Developed layered data warehouse in BigQuery following medallion architecture
  • Implemented SQL transformations using dbt Core with comprehensive testing and lineage tracking
  • Orchestrated daily workflows with Apache Airflow for reliable, scheduled execution
  • Achieved 99.9% pipeline reliability with automated monitoring and alerting
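
The daily workflow described above can be pictured as a small dependency graph. The following is a minimal sketch, not the project's actual Airflow DAG: the task names are hypothetical, and Python's standard-library `graphlib` stands in for the scheduler so the example runs without Airflow installed.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key runs only after the tasks in its set.
# The bronze/silver/gold names follow the medallion layering idea.
PIPELINE = {
    "ingest_sources": set(),
    "load_bronze": {"ingest_sources"},
    "build_silver": {"load_bronze"},
    "build_gold": {"build_silver"},
    "run_tests": {"build_gold"},
}


def execution_order(graph):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for task in execution_order(PIPELINE):
        print(f"running {task}")
```

In a real deployment each task would presumably map to an Airflow operator, with the same upstream/downstream relationships.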

Tech Stack: GCP, Airbyte, BigQuery, dbt Core, Apache Airflow, SQL, Python

View Project →


Pinterest Data Pipeline

Batch & Streaming Architecture | AWS + Kafka + Databricks + PySpark

Designed and implemented a dual-mode data pipeline supporting both batch and real-time streaming for a Pinterest-style data platform.

Technical Implementation:

  • Built Apache Kafka producers for real-time API data ingestion
  • Configured AWS Kinesis for stream processing with sub-second latency
  • Developed PySpark transformations in Databricks for large-scale data processing
  • Implemented Delta Lake tables with ACID transactions and schema enforcement
  • Orchestrated complex workflows using Airflow on AWS MWAA
  • Designed star schema data models optimized for analytical queries

Tech Stack: AWS (S3, RDS, Kinesis), Apache Kafka, Databricks, PySpark, Delta Lake, Airflow, Python

View Project →


Sales Data ETL Pipeline

Cloud-Native Data Integration | Python + AWS RDS + PostgreSQL

Engineered a production-ready ETL pipeline for extracting, transforming, and loading sales data from heterogeneous sources into a cloud data warehouse.

Key Features:

  • Developed Python ETL framework handling APIs, PDFs, JSON, and S3 sources
  • Implemented robust error handling and data validation mechanisms
  • Designed star schema in PostgreSQL on AWS RDS optimized for OLAP workloads
  • Utilized SQLAlchemy ORM for database abstraction and connection pooling
  • Automated data quality checks and anomaly detection
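
The star schema and validation steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the table and column names are hypothetical, and the standard-library `sqlite3` module is swapped in for PostgreSQL on AWS RDS and SQLAlchemy so the example is self-contained.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
SCHEMA = """
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, iso_date TEXT NOT NULL);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id INTEGER NOT NULL REFERENCES dim_date(date_id),
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    amount REAL NOT NULL CHECK (amount >= 0)
);
"""


def is_valid(row):
    """Basic validation: positive integer quantity and non-negative amount."""
    return isinstance(row["quantity"], int) and row["quantity"] > 0 and row["amount"] >= 0


def load_sales(conn, rows):
    """Insert valid rows into fact_sales and return the rejected rows."""
    valid = [r for r in rows if is_valid(r)]
    rejected = [r for r in rows if not is_valid(r)]
    conn.executemany(
        "INSERT INTO fact_sales (product_id, date_id, quantity, amount) "
        "VALUES (:product_id, :date_id, :quantity, :amount)",
        valid,
    )
    conn.commit()
    return rejected


def build_demo():
    """Create an in-memory database with the schema and one row per dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
    conn.execute("INSERT INTO dim_date VALUES (20250101, '2025-01-01')")
    return conn
```

Rejected rows are returned rather than silently dropped, so a caller could log them or route them to a quarantine table.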

Tech Stack: Python, AWS RDS, PostgreSQL, SQLAlchemy, boto3, pandas, tabula-py

View Project →


Stock Price Forecasting Models

Quantitative Finance Research | Published in Peer-Reviewed Journal

Conducted comprehensive comparative analysis of stochastic models for stock price prediction, investigating the impact of historical data duration and volatility regimes on forecasting accuracy.

Research Contributions:

  • Implemented and compared three stochastic models: Geometric Brownian Motion, Heston, and Merton Jump Diffusion
  • Analyzed model performance across different market volatility conditions
  • Examined optimal historical data windows for accurate predictions
  • Published findings in Quantitative Finance and Economics, Volume 9, Issue 3, 2025
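
As an illustration of the simplest of the three models, the sketch below simulates a single Geometric Brownian Motion path. It is not the code from the publication: the function name and parameter values are hypothetical, and it uses only the standard library rather than the NumPy/SciPy stack listed for the project. The Heston and Merton Jump Diffusion models are not shown.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, horizon_years, n_steps, seed=None):
    """Simulate one GBM price path using the exact log-normal update.

    S_{t+dt} = S_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z),
    where Z is a standard normal draw.
    """
    rng = random.Random(seed)
    dt = horizon_years / n_steps
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    path = [s0]
    for _ in range(n_steps):
        path.append(path[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
    return path


if __name__ == "__main__":
    # Hypothetical inputs: 5% annual drift, 20% annual volatility, 252 trading days.
    prices = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, horizon_years=1.0, n_steps=252, seed=42)
    print(f"final simulated price: {prices[-1]:.2f}")
```

In practice `mu` and `sigma` would be estimated from a historical window, which is where the choice of data duration and volatility regime enters.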

Tech Stack: Python, NumPy, SciPy, Matplotlib, Quantitative Finance

Read Publication →


COVID-19 ICU Admission Prediction

Healthcare Analytics & Machine Learning | Python + ML

Developed machine learning models to predict ICU admission during the COVID-19 pandemic, enabling proactive resource allocation and capacity planning.

Analytics Approach:

  • Conducted exploratory data analysis on large-scale ICU admission records
  • Engineered clinical and demographic features for predictive modeling
  • Implemented ensemble methods using XGBoost and TensorFlow
  • Achieved high prediction accuracy for ICU admission risk (the project repository reports AUC 0.91 and 94% accuracy for its Random Forest model)
  • Validated models using cross-validation and holdout testing
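
The validation scheme in the last bullet, a holdout set plus cross-validation, can be sketched with index splits alone. This is a standard-library illustration with hypothetical function names, not the project's code; with scikit-learn, `train_test_split` and `KFold` from `sklearn.model_selection` would usually do this job.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices and reserve a final test set that is never used for tuning."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    train_idx, test_idx = holdout_split(100)
    for fold_number, (tr, va) in enumerate(k_fold(train_idx), start=1):
        print(f"fold {fold_number}: {len(tr)} train rows, {len(va)} validation rows")
    print(f"holdout: {len(test_idx)} rows")
```

Cross-validation runs only on the training indices, so the holdout rows stay unseen until the final evaluation.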

Tech Stack: Python, pandas, scikit-learn, XGBoost, TensorFlow, matplotlib, seaborn


Professional Experience

Data Consultant | AiCore | Jun 2025 – Present

  • Architect and implement production data pipelines using Python, SQL, Spark, and cloud platforms (AWS/Azure)
  • Design and deploy data lakes and warehouses with CI/CD automation
  • Build analytics dashboards and reports using Power BI and Tableau for business intelligence
  • Apply DevSecOps, MLOps, and data governance best practices across enterprise projects

Mathematics & Computer Science Tutor | Oxford International Education Group | Feb 2025 – Present

  • Deliver lectures and tutorials across six foundation-level modules for international students
  • Design inclusive curriculum and digital teaching materials for diverse learning backgrounds
  • Assess student performance through coursework, coding assignments, and technical presentations

Data Science Placement | NHS England (NHSE) | Jun 2025

  • Completed a cancer-focused data science project as part of a work experience placement at NHSE, using synthetic NHS Simulacrum data
  • Applied core healthcare data science methods including exploratory data analysis, feature engineering, and machine learning model development

Trainee Software, Cloud & Data Engineer | AiCore | Dec 2024 – May 2025

  • Completed intensive programme in software engineering, data engineering, and cloud architecture
  • Built production-scale batch and streaming pipelines using Kafka, Databricks, Airflow, and AWS
  • Delivered capstone projects demonstrating full-stack data pipeline design and deployment

Publications & Certifications

Published Research

Mulualem Kahssay & Shihan Miah (2025)
A Comparative Analysis of Stochastic Models for Stock Price Forecasting: The Influence of Historical Data Duration and Volatility Regimes
Quantitative Finance and Economics, 9(3), 602–630
DOI: 10.3934/QFE.2025021

Professional Certifications

  • AiCore Certificate in Cloud & Data Engineering
  • AiCore Certificate in Software Engineering
  • BTEC Certificate in Work Skills
  • CITB Certificate in Health and Safety
  • BCS Certificate in Digital Skills

Connect

LinkedIn | GitHub | Email | Portfolio

Open to opportunities in Data Engineering, Cloud Architecture, and Data Science

Pinned

  1. gsk-step-up-challenge Public

    Phase-3 oncology trial analysis and prescriber decision-support prototype (GSK x DigData Step Up Career Challenge).

    HTML

  2. nhs-step-up-challenge Public

    Antidepressant prescribing analysis across all three NHS DigData Step Up challenges (Excel, R, Python). Includes interactive web dashboard.

    HTML

  3. taskpilot Public

    Full-stack MERN task management app. JWT auth, status workflow (todo, doing, done), priorities, due dates, search, filters, and a stats dashboard. React, Node.js, Express, MongoDB, Tailwind CSS.

    JavaScript

  4. covid-icu-prediction Public

    Machine learning pipeline predicting ICU admission risk from COVID-19 clinical biomarkers. Random Forest reaches AUC 0.91 / Accuracy 94%, with SHAP explainability and IBM AIF360 three-stage fairnes…

    Jupyter Notebook

  5. postcode-prognosis Public

    Personal data science project. Predictive modelling of avoidable hospital admissions across 309 English Local Authorities, with TRIPOD-aligned reporting on health inequalities. Poisson GLM, spatial…

    Jupyter Notebook

  6. cancer-late-stage-prediction Public

    ML pipeline predicting late-stage (Stage III/IV) cancer diagnosis on 1.17M+ Simulacrum v2.0 records. Logistic Regression, Random Forest, Gradient Boosting compared with SMOTE and class weight balan…

    Jupyter Notebook