Mulualem Kahssay Mulualem03

Mulualem Kahssay

Data Consultant | Cloud & Data Engineer | Published Researcher

Professional Summary

Data consultant with expertise in designing and implementing scalable data pipelines, cloud infrastructure, and analytics solutions. Experienced in building production-grade ETL/ELT systems using modern data stack technologies across AWS and GCP. Published researcher in quantitative finance with a strong foundation in mathematics, statistics, and computer science.

Current Focus: Building enterprise data platforms, real-time streaming architectures, and ML-driven analytics solutions for cross-functional business teams.

Technical Expertise

Core Competencies

Data Engineering          Cloud Architecture          Data Science & ML
├─ ETL/ELT Pipelines      ├─ AWS Services             ├─ Statistical Modeling
├─ Data Modeling          ├─ Google Cloud Platform    ├─ Predictive Analytics
├─ Stream Processing      ├─ Infrastructure as Code   ├─ Feature Engineering
├─ Data Quality           ├─ CI/CD Pipelines          └─ Model Deployment
└─ Orchestration          └─ Containerization

Technology Stack

Programming Languages

Python, SQL, R, Java, JavaScript, C#, C++, Bash, MATLAB

Cloud & Infrastructure

AWS, GCP, Docker, Terraform, GitHub, Git, Linux

Data Engineering & Big Data

Kafka, Spark, Databricks, dbt, Airbyte, Airflow, Delta Lake

Databases & Data Warehouses

PostgreSQL, MySQL, MongoDB, BigQuery, Snowflake, SQLAlchemy

Data Science & Analytics

Pandas, NumPy, Scikit-learn, TensorFlow, Matplotlib, Seaborn, Tableau, Power BI

Web Development

HTML5, CSS3, React, Angular, .NET Core, FastAPI

Featured Projects

DataDigest Analytics Pipeline

Modern Data Stack Implementation | GCP + Airbyte + BigQuery + dbt + Airflow

Built an end-to-end analytics pipeline on Google Cloud Platform for digital media analytics, implementing a complete modern data stack architecture.

Architecture Highlights:

Designed multi-source data ingestion using Airbyte with automated schema validation
Developed layered data warehouse in BigQuery following medallion architecture
Implemented SQL transformations using dbt Core with comprehensive testing and lineage tracking
Orchestrated daily workflows with Apache Airflow for reliable, scheduled execution
Achieved 99.9% pipeline reliability with automated monitoring and alerting

Tech Stack: GCP, Airbyte, BigQuery, dbt Core, Apache Airflow, SQL, Python
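
As a rough illustration of the medallion layering mentioned in the highlights above, the sketch below passes a few in-memory records through bronze (raw), silver (cleaned), and gold (aggregated) steps. Plain Python stands in for BigQuery and dbt here; the field names (`article_id`, `views`) and the functions `to_silver` and `to_gold` are hypothetical and not taken from the project.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in a bronze (raw) layer.
BRONZE = [
    {"article_id": "a1", "views": "10"},
    {"article_id": "a1", "views": "5"},
    {"article_id": "a2", "views": "not-a-number"},  # malformed value
    {"article_id": None, "views": "3"},             # missing key
]


def to_silver(rows):
    """Keep only rows with a key and an integer view count; cast types."""
    cleaned = []
    for row in rows:
        if not row.get("article_id"):
            continue
        try:
            views = int(row["views"])
        except (TypeError, ValueError):
            continue
        cleaned.append({"article_id": row["article_id"], "views": views})
    return cleaned


def to_gold(rows):
    """Aggregate cleaned rows into total views per article."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["article_id"]] += row["views"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
```

In a warehouse setting each layer would typically be a separate dataset or schema, with the silver and gold steps expressed as SQL models rather than Python functions.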

Pinterest Data Pipeline

Batch & Streaming Architecture | AWS + Kafka + Databricks + PySpark

Designed and implemented a dual-mode data pipeline supporting both batch and real-time streaming workloads for a Pinterest-style data platform.

Technical Implementation:

Built Apache Kafka producers for real-time API data ingestion
Configured AWS Kinesis for stream processing with sub-second latency
Developed PySpark transformations in Databricks for large-scale data processing
Implemented Delta Lake tables with ACID transactions and schema enforcement
Orchestrated complex workflows using Airflow on AWS MWAA
Designed star schema data models optimized for analytical queries

Tech Stack: AWS (S3, RDS, Kinesis), Apache Kafka, Databricks, PySpark, Delta Lake, Airflow, Python
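
The schema enforcement idea listed above can be illustrated without Spark: a writer declares column names and types, and a batch containing any non-conforming row is rejected as a whole rather than partially written. This is a minimal plain-Python sketch, not the Delta Lake API; `SCHEMA`, `validate_row`, `append_batch`, and the field names are all hypothetical.

```python
# Hypothetical declared schema: column name -> required Python type.
SCHEMA = {"pin_id": int, "category": str, "follower_count": int}


def validate_row(row, schema=SCHEMA):
    """Return True only if the row has exactly the declared columns and types."""
    if set(row) != set(schema):
        return False
    return all(isinstance(row[col], typ) for col, typ in schema.items())


def append_batch(table, batch, schema=SCHEMA):
    """Append the batch all-or-nothing: reject it if any row is invalid."""
    bad = [row for row in batch if not validate_row(row, schema)]
    if bad:
        raise ValueError(f"{len(bad)} row(s) violate the schema; batch rejected")
    table.extend(batch)


table = []
append_batch(table, [{"pin_id": 1, "category": "art", "follower_count": 120}])

try:
    append_batch(table, [
        {"pin_id": 2, "category": "diy", "follower_count": 40},
        {"pin_id": "3", "category": "diy", "follower_count": 7},  # wrong type
    ])
except ValueError as exc:
    print(exc)

print(len(table))
```

The second batch leaves the table unchanged, which mirrors the all-or-nothing behaviour that transactional table formats aim to provide.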

Sales Data ETL Pipeline

Cloud-Native Data Integration | Python + AWS RDS + PostgreSQL

Engineered a production-ready ETL pipeline that extracts, transforms, and loads sales data from heterogeneous sources into a cloud data warehouse.

Key Features:

Developed Python ETL framework handling APIs, PDFs, JSON, and S3 sources
Implemented robust error handling and data validation mechanisms
Designed star schema in PostgreSQL on AWS RDS optimized for OLAP workloads
Utilized SQLAlchemy ORM for database abstraction and connection pooling
Automated data quality checks and anomaly detection

Tech Stack: Python, AWS RDS, PostgreSQL, SQLAlchemy, boto3, pandas, tabula-py
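
To make the star schema mentioned above concrete, here is a small sketch with one fact table and two dimension tables, followed by a typical analytical query. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL on AWS RDS; the table names, column names, and sample rows are hypothetical and not taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year INTEGER NOT NULL,
    month INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    quantity INTEGER NOT NULL,
    amount REAL NOT NULL
);
""")

# Hypothetical sample rows for the dimensions and the fact table.
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20250101, 2025, 1), (20250201, 2025, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 20250101, 2, 20.0),
                  (2, 1, 20250201, 1, 10.0),
                  (3, 2, 20250201, 4, 100.0)])

# Join the fact table to a dimension and aggregate a measure.
revenue_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(revenue_by_product)
```

Keeping measures in the fact table and descriptive attributes in the dimensions means most reporting queries reduce to a join plus a `GROUP BY`, as in the query above.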

Stock Price Forecasting Models

Quantitative Finance Research | Published in Peer-Reviewed Journal

Conducted a comparative analysis of stochastic models for stock price prediction, investigating how the length of the historical data window and the volatility regime affect forecasting accuracy.

Research Contributions:

Implemented and compared three stochastic models: Geometric Brownian Motion, Heston, and Merton Jump Diffusion
Analyzed model performance across different market volatility conditions
Examined optimal historical data windows for accurate predictions
Published findings in Quantitative Finance and Economics, Volume 9, Issue 3, 2025

Tech Stack: Python, NumPy, SciPy, Matplotlib, Quantitative Finance
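
For readers unfamiliar with the simplest of the three models, the sketch below simulates a single Geometric Brownian Motion path using the standard exact log-normal update. It is a generic textbook illustration written with the standard library only, not the code or calibration used in the publication; the function name `simulate_gbm` and all parameter values are assumptions chosen for the example.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, horizon, steps, rng):
    """Simulate one GBM path using the exact log-normal update.

    S_{t+dt} = S_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z),
    with Z drawn from a standard normal distribution.
    """
    dt = horizon / steps
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        drift = (mu - 0.5 * sigma ** 2) * dt
        shock = sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(drift + shock))
    return path


rng = random.Random(42)  # fixed seed so runs are repeatable
# Illustrative values: 5% drift, 20% volatility, one year of 252 trading days.
path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, horizon=1.0, steps=252, rng=rng)
print(len(path), min(path) > 0)
```

With `sigma` set to zero the path collapses to the deterministic growth curve `s0 * exp(mu * t)`, which is a convenient sanity check. The Heston and Merton models extend this update with stochastic volatility and jump terms respectively.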

COVID-19 ICU Admission Prediction

Healthcare Analytics & Machine Learning | Python + ML

Developed machine learning models to predict ICU admission during the COVID-19 pandemic, enabling proactive resource allocation and capacity planning.

Analytics Approach:

Conducted exploratory data analysis on large-scale ICU admission records
Engineered clinical and demographic features for predictive modeling
Implemented ensemble methods using XGBoost and TensorFlow
Achieved high prediction accuracy for resource demand forecasting
Validated models using cross-validation and holdout testing

Tech Stack: Python, pandas, scikit-learn, XGBoost, TensorFlow, matplotlib, seaborn
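
The cross-validation step mentioned above can be sketched as a k-fold index splitter: the sample indices are shuffled once, divided into k folds, and each fold serves as the test set exactly once. This standard-library version is for illustration only (the project lists scikit-learn, which provides its own splitters); the function name `k_fold_indices` and the sample sizes are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute samples as evenly as possible across the k folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

A separate holdout set would be carved off before this splitting so that it never influences model selection.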

Professional Experience

Data Consultant | AiCore | Jun 2025 – Present

Architect and implement production data pipelines using Python, SQL, Spark, and cloud platforms (AWS/Azure)
Design and deploy data lakes and warehouses with CI/CD automation
Build analytics dashboards and reports using Power BI and Tableau for business intelligence
Apply DevSecOps, MLOps, and data governance best practices across enterprise projects

Mathematics & Computer Science Tutor | Oxford International Education Group | Feb 2025 – Present

Deliver lectures and tutorials across six foundation-level modules for international students
Design inclusive curriculum and digital teaching materials for diverse learning backgrounds
Assess student performance through coursework, coding assignments, and technical presentations

Data Science Placement | NHS England (NHSE) | Jun 2025

Completed a cancer-focused data science project as part of a work experience placement at NHSE, using synthetic NHS Simulacrum data
Applied core healthcare data science methods, including exploratory data analysis, feature engineering, and machine learning model development

Trainee Software, Cloud & Data Engineer | AiCore | Dec 2024 – May 2025

Completed intensive programme in software engineering, data engineering, and cloud architecture
Built production-scale batch and streaming pipelines using Kafka, Databricks, Airflow, and AWS
Delivered capstone projects demonstrating full-stack data pipeline design and deployment

Publications & Certifications

Published Research

Mulualem Kahssay & Shihan Miah (2025)
A Comparative Analysis of Stochastic Models for Stock Price Forecasting: The Influence of Historical Data Duration and Volatility Regimes
Quantitative Finance and Economics, 9(3), 602–630
DOI: 10.3934/QFE.2025021

Professional Certifications

AiCore Certificate in Cloud & Data Engineering
AiCore Certificate in Software Engineering
BTEC Certificate in Work Skills
CITB Certificate in Health and Safety
BCS Certificate in Digital Skills

Connect

Open to opportunities in Data Engineering, Cloud Architecture, and Data Science
