Forecasting & BI
Time series forecasting to support planning, inventory optimisation, and executive decision-making.
This project focuses on demand forecasting across multiple horizons, leveraging classical time series models and machine learning approaches to improve business planning accuracy.
Databricks Platform
This project implements a production-style data platform that enables reliable sales forecasting by transforming raw transactional data into analytics-ready time series datasets. Built on Databricks using a medallion architecture (Bronze, Silver, Gold), the platform emphasises data quality, repeatable pipelines, and downstream forecasting readiness.
Enable reliable sales forecasting and reporting by creating a clean, well-structured data foundation that reduces noise from raw operational systems and aligns metrics with business definitions.
Design an end-to-end data pipeline that demonstrates best practices in data ingestion, transformation, validation, and modelling using layered data architecture.
Bronze Layer
Silver Layer
Gold Layer
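As a rough illustration of the kind of work that separates these layers, the sketch below shows one possible Bronze-to-Silver step in plain Python: raw rows are validated and deduplicated, and rejected rows are kept for inspection rather than silently dropped. The field names (order_id, amount) and the specific rules are assumptions made for this example, not the project's actual schema or checks.

```python
def to_silver(bronze_rows):
    """Validate and deduplicate raw rows.

    Returns (clean_rows, rejected_rows). The `order_id` and `amount`
    fields are hypothetical and stand in for the real schema.
    """
    clean, rejected, seen_ids = [], [], set()
    for row in bronze_rows:
        order_id = row.get("order_id")
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            # Missing or non-numeric amount: keep the raw row for review.
            rejected.append(row)
            continue
        # Reject rows with no id, a negative amount, or an id already seen.
        if not order_id or amount < 0 or order_id in seen_ids:
            rejected.append(row)
            continue
        seen_ids.add(order_id)
        clean.append({"order_id": order_id, "amount": amount})
    return clean, rejected
```

Returning the rejected rows alongside the clean ones is one way to keep data quality measurable from run to run.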
The pipeline is orchestrated using Databricks Jobs, with each medallion layer executed as a discrete task that runs only once the task it depends on has completed. This keeps runs repeatable and observable, and allows the whole pipeline to be scheduled and monitored as a single job.
Each task in the Databricks Job represents a logical stage in the data lifecycle, allowing failures to be isolated, monitored, and rerun independently without impacting downstream consumers.
The diagram illustrates the orchestration of Bronze, Silver, and Gold data processing tasks using Databricks Jobs. Each stage is dependency-aware, ensuring data quality, traceability, and repeatable execution across pipeline runs.
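The dependency-driven behaviour described above can be imitated locally in a few lines of Python. This is only a sketch of the idea, not the Databricks Jobs API: the task names are hypothetical, and the rule that a task is skipped when an upstream task has not succeeded is an assumption chosen to mirror a simple Bronze, Silver, Gold chain.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
TASK_DEPENDENCIES = {
    "bronze_ingest": set(),
    "silver_clean": {"bronze_ingest"},
    "gold_aggregate": {"silver_clean"},
}


def run_pipeline(tasks, dependencies):
    """Run callables in dependency order and report a status per task.

    A task is skipped unless every upstream task succeeded, so a failure
    stays isolated to the stage where it occurred.
    """
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        if any(status.get(dep) != "succeeded" for dep in dependencies[name]):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "succeeded"
        except Exception:
            status[name] = "failed"
    return status
```

If the Silver callable raises, the result marks bronze_ingest as succeeded, silver_clean as failed, and gold_aggregate as skipped, which makes it clear which stage needs to be rerun.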
The Gold layer is structured to support time series forecasting by aggregating sales at consistent time intervals, enabling the application of classical forecasting models and machine learning approaches with minimal additional preparation.
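To make "consistent time intervals" concrete, the following sketch aggregates individual sales into a daily series with no missing dates. It uses plain Python rather than the platform's own tables, which are not shown here. The input shape, a list of (sale_date, amount) pairs, is hypothetical, and filling days without transactions with 0.0 assumes that a missing day means no sales rather than missing data.

```python
from collections import defaultdict
from datetime import timedelta


def daily_sales_series(transactions):
    """Sum (sale_date, amount) pairs per day and fill gaps with 0.0.

    Returns a list of (date, total) tuples covering every calendar day
    from the first to the last observed sale date.
    """
    if not transactions:
        return []
    totals = defaultdict(float)
    for sale_date, amount in transactions:
        totals[sale_date] += amount
    start, end = min(totals), max(totals)
    series = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        # Assumption: a day with no transactions had zero sales.
        series.append((day, totals.get(day, 0.0)))
    return series
```

A gap-free, evenly spaced series matters because most classical models, ARIMA included, assume observations arrive at regular intervals.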
Planned forecasting approaches include baseline statistical models (moving averages, ARIMA) and machine learning techniques, evaluated using business-aligned error metrics such as MAE and MAPE.
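For reference, a moving-average baseline and the two error metrics named above can be written in a few lines of plain Python. This is a minimal sketch rather than the project's evaluation code: the function names are illustrative, and skipping zero actuals in MAPE is one common convention chosen here to avoid division by zero.

```python
def moving_average_forecast(history, window):
    """One-step-ahead forecast: the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("window must be positive and no longer than history")
    return sum(history[-window:]) / window


def mae(actual, forecast):
    """Mean absolute error, in the same units as the data."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: periods where the actual value is zero are skipped,
    because the percentage error is undefined there.
    """
    if len(actual) != len(forecast):
        raise ValueError("inputs must be of equal length")
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)
```

MAE reports the typical miss in sales units, while MAPE expresses it relative to the size of the actual value. MAPE becomes unstable for low-volume series where actuals are close to zero, which is one reason to report both.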
This project demonstrates the ability to design and implement a scalable data foundation that bridges raw operational data and advanced analytics. It highlights practical data engineering skills while directly supporting sales forecasting and executive-level reporting use cases.
Rather than focusing solely on forecasting models, this project demonstrates how reliable predictions depend on robust data foundations. By combining Databricks job orchestration with a layered data architecture, the solution mirrors real-world enterprise data platforms used to support forecasting, planning, and strategic decision-making.
Complete source code, documentation, and example notebooks are available on GitHub.