Forecasting & BI

Sales Forecasting

Time series forecasting to support planning, inventory optimisation, and executive decision-making.

This project focuses on demand forecasting across multiple horizons, leveraging classical time series models and machine learning approaches to improve business planning accuracy.

Databricks Platform

This project implements a production-style data platform that enables reliable sales forecasting by transforming raw transactional data into analytics-ready time series datasets. Built on Databricks using a medallion architecture (Bronze, Silver, Gold), the platform emphasises data quality, repeatable pipelines, and downstream forecasting readiness.

Business Objective

Enable reliable sales forecasting and reporting by creating a clean, well-structured data foundation that reduces noise from raw operational systems and aligns metrics with business definitions.

Technical Objective

Design an end-to-end data pipeline that demonstrates best practices in data ingestion, transformation, validation, and modelling using a layered data architecture.

Medallion Architecture

Bronze Layer

Raw Ingestion

  • Ingest raw sales order data with minimal transformation
  • Preserve original schema and values
  • Add ingestion timestamps for auditability
  • Designed for traceability and replay
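
As a rough illustration of the Bronze behaviour listed above, the sketch below lands records unchanged and adds audit columns. It uses pandas so it can run anywhere; the column names, the `_ingested_at` and `_source` audit fields, and the `orders_export.csv` source name are assumptions made for the example, not details taken from the project.

```python
import pandas as pd


def ingest_bronze(raw_records, source_name):
    """Land raw records unchanged, adding only audit columns."""
    # Keep every value as a string so the original content is preserved exactly.
    bronze = pd.DataFrame(raw_records).astype("string")
    # Audit columns (hypothetical names) record when and from where rows arrived.
    bronze["_ingested_at"] = pd.Timestamp.now(tz="UTC")
    bronze["_source"] = source_name
    return bronze


# Illustrative input: note the inconsistent date format and the non-numeric amount.
raw_orders = [
    {"order_id": "1001", "order_date": "2024-01-05", "amount": "120.50"},
    {"order_id": "1002", "order_date": "05/01/2024", "amount": "n/a"},
]
bronze_orders = ingest_bronze(raw_orders, source_name="orders_export.csv")
print(bronze_orders)
```

Nothing is corrected or rejected at this stage, so a bad value such as "n/a" stays visible and the load can be replayed later against improved cleansing rules.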

Silver Layer

Cleansed & Standardised

  • Data type standardisation and validation
  • Removal of duplicates and invalid records
  • Business rule enforcement
  • Prepared for analytical consumption
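
The Silver steps above might look something like the following pandas sketch. The column names, the ISO date format, the "amount must be positive" rule, and the choice to keep the last row per `order_id` are all assumptions for illustration; the project's actual rules may differ.

```python
import pandas as pd


def clean_silver(bronze):
    """Standardise types, drop invalid rows, and deduplicate orders."""
    silver = bronze.copy()
    # Values that cannot be parsed become NaT/NaN so they can be removed explicitly.
    silver["order_date"] = pd.to_datetime(
        silver["order_date"], format="%Y-%m-%d", errors="coerce"
    )
    silver["amount"] = pd.to_numeric(silver["amount"], errors="coerce")
    silver = silver.dropna(subset=["order_id", "order_date", "amount"])
    # Hypothetical business rule: an order amount must be positive.
    silver = silver[silver["amount"] > 0]
    # Keep one row per order, assuming later rows supersede earlier ones.
    silver = silver.drop_duplicates(subset="order_id", keep="last")
    return silver.reset_index(drop=True)


# Illustrative Bronze-style input: one duplicate, one bad date, one negative amount.
bronze_orders = pd.DataFrame({
    "order_id": ["1001", "1001", "1002", "1003"],
    "order_date": ["2024-01-05", "2024-01-05", "not a date", "2024-01-06"],
    "amount": ["120.50", "120.50", "80.00", "-5"],
})
silver_orders = clean_silver(bronze_orders)
print(silver_orders)
```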

Gold Layer

Business Models

  • Aggregated sales and revenue metrics
  • Analytics-ready fact tables
  • Forecast-friendly time series structure
  • Validated outputs for BI dashboards

Databricks Job Pipeline

The pipeline is orchestrated using Databricks Jobs, with each medallion layer executed as a discrete, dependency-driven task. Running each layer as its own task makes failures visible per stage and allows runs to be scheduled, monitored, and repeated.

  • Automated execution of Bronze → Silver → Gold layers
  • Task dependencies act as data quality gates: a downstream task runs only after its upstream task succeeds
  • Idempotent transformations for safe re-runs
  • Designed for scheduled and event-driven execution

Each task in the Databricks Job represents a logical stage in the data lifecycle, allowing failures to be isolated, monitored, and rerun independently without impacting downstream consumers.
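
Databricks Jobs handles task ordering itself, so no custom scheduler is needed. The plain-Python sketch below is not the Databricks API; it only illustrates the dependency-driven behaviour described above, using hypothetical task functions and a simulated validation failure in the Silver stage.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, depends_on):
    """Run tasks in dependency order, skipping anything downstream of a failure."""
    status = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(name, ())
        if any(status[dep] != "success" for dep in upstream):
            # Quality gate: never build on a stage that did not succeed.
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            # Broad catch is deliberate here: any error marks the stage as failed.
            status[name] = "failed"
    return status


def bronze_task():
    pass  # placeholder for raw ingestion


def silver_task():
    # Simulated validation failure for the example.
    raise ValueError("duplicate order_id found")


def gold_task():
    pass  # placeholder for aggregation


tasks = {"bronze": bronze_task, "silver": silver_task, "gold": gold_task}
depends_on = {"bronze": [], "silver": ["bronze"], "gold": ["silver"]}
result = run_pipeline(tasks, depends_on)
print(result)
```

In this run the Silver failure stops Gold from executing, so existing Gold tables are left as they were until Silver is fixed and rerun.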

Databricks Job Pipeline Architecture

The diagram illustrates the orchestration of Bronze, Silver, and Gold data processing tasks using Databricks Jobs. Each stage is dependency-aware, ensuring data quality, traceability, and repeatable execution across pipeline runs.

Forecasting Readiness

The Gold layer is structured to support time series forecasting by aggregating sales at consistent time intervals, enabling the application of classical forecasting models and machine learning approaches with minimal additional preparation.
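
One way to obtain consistent time intervals is to reindex the aggregated series onto a regular calendar. The sketch below assumes a daily frequency, hypothetical column names, and that a day with no recorded orders means zero sales; if gaps can also come from missing data, they would need different handling.

```python
import pandas as pd


def to_regular_daily(gold, date_col="sales_date", value_col="revenue"):
    """Return a daily series with a row for every calendar day in the range."""
    series = gold.set_index(date_col)[value_col].sort_index()
    # asfreq inserts the missing days; they are filled with 0.0 by assumption.
    return series.asfreq("D", fill_value=0.0)


# Illustrative aggregate with a two-day gap between observations.
gold_daily = pd.DataFrame({
    "sales_date": pd.to_datetime(["2024-01-05", "2024-01-08"]),
    "revenue": [200.5, 50.0],
})
daily_revenue = to_regular_daily(gold_daily)
print(daily_revenue)
```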

Intended Models

Planned forecasting approaches include baseline statistical models (moving averages, ARIMA) and machine learning techniques, evaluated using business-aligned error metrics such as MAE and MAPE.
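
For illustration only, a moving-average baseline and the two error metrics can be sketched in plain Python. The sales figures and the window size are made up for the example and do not come from the project.

```python
def moving_average_forecast(history, window):
    """One-step-ahead forecasts: each is the mean of the previous `window` actuals."""
    return [
        sum(history[i - window:i]) / window
        for i in range(window, len(history))
    ]


def mae(actual, predicted):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined when any actual is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


sales = [100, 110, 120, 130, 140]           # made-up history
forecasts = moving_average_forecast(sales, window=2)
actuals = sales[2:]                          # the periods that were forecast

print("MAE:", mae(actuals, forecasts))
print("MAPE (%):", round(mape(actuals, forecasts), 2))
```

MAE reports error in sales units, which is easy to explain to planners, while MAPE is scale-free but breaks down for periods with zero sales, so it is worth checking the series before relying on it.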

Tools & Technologies

  • Databricks (Jobs, Notebooks, Workspace)
  • Python (pandas, PySpark-style transformations)
  • Medallion Architecture (Bronze, Silver, Gold)
  • Automated data pipelines
  • Analytics-ready data modelling
  • Time series–friendly data aggregation
  • Forecasting-oriented data design
  • BI & dashboard consumption focus

Outcome & Value

This project demonstrates the ability to design and implement a scalable data foundation that bridges raw operational data and advanced analytics. It highlights practical data engineering skills while directly supporting sales forecasting and executive-level reporting use cases.

Why This Project Matters

Rather than focusing solely on forecasting models, this project demonstrates how reliable predictions depend on robust data foundations. By combining Databricks job orchestration with a layered data architecture, the solution mirrors real-world enterprise data platforms used to support forecasting, planning, and strategic decision-making.

View Full Implementation

Complete source code, documentation, and example notebooks are available on GitHub.