RetailPulse: SQL & ETL Data Pipeline Builder. A full-stack data engineering project that extracts retail data from Kaggle, transforms it with Python, loads it into PostgreSQL, and visualizes insights via a Streamlit dashboard and a React front end. Features cloud integration (Azure Blob Storage), automation, and end-to-end ETL orchestration.


RetailPulse: SQL & ETL Data Pipeline Builder

RetailPulse is a full-stack data engineering project that demonstrates SQL proficiency, ETL pipeline development, and cloud integration. It extracts retail sales data from Kaggle, transforms it using Python, loads it into PostgreSQL, and visualizes results through a Streamlit dashboard. The project is modular, automated, and built for real-world data workflows.

Project Overview

RetailPulse simulates an enterprise-grade data workflow:

  1. Extract — Ingest open retail data from Kaggle.
  2. Transform — Clean, validate, and enrich using Python.
  3. Load — Store processed data in PostgreSQL and upload to Azure Blob Storage.
  4. Visualize — Serve analytics via API and Streamlit dashboard.
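The four stages above can be chained in a single run. The sketch below is illustrative only: the function names and the in-memory CSV hand-off are stand-ins, not the repo's actual modules.

```python
# Minimal ETL orchestration sketch: extract -> transform -> load.
import csv
import io


def extract():
    # Stand-in for the Kaggle download: return raw CSV text.
    return "order_id,amount\n1,10.5\n2,\n3,7.0\n"


def transform(raw):
    # Clean and validate: drop rows with missing amounts, cast types.
    rows = csv.DictReader(io.StringIO(raw))
    return [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
        if r["amount"]
    ]


def load(records):
    # Stand-in for the PostgreSQL / Blob Storage load step.
    return len(records)


if __name__ == "__main__":
    print(f"loaded {load(transform(extract()))} rows")
```

Keeping each stage a separate function mirrors the repo's `extract.py` / `transform.py` / `load.py` split and lets each step be tested and automated independently.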

Architecture

```
[Kaggle Dataset]
        ↓
[Python ETL Pipeline]
        ↓
[Azure Blob Storage + PostgreSQL]
        ↓
[Flask/FastAPI API Layer]
        ↓
[Streamlit Dashboard]
```


Technologies Used

| Layer | Tools & Libraries |
| --- | --- |
| ETL Pipeline | Python 3.8+, pandas, kaggle, azure-storage-blob, sqlalchemy, logging |
| Database | PostgreSQL (local or Azure) |
| Cloud Storage | Azure Blob Storage |
| API Layer | Flask or FastAPI |
| Hosting | Streamlit Cloud |
| Automation | GitHub Actions, cron jobs |
| Version Control | Git + GitHub |

Setup Instructions

1. Clone the Repository

```bash
git clone git@github.com:MmelIGaba/RetailPulse.git
cd RetailPulse
```

2. Configure Environment

Create a .env file in the /Back-End directory based on .env.example:

```env
DB_HOST=your-db-host
DB_USER=your-username
DB_PASSWORD=your-password
DB_NAME=retailpulse
Azure_ACCESS_KEY=your-access-key
Azure_SECRET_KEY=your-secret-key
```
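At runtime the ETL scripts need these values in the process environment. The snippet below is a minimal, hedged sketch of that step: the `load_env` parser is a stand-in for a library like python-dotenv, and `database_url` simply assembles the `DB_*` variables into a PostgreSQL connection URL.

```python
# Load KEY=VALUE lines from a .env file and build a database URL.
import os


def load_env(path):
    """Read a .env-style file and put its KEY=VALUE pairs into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks and comments; split on the first "=" only.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())


def database_url():
    """Assemble a PostgreSQL connection URL from the loaded DB_* variables."""
    return (
        f"postgresql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}/{os.environ['DB_NAME']}"
    )
```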

3. Install Dependencies

```bash
pip install -r Back-End/requirements.txt
pip install -r dashboard/requirements.txt
```

4. Run the ETL Pipeline

```bash
cd Back-End/etl
python extract.py
python transform.py
python load.py
```
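To give a feel for the final step, here is a minimal sketch of what a `load.py`-style script might do. `sqlite3` stands in for PostgreSQL so the example runs without a server; the real script would use sqlalchemy with the `DB_*` settings from `.env`.

```python
# Load cleaned sales records into a relational table (sqlite3 as a
# stand-in for PostgreSQL in this sketch).
import sqlite3


def load_sales(conn, records):
    """Upsert (order_id, amount) tuples and return the total row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_sales(conn, [(1, 10.5), (2, 7.0)]))  # prints 2
```

Idempotent writes (here via `INSERT OR REPLACE` on the primary key) matter for automated pipelines: a scheduled re-run should not duplicate rows.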

5. Launch the Streamlit Dashboard

```bash
cd dashboard
streamlit run app.py
```

6. Launch the Front-End Dashboard

```bash
cd Front-End
npm run dev
```

Repository Structure

```
RetailPulse/
├── Back-End/
│   ├── etl/
│   ├── sql/
│   ├── requirements.txt
│   └── .env.example
├── dashboard/
│   ├── app.py
│   ├── components/
│   └── requirements.txt
├── .github/
│   └── workflows/
├── presentation/
│   └── RetailPulse_Slides.pdf
├── .gitignore
└── README.md
```

Security & Configuration

All credentials are managed via environment variables. Do not commit real credentials or API keys. Use .env files locally and GitHub Secrets for automation.
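One way to enforce this in practice is a fail-fast check at startup, so a missing credential surfaces before any data moves. The helper below is a hypothetical sketch, not part of the repo; it only reports which required variables are unset, never their values.

```python
# Fail-fast configuration check: report unset credentials at startup
# without ever printing their values.
import os

REQUIRED_VARS = ("DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME")


def check_config():
    """Return the names of any required environment variables that are unset."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]


if __name__ == "__main__":
    missing = check_config()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

The same check works unchanged in GitHub Actions, where the variables come from repository Secrets instead of a local `.env` file.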

Learning Outcomes

By completing this project, you will:

- Write and optimize SQL queries for analysis.
- Design modular ETL pipelines in Python.
- Integrate Azure Blob Storage and PostgreSQL.
- Automate data workflows using GitHub Actions.
- Visualize metrics with modern frontend tools.

License

This project is licensed under the Apache License 2.0.

Author

  1. Mmela Gabriel Dyantyi: Fullstack Developer and Aspiring Cloud Engineer
  2. Boipelo M Ngakane: Frontend Developer | Low Code | AI | Cloud Engineer
  3. [amend as needed]
