RetailPulse is a full-stack data engineering project that demonstrates SQL proficiency, ETL pipeline development, and cloud integration. It extracts retail sales data from Kaggle, transforms it using Python, loads it into PostgreSQL, and visualizes results through a Streamlit dashboard. The project is modular, automated, and built for real-world data workflows.
RetailPulse simulates an enterprise-grade data workflow:
- Extract — Ingest open retail data from Kaggle.
- Transform — Clean, validate, and enrich using Python.
- Load — Store processed data in PostgreSQL and upload artifacts to Azure Blob Storage.
- Visualize — Serve analytics via an API layer and a Streamlit dashboard.
```
[Kaggle Dataset]
       ↓
[Python ETL Pipeline]
       ↓
[Azure Blob Storage + PostgreSQL]
       ↓
[Flask/FastAPI API Layer]
       ↓
[Streamlit Dashboard]
```
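A minimal sketch of how these stages might be chained from a single entry point (the `run_pipeline.py` wrapper and the `main()` convention are illustrative, not the project's actual entry point):

```python
# run_pipeline.py -- illustrative orchestration of the ETL stages.
# Assumes extract.py, transform.py, and load.py each expose a main() function.
import logging

import extract
import load
import transform

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("retailpulse")

def run_pipeline() -> None:
    """Run the three ETL stages in order, logging each step."""
    logger.info("Extracting raw data from Kaggle...")
    extract.main()
    logger.info("Transforming and validating...")
    transform.main()
    logger.info("Loading into PostgreSQL and Azure Blob Storage...")
    load.main()

if __name__ == "__main__":
    run_pipeline()
```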
| Layer | Tools & Libraries |
|---|---|
| ETL Pipeline | Python 3.8+, pandas, kaggle, azure-storage-blob, sqlalchemy, logging |
| Database | PostgreSQL (local or Azure) |
| Cloud Storage | Azure Blob Storage |
| API Layer | Flask or FastAPI |
| Hosting | Streamlit Cloud |
| Automation | GitHub Actions, Cron jobs |
| Version Control | Git + GitHub |
```bash
git clone git@github.com:MmelIGaba/RetailPulse.git
cd RetailPulse
```

Create a `.env` file in the `Back-End/` directory based on `.env.example`:
```
DB_HOST=your-db-host
DB_USER=your-username
DB_PASSWORD=your-password
DB_NAME=retailpulse
AZURE_ACCESS_KEY=your-access-key
AZURE_SECRET_KEY=your-secret-key
```
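The ETL scripts can load these values at runtime. A minimal sketch, assuming `python-dotenv` is installed (it is not listed in the stack table above) and that SQLAlchemy uses its default PostgreSQL driver:

```python
# config.py -- illustrative: read .env values and build a SQLAlchemy engine.
import os

from dotenv import load_dotenv  # assumption: pip install python-dotenv
from sqlalchemy import create_engine

load_dotenv()  # picks up Back-End/.env when run from that directory

# Build a PostgreSQL URL from the variables defined above.
DB_URL = (
    f"postgresql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}/{os.environ['DB_NAME']}"
)
engine = create_engine(DB_URL)  # postgresql:// defaults to the psycopg2 driver
```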
Install dependencies:

```bash
pip install -r Back-End/requirements.txt
pip install -r dashboard/requirements.txt
```
Run the ETL pipeline:

```bash
cd Back-End/etl
python extract.py
python transform.py
python load.py
```
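For orientation, `extract.py` might look roughly like the sketch below; the dataset slug is a placeholder, and Kaggle credentials are assumed to live in `~/.kaggle/kaggle.json` or `KAGGLE_*` environment variables:

```python
# extract.py -- illustrative sketch using the official kaggle package.
from kaggle.api.kaggle_api_extended import KaggleApi

def main() -> None:
    api = KaggleApi()
    api.authenticate()  # reads ~/.kaggle/kaggle.json or KAGGLE_* env vars
    # "owner/retail-sales" is a placeholder, not the project's actual dataset.
    api.dataset_download_files("owner/retail-sales", path="data/raw", unzip=True)

if __name__ == "__main__":
    main()
```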
Launch the Streamlit dashboard:

```bash
cd dashboard
streamlit run app.py
```
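A minimal sketch of what `app.py` could contain, querying PostgreSQL and rendering a chart; the `sales` table and its columns are hypothetical stand-ins for the real schema:

```python
# app.py -- illustrative Streamlit dashboard skeleton.
import os

import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

# Reuses the same environment variables as the ETL layer.
engine = create_engine(
    f"postgresql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}/{os.environ['DB_NAME']}"
)

st.title("RetailPulse Dashboard")

# "sales", "category", and "total" are hypothetical; adjust to the real schema.
df = pd.read_sql(
    "SELECT category, SUM(total) AS revenue FROM sales GROUP BY category", engine
)
st.bar_chart(df.set_index("category")["revenue"])
st.dataframe(df)
```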
Start the frontend dev server:

```bash
cd Front-End
npm run dev
```
```
RetailPulse/
├── Back-End/
│   ├── etl/
│   ├── sql/
│   ├── requirements.txt
│   └── .env.example
├── dashboard/
│   ├── app.py
│   ├── components/
│   └── requirements.txt
├── .github/
│   └── workflows/
├── presentation/
│   └── RetailPulse_Slides.pdf
├── .gitignore
└── README.md
```
All credentials are managed via environment variables. Do not commit real credentials or API keys; use `.env` files locally and GitHub Secrets for automation.
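One lightweight safeguard, sketched below, is to fail fast when a required variable is missing rather than letting a job die mid-run:

```python
# illustrative fail-fast check for the required credentials
import os

REQUIRED = [
    "DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME",
    "AZURE_ACCESS_KEY", "AZURE_SECRET_KEY",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```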
By completing this project, you will:
- Write and optimize SQL queries for analysis.
- Design modular ETL pipelines in Python.
- Integrate Azure Blob Storage and PostgreSQL (see the sketch after this list).
- Automate data workflows using GitHub Actions.
- Visualize metrics with modern frontend tools.
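To make the Azure/PostgreSQL integration concrete, here is a hedged sketch of what `load.py` could do; the file path, table name, container name, and `AZURE_STORAGE_CONNECTION_STRING` variable are assumptions, not the project's actual configuration:

```python
# load.py -- illustrative: write cleaned data to PostgreSQL and Azure Blob Storage.
import os

import pandas as pd
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob
from sqlalchemy import create_engine

def main() -> None:
    # Hypothetical output location of transform.py.
    df = pd.read_csv("data/clean/retail_sales.csv")

    # PostgreSQL: replace the sales table with the fresh load.
    engine = create_engine(
        f"postgresql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}/{os.environ['DB_NAME']}"
    )
    df.to_sql("sales", engine, if_exists="replace", index=False)

    # Azure Blob Storage: archive the same file.
    blob_service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]  # assumed variable name
    )
    blob = blob_service.get_blob_client(container="retailpulse", blob="retail_sales.csv")
    with open("data/clean/retail_sales.csv", "rb") as fh:
        blob.upload_blob(fh, overwrite=True)

if __name__ == "__main__":
    main()
```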
This project is licensed under the Apache License 2.0.
- Mmela Gabriel Dyantyi: Fullstack Developer and Aspiring Cloud Engineer
- Boipelo M Ngakane: Frontend Developer | Low Code | AI | Cloud Engineer
- [amend as needed]