Approximate Computing for CNN Compression


TL;DR:
This repository implements a clean and reproducible pipeline for studying approximate computing on CNNs, including global pruning, fine-tuning, and dynamic INT8 quantization.
Experiments cover ResNet-18 and MobileNetV2 on CIFAR-10 and CIFAR-100, with analysis of accuracy, model size, latency, and sparsity.
Pruning consistently improves generalization (sometimes +15%), while INT8 quantization preserves accuracy with minimal overhead.


This repository implements an experimental framework to study approximate computing techniques for CNN compression, inspired by Deep Compression (Han et al., ICLR 2016).

We evaluate how pruning, quantization, and model architecture choices affect the trade-off between:

Accuracy

Model size

Inference latency

Layer-wise sparsity

Experiments are conducted on ResNet-18 and MobileNetV2, using CIFAR-10 and CIFAR-100 datasets.


Table of Contents

1. Environment Setup
2. Project Structure
3. Pipeline Overview
4. Running the Experiments
5. Results and Analysis
6. Notes and Limitations
7. Conclusion
8. Citation

1. Environment Setup

We recommend using Anaconda.

Create and activate a new environment

conda create -n approx_cnn python=3.9 -y
conda activate approx_cnn

Option A — Install from requirements.txt (recommended for reproducibility)

pip install -r requirements.txt

Note: PyTorch is not included in requirements.txt because the correct build depends on your CUDA setup. Install PyTorch separately following the instructions in Option B below.

Option B — Manual installation

Install PyTorch

CUDA (if available)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

CPU version

pip install torch torchvision torchaudio

Extra packages

pip install numpy tqdm
conda install matplotlib -y

Clone or place this repo in your working directory, then move into it:

git clone https://github.com/YueranCao2001/cnn-approx-compression
cd cnn-approx-compression

2. Project Structure

cnn-approx-compression/
├── data/                     # CIFAR-10/100 datasets (downloaded automatically when the scripts run)
├── models/                   # Saved .pth checkpoints
├── results/                  # Result plots
├── scripts/                  # Training, pruning, quantization, evaluation
└── README.md

3. Pipeline Overview

The full experimental pipeline is:

(1). Train baseline CNN (ResNet-18, MobileNetV2) on CIFAR-10 / CIFAR-100
(2). Apply global unstructured pruning
(3). Fine-tune pruned models
(4). Optionally apply dynamic INT8 quantization
(5). Evaluate accuracy, model size, and latency
(6). Summarize and visualize results

Each step corresponds to one script in scripts/

4. Running the Experiments

4.1 Train the Baseline Model

python scripts/train_resnet18_c10_baseline.py

python scripts/train_mobilenetv2_c10_baseline.py

Outputs:

models/resnet18_c10_base.pth

models/mobilenetv2_c10_base.pth
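For reference, the baseline scripts follow a standard supervised training loop. The sketch below is a simplified, hypothetical version of such a script (the data augmentation, optimizer settings, and epoch count are assumptions; see the files in scripts/ for the exact configuration).

# Hypothetical sketch of the baseline training loop, not the exact script contents.
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10).to(device)  # 10-class head for CIFAR-10
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

for epoch in range(30):  # epoch count is an assumption
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "models/resnet18_c10_base.pth")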

4.2 Prune the Baseline Model

python scripts/prune_resnet18_c10_prune50.py

python scripts/prune_mobilenetv2_c10_prune50.py

Pruning mechanism:

● Global L1 unstructured pruning

● Applied to Conv + Linear layers

● Followed by fine-tuning

● Pruning masks removed before saving the final checkpoint (see the sketch below)
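The sketch below illustrates this procedure with torch.nn.utils.prune; the fine-tuning step is elided, and the output checkpoint name is an assumption based on the naming used elsewhere in this README.

# Minimal sketch: global L1 unstructured pruning over all Conv2d and Linear weights.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision

model = torchvision.models.resnet18(num_classes=10)
model.load_state_dict(torch.load("models/resnet18_c10_base.pth", map_location="cpu"))

# Collect (module, "weight") pairs for every Conv2d and Linear layer.
params_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))
]

# Globally zero out the 50% of weights with the smallest L1 magnitude, model-wide.
prune.global_unstructured(params_to_prune, pruning_method=prune.L1Unstructured, amount=0.5)

# ... fine-tune the pruned model here ...

# Make the pruning permanent (fold the masks into the weights) before saving.
for module, name in params_to_prune:
    prune.remove(module, name)

torch.save(model.state_dict(), "models/resnet18_c10_pruned50.pth")  # assumed file name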

4.3 Quantize the Pruned Model (INT8)

python scripts/quantize_resnet18_c10_int8.py

● Applies dynamic quantization to Linear layers (see the sketch below)

● Produces: models/resnet18_c10_pruned50_int8.pth
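A minimal sketch of this step follows, assuming the pruned checkpoint is named as in the sketch above; whether the actual script saves the state_dict or the whole quantized module is not specified here.

# Minimal sketch: post-training dynamic INT8 quantization of Linear layers only.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(num_classes=10)
model.load_state_dict(torch.load("models/resnet18_c10_pruned50.pth", map_location="cpu"))
model.eval()

# Dynamic quantization swaps nn.Linear for an INT8 equivalent; Conv2d layers stay FP32,
# which is why accuracy is essentially unchanged and compression is limited.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

torch.save(quantized.state_dict(), "models/resnet18_c10_pruned50_int8.pth")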

4.4 Evaluation

python scripts/eval_resnet18_c10_all.py

python scripts/eval_mobilenetv2_c10_all.py

python scripts/eval_resnet18_c100_all.py

Metrics include the following (a minimal measurement sketch appears after the list):

● Test accuracy (CPU)

● On-disk model size

● Average inference time per image (CPU)
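The sketch below shows one straightforward way to measure these three metrics on CPU; the batch size, warm-up, and number of timing iterations are assumptions rather than the exact evaluation code.

# Minimal sketch: test accuracy, on-disk size, and average per-image CPU latency.
import os
import time
import torch
import torchvision
from torchvision import transforms

def evaluate(model, checkpoint_path):
    model.eval()
    test_set = torchvision.datasets.CIFAR10("data", train=False, download=True,
                                            transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(test_set, batch_size=256)

    # Test accuracy on CPU.
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()

    # On-disk model size in MB.
    size_mb = os.path.getsize(checkpoint_path) / 1e6

    # Average inference time per image: single 32x32 input, repeated runs on CPU.
    x = torch.randn(1, 3, 32, 32)
    with torch.no_grad():
        model(x)  # warm-up
        start = time.perf_counter()
        for _ in range(200):
            model(x)
    latency_ms = (time.perf_counter() - start) / 200 * 1000

    return correct / total, size_mb, latency_ms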

4.5 Visualization

python scripts/viz_resnet18_c10_results.py

python scripts/viz_mobilenetv2_c10_results.py

python scripts/viz_resnet18_vs_mobilenetv2_c10.py

All images are saved in: results/

5. Results and Analysis

Below is a comprehensive summary of all experiments, structured per model and dataset.

Section A — MobileNetV2 on CIFAR-10

Accuracy vs Model Size

Analysis

● Pruning improves accuracy from 65.3% → 80.2%, a surprising but well-known effect when pruning removes noisy or redundant weights.

● Model size stays almost unchanged (8.769 → 8.770 MB) because PyTorch stores dense FP32 tensors even after pruning.

Accuracy Comparison

Analysis

● The +15% absolute accuracy jump indicates MobileNetV2 is highly overparameterized for CIFAR-10.

● Pruning forces a form of regularization, helping generalization.

Latency Comparison

Analysis

● Latency slightly decreases (4.922 → 4.818 ms).

● Because PyTorch does not exploit sparsity, the gain is due to reduced effective FLOPs, not sparse kernels.

Model Size Comparison

Analysis

● As expected, size remains unchanged due to dense storage format.

● True compression would require sparse serialization or Huffman coding.

Section B — ResNet-18 on CIFAR-10 (Baseline, Prune 50%, Prune 50% + INT8)

Accuracy vs Size

Analysis

● Baseline accuracy: 74.5%

● Pruned (50%): 81.6%

● Pruned + INT8: 81.6% (no drop)

INT8 quantization preserves the pruned model's accuracy because only the fully connected layers are quantized; the convolutions, which dominate compute, remain in FP32.

Accuracy Bar

Analysis

● Same trend as above; pruning improves generalization.

● INT8 quantization does not harm accuracy.

Latency Bar

Analysis

● INT8 version incurs slightly higher latency (2.438 → 2.605 ms).

● PyTorch dynamic quantization is CPU-oriented and may add overhead for small models.

Model Size Bar

Analysis

● INT8 slightly reduces state_dict size (42.7314 → 42.7288 MB).

● Again, PyTorch stores dense tensors, so compression is limited.

Section C — ResNet-18 Pruning Sweep (0 / 30 / 50 / 70%)

Accuracy vs Pruning Ratio

Analysis

Accuracy improves over the baseline at every pruning ratio:

● 0% → 74.5%

● 30% → 82.5%

● 50% → 81.6%

● 70% → 83.0% (best)

This indicates strong overparameterization and robustness to pruning.

Latency vs Ratio

Analysis

● Latency varies slightly (2.341–2.475 ms) with no consistent trend.

● PyTorch kernels do not accelerate sparse convolutions.

Model Size vs Ratio

Analysis

● All sizes ≈ 42.73 MB, confirming dense storage.

Sparsity CSV Observations

Across 30/50/70% pruning:

● Early layers remain dense

● Deeper 3×3 convolutions gain high sparsity

● FC layer sparsity roughly matches pruning target

This matches standard global pruning dynamics.
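These per-layer numbers can be reproduced with a simple zero-count over a pruned checkpoint, as in the sketch below (the checkpoint name is an assumption; the CSV writing done by the actual scripts is omitted).

# Minimal sketch: layer-wise sparsity (fraction of exactly-zero weights) of a pruned model.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(num_classes=10)
model.load_state_dict(torch.load("models/resnet18_c10_pruned50.pth", map_location="cpu"))

for name, module in model.named_modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        w = module.weight.detach()
        sparsity = (w == 0).float().mean().item()
        print(f"{name}: {sparsity:.2%} of {w.numel()} weights are zero")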

Section D — ResNet-18 on CIFAR-100 (Baseline vs Pruned 50%)

Accuracy vs Size

Analysis

● Pruned accuracy dramatically increases: 42.9% → 56.4%

● Similar to CIFAR-10, pruning removes redundancy.

Accuracy Bar

Latency Bar

Analysis

● Pruned model slightly faster (2.620 → 2.589 ms).

● Variation is small.

Model Size Bar

Analysis

● Dense storage again limits compression.

Section E — ResNet-18 vs MobileNetV2 (CIFAR-10)

Accuracy Comparison

Analysis

Both models gain significant accuracy from pruning:

● ResNet-18: 74.5% → 81.6%

● MobileNetV2: 65.3% → 80.2%

MobileNetV2 catches up to ResNet-18 after pruning.

Latency Comparison

Analysis

● MobileNetV2 is naturally slower on CPU (depthwise separable ops → lower parallelism).

● ResNet-18 remains ~2.4 ms, MobileNetV2 ~4.5 ms.

Model Size Comparison

Analysis

● ResNet-18: ~42.7 MB

● MobileNetV2: ~8.77 MB

● Significant architectural footprint difference.

6. Notes and Limitations

(1). PyTorch stores pruned tensors densely, so file size does not change.

(2). PyTorch dynamic INT8 quantization only affects Linear layers.

(3). No sparse kernels → pruning does not accelerate inference.

(4). True compression (as in Deep Compression) requires the following (see the sketch after this list):

● Sparse matrix formats

● Weight sharing

● Huffman coding
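As a rough illustration of the first item, the sketch below stores only the surviving weights plus a packed bitmask instead of dense FP32 tensors; at 50% sparsity this roughly halves the weight storage. It is a sketch of the idea under a simple bitmask encoding, not the CSR-style format (or the weight sharing and Huffman coding) used in Deep Compression, and the file names are assumptions.

# Minimal sketch: compact serialization of a pruned checkpoint (bitmask + nonzero values).
import numpy as np
import torch

dense_state = torch.load("models/resnet18_c10_pruned50.pth", map_location="cpu")

compact = {}
for key, tensor in dense_state.items():
    if tensor.dtype == torch.float32 and tensor.dim() >= 2:
        arr = tensor.numpy()
        mask = arr != 0
        compact[key] = {
            "shape": arr.shape,
            "mask_bits": np.packbits(mask.ravel()),  # 1 bit per position instead of 32
            "values": arr[mask],                     # only the surviving weights
        }
    else:
        compact[key] = tensor  # biases, BN statistics, etc. stay dense

torch.save(compact, "models/resnet18_c10_pruned50_compact.pth")
# Restoring requires unpacking the bits, scattering the values back, then load_state_dict().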

7. Conclusion

This project systematically evaluates approximate computing techniques (pruning + quantization) on two CNN architectures and two datasets.

Key findings:

● Moderate-to-high unstructured pruning consistently improves accuracy (regularization effect).

● INT8 quantization preserves accuracy but provides limited compression for CNNs dominated by conv layers.

● Model size remains unchanged without sparse serialization.

● CPU latency shows minimal variation because PyTorch kernels do not exploit sparsity.

● MobileNetV2 benefits even more from pruning than ResNet-18 does.

These findings provide a strong baseline for understanding approximate computing trade-offs in modern CNNs.

8. Citation

If you use this repository or build upon its code or analysis, please cite:

@misc{cao2025approxcnn,
    title  = {Approximate Computing for CNN Compression: A Study of Pruning and INT8 Quantization on ResNet-18 and MobileNetV2},
    author = {Cao, Yueran and Wang, Zinian and Chen, Chang and Li, Tian},
    year   = {2025},
    note   = {Course Project, Georgetown University},
}

This project is also inspired by:

@misc{han2016deepcompressioncompressingdeep,
      title={Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding}, 
      author={Song Han and Huizi Mao and William J. Dally},
      year={2016},
      eprint={1510.00149},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/1510.00149}, 
}
