This is the root repository for the MICRO'58 accepted paper X-SET: An Efficient Graph Pattern Matching Accelerator With Order-Aware Parallel Intersection Units. Please clone this repo together with its submodules to get the RTL and simulator source code.
- For generating Verilog RTL and running the SystemC simulator:
  - Docker engine (or other container runtime like Podman)
- For synthesis and area/power analysis:
  - Synopsys Design Compiler V-2023.12
  - Synopsys Library Compiler V-2023.12
  - TSMC 28nm PDK with SRAM compiler
We need to build the Docker images for RTL generation and SystemC simulation.
> cd rtl
> docker build -t micro58-xset-ae-rtl:v1.0 .
> cd ../simulator
> docker build -t micro58-xset-ae-sim:v1.0 .
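> cd ..

If you face any issues with the Docker build (e.g. slow network), you can also pull the images from DockerHub instead. The pulled images carry the xsun2001/ prefix, so either re-tag them with docker tag or use the prefixed names in the later docker run commands: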
> docker pull xsun2001/micro58-xset-ae-rtl:v1.0
> docker pull xsun2001/micro58-xset-ae-sim:v1.0

The X-SET hardware design is implemented in Chisel and can be generated as Verilog RTL for area and power analysis using Synopsys Design Compiler and the TSMC 28nm PDK. The tables and figures that can be reproduced in this section are:
- (Table 4) Area and Power analysis of XSet hardware design
- (Figure 15) Area and Power breakdown of Order-Aware SIU and Systolic Merge Array for different segment length
Choose the configuration of XSet design you want to evaluate.
- (Table 4) Area and Power analysis of XSet hardware design:
  - XsetDefault
- (Figure 15) Area and Power breakdown of Order-Aware SIU and Systolic Merge Array for different segment length:
  - XsetS2, XsetS4, XsetS8, XsetS16: for the systolic merge array SIU with segment length 2, 4, 8 and 16.
  - XsetB2, XsetB4, XsetB8, XsetB16: for our order-aware bitonic SIU with segment length 2, 4, 8 and 16.
To use a different configuration, replace the three occurrences of XsetDefault in the following commands with the desired configuration name.
> docker run -it --name xset-rtl-container micro58-xset-ae-rtl:v1.0 bash
# If the container is running, use:
# docker exec -it xset-rtl-container bash
$ cd sims/verilator/
$ make verilog CONFIG=XsetDefault -j$(nproc)
$ exit
> docker cp xset-rtl-container:/opt/chipyard/sims/verilator/generated-src/chipyard.harness.TestHarness.XsetDefault/gen-collateral ./rtl-out/XsetDefault

To use different configurations, please change the value of export CHIPYARD_TARGET= in vlsi/env.sh.
If you only want to run synthesis for SIU itself (Figure 15), change export TOP_MODULE= to "Sxu". Otherwise, if you want to run synthesis for the whole XSet design (Table 4), change it to "XsetAccelerator".
# 1. Setup environment
> cd vlsi
# Modify env.sh if desired
> . env.sh
# 2. Generate SRAM macros using TSMC SRAM compiler and Synopsys Library Compiler
> python auto-sram.py $VERILOG_DIR/metadata/seq_mems.json $SRAM_GEN_DIR ../tsmc-mc
# 3. Run synthesis
> run_synth.sh

The vlsi/reports/<Config> directory now contains area.rpt, power.rpt and timing.rpt, which hold the area, power and timing reports of the synthesized design. The log files, intermediate files and the generated post-synthesis netlist are in the same directory.
Note that you only need to run the SRAM compiler once as long as you do not change the SRAM or Scratchpad configuration. For later runs, it is enough to change env.sh, re-source it and execute run_synth.sh.
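The configuration choices for Table 4 and Figure 15 can be summarized in a small helper. The Python sketch below is only an illustration: the function names (figure15_configs, rtl_commands, env_exports) are hypothetical and not part of this repository, and it assumes that the value of CHIPYARD_TARGET is the configuration name itself. It only prints the commands and export lines; it does not run anything.

```python
# Hypothetical helper, not part of the repository: it only builds strings.
SEGMENT_LENGTHS = (2, 4, 8, 16)


def figure15_configs():
    """Config names for Figure 15: systolic merge array (S) and bitonic (B) SIUs."""
    return [f"Xset{kind}{n}" for kind in ("S", "B") for n in SEGMENT_LENGTHS]


def rtl_commands(config):
    """RTL generation commands with the config name substituted in all three places."""
    src = (
        "xset-rtl-container:/opt/chipyard/sims/verilator/generated-src/"
        f"chipyard.harness.TestHarness.{config}/gen-collateral"
    )
    return [
        f"make verilog CONFIG={config} -j$(nproc)",  # run inside the container
        f"docker cp {src} ./rtl-out/{config}",       # run on the host
    ]


def env_exports(config, whole_design):
    """Export lines for vlsi/env.sh (assumes CHIPYARD_TARGET takes the config name)."""
    top = "XsetAccelerator" if whole_design else "Sxu"
    return [f"export CHIPYARD_TARGET={config}", f'export TOP_MODULE="{top}"']


if __name__ == "__main__":
    # Table 4 uses the whole design; Figure 15 synthesizes the SIU only.
    plan = [("XsetDefault", True)] + [(c, False) for c in figure15_configs()]
    for config, whole in plan:
        print(f"# {config}")
        for line in rtl_commands(config) + env_exports(config, whole):
            print(line)
```

The printed lines can be copied into the container shell, the host shell and env.sh respectively.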
We built a cycle-accurate SystemC simulator for XSet, which is used for end-to-end performance evaluation on various patterns and real-world graphs. The figures that can be reproduced with the simulator are:
- (Figure 12, 13) Full system performance on graph-pattern combinations.
- (Figure 14) Performance of simple merge-based SIU and SMA normalized to order-aware SIU
- (Figure 16) Ablation analysis of different SIU and scheduler combinations.
- (Figure 17a) Scalability analysis of different numbers of PEs.
- (Figure 17b) Scalability analysis of different numbers of SIUs per PE.
- (Figure 18a) Sensitivity analysis of private cache size.
- (Figure 18b) Sensitivity analysis of shared cache size.
- (Figure 19) Sensitivity analysis of bitmap width of BitmapCSR format.
First, please download the graph datasets from Google Drive to the ./graphs directory and decompress them. Then, run a Docker container with the graph datasets mounted on /opt/graphs and enter the container's shell:
> docker run -it -v ./graphs/:/opt/graphs/ --name xset-sim micro58-xset-ae-sim:v1.0 bash
> docker exec -it xset-sim bash

/opt/xset contains the source code and prebuilt binary of the XSet simulator. /opt/xset/benchmarks contains all the benchmark suites, organized by the figures they reproduce. Each fig<id>.sh script defines the graphs and patterns to run, searches for configuration files matching fig<id>-*.toml, and then uses GNU parallel to run all the benchmark cases in parallel. Additionally, you can set the MAX_PROCS environment variable to limit the number of parallel processes, e.g. MAX_PROCS=8 ./fig12.sh, if you want to run multiple benchmark suites at the same time.
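When several suites run side by side, one simple policy is to split the available cores evenly between them. The sketch below is an assumption about how you might pick MAX_PROCS values, not something the scripts do themselves; the helper names are hypothetical, and fig14.sh is assumed to exist based on the fig<id>.sh naming pattern.

```python
import os


def per_suite_procs(num_suites, total=None):
    """Split a process budget evenly across suites, with at least 1 per suite."""
    if num_suites < 1:
        raise ValueError("need at least one suite")
    if total is None:
        total = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, total // num_suites)


def launch_commands(suites, total=None):
    """Build one MAX_PROCS=<n> ./<suite>.sh command line per suite."""
    procs = per_suite_procs(len(suites), total)
    return [f"MAX_PROCS={procs} ./{suite}.sh" for suite in suites]


# Example with an explicit budget of 16 processes shared by two suites.
print("\n".join(launch_commands(["fig12", "fig14"], total=16)))
```

Each printed line can be started in its own shell inside the container.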
Note that in the fig16-ablation-xy.toml file names, x is the SIU type (0: Ours, 1: SMA, 2: Merge) and y is the scheduler type (0: Ours, 1: PDFS, 2: DFS).
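The naming scheme of the Figure 16 configuration files can be decoded mechanically. The following is a minimal sketch, assuming x and y appear as two adjacent digits in the file name (e.g. fig16-ablation-12.toml); decode_ablation is a hypothetical helper, not part of the benchmark scripts.

```python
import re

SIU_TYPES = {0: "Ours", 1: "SMA", 2: "Merge"}
SCHEDULERS = {0: "Ours", 1: "PDFS", 2: "DFS"}

# Assumed layout: fig16-ablation-<x><y>.toml with x and y in 0..2.
_PATTERN = re.compile(r"fig16-ablation-([0-2])([0-2])\.toml")


def decode_ablation(filename):
    """Return (SIU type, scheduler type) encoded in an ablation config name."""
    match = _PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not an ablation config name: {filename!r}")
    x, y = int(match.group(1)), int(match.group(2))
    return SIU_TYPES[x], SCHEDULERS[y]


# List every combination covered by the naming scheme.
for x in SIU_TYPES:
    for y in SCHEDULERS:
        name = f"fig16-ablation-{x}{y}.toml"
        print(name, decode_ablation(name))
```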
After all benchmark cases are finished, you can use ./script/extractor.ipynb and ./script/extractor.py to extract the elapsed time or other metrics from the log files. The extraction code looks like this:
from extractor import *
data = read_benchmarks("fig14")
d = extract_exps(data, ['siu-ours', 'siu-simple', 'siu-sma'], columns=['time'])
d.to_csv("fig14.csv", index=False)

Then use ./script/draw.ipynb (most figures) and ./script/draw_scale.ipynb (Fig 14, 15, 17a) to draw the figures. Normally, we import the CSV file into Excel and perform post-processing such as calculating speedup and normalization, then copy the data directly from Excel into the raw string of the corresponding notebook code cell. The notebook parses the pasted data and draws the figures automatically.
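The normalization step done in Excel can also be scripted. The sketch below is an illustration only: normalized_performance is a hypothetical helper, performance is assumed to be the inverse of elapsed time, and the time values are made up for the example rather than taken from any measurement.

```python
def normalized_performance(times, baseline):
    """Performance of each experiment relative to the baseline.

    Performance is taken as 1 / elapsed time, so the normalized value of an
    experiment is baseline_time / experiment_time (1.0 for the baseline).
    """
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


# Made-up elapsed times for one graph-pattern pair, keyed by experiment name.
example_times = {"siu-ours": 2.0, "siu-simple": 5.0, "siu-sma": 4.0}
print(normalized_performance(example_times, baseline="siu-ours"))
```

Values below 1.0 mean the experiment is slower than the baseline.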