X-SET: An Efficient Graph Pattern Matching Accelerator With Order-Aware Parallel Intersection Units

This is the root repository for the MICRO'58 accepted paper X-SET: An Efficient Graph Pattern Matching Accelerator With Order-Aware Parallel Intersection Units. Please clone this repo together with its submodules to obtain the RTL and simulator source code.

Pre-requisites

For generating Verilog RTL and running the SystemC simulator:
- Docker engine (or other container runtime like Podman)
For synthesis and area/power analysis:
- Synopsys Design Compiler V-2023.12
- Synopsys Library Compiler V-2023.12
- TSMC 28nm PDK with SRAM compiler

Setup and Build

We need to build the Docker images for RTL generation and SystemC simulation.

> cd rtl
> docker build -t micro58-xset-ae-rtl:v1.0 .
> cd ../simulator
> docker build -t micro58-xset-ae-sim:v1.0 .
> cd ..

If you face any issues with the Docker build (e.g. slow network), you can also pull from DockerHub instead:

> docker pull xsun2001/micro58-xset-ae-rtl:v1.0
> docker pull xsun2001/micro58-xset-ae-sim:v1.0

Area and Power Evaluation

The X-SET hardware design is implemented in Chisel and compiled to Verilog RTL, which is then synthesized with Synopsys Design Compiler and the TSMC 28nm PDK for area and power analysis. The tables and figures that can be reproduced in this section are:

(Table 4) Area and Power analysis of XSet hardware design
(Figure 15) Area and Power breakdown of Order-Aware SIU and Systolic Merge Array for different segment lengths

Design configuration

Choose the configuration of XSet design you want to evaluate.

(Table 4) Area and Power analysis of XSet hardware design: XsetDefault
(Figure 15) Area and Power breakdown of Order-Aware SIU and Systolic Merge Array for different segment lengths:
1. XsetS2, XsetS4, XsetS8, XsetS16: systolic merge array SIU with segment length 2, 4, 8, and 16.
2. XsetB2, XsetB4, XsetB8, XsetB16: our order-aware bitonic SIU with segment length 2, 4, 8, and 16.
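
If you want to script over every configuration, the names listed above follow a regular pattern. The following is a minimal sketch that only rebuilds those names; the helper functions `figure15_configs` and `all_configs` are hypothetical and not part of this repository.

```python
from itertools import product

# Prefix letters as they appear in the configuration names above:
# "S" = systolic merge array SIU, "B" = order-aware bitonic SIU.
SIU_PREFIXES = ("S", "B")
SEGMENT_LENGTHS = (2, 4, 8, 16)


def figure15_configs():
    """Return the eight Figure 15 configuration names, e.g. 'XsetS2'."""
    return [f"Xset{p}{n}" for p, n in product(SIU_PREFIXES, SEGMENT_LENGTHS)]


def all_configs():
    """Return the Table 4 configuration followed by the Figure 15 ones."""
    return ["XsetDefault"] + figure15_configs()


if __name__ == "__main__":
    for name in all_configs():
        print(name)
```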

Generate RTL

To use a different configuration, please change the three occurrences of XsetDefault in the following commands to the desired configuration name.

> docker run -it --name xset-rtl-container micro58-xset-ae-rtl:v1.0 bash
# If the container is running, use:
# docker exec -it xset-rtl-container bash
$ cd sims/verilator/
$ make verilog CONFIG=XsetDefault -j$(nproc)
$ exit
> docker cp xset-rtl-container:/opt/chipyard/sims/verilator/generated-src/chipyard.harness.TestHarness.XsetDefault/gen-collateral ./rtl-out/XsetDefault
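
To avoid editing the three occurrences by hand, the commands can be generated from the configuration name. This is a sketch that only formats strings copied from the commands above; the function `rtl_commands` is a hypothetical helper, and the paths are assumed to be the same for every configuration.

```python
def rtl_commands(config, container="xset-rtl-container"):
    """Return (make command to run inside the container, docker cp command for the host)."""
    make_cmd = f"make verilog CONFIG={config} -j$(nproc)"
    src = (
        f"{container}:/opt/chipyard/sims/verilator/generated-src/"
        f"chipyard.harness.TestHarness.{config}/gen-collateral"
    )
    cp_cmd = f"docker cp {src} ./rtl-out/{config}"
    return make_cmd, cp_cmd


if __name__ == "__main__":
    make_cmd, cp_cmd = rtl_commands("XsetB8")
    print(make_cmd)  # run inside the container, in sims/verilator/
    print(cp_cmd)    # run on the host after leaving the container
```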

Synthesis

To use different configurations, please change the value of export CHIPYARD_TARGET= in vlsi/env.sh.

If you only want to synthesize the SIU alone (Figure 15), change export TOP_MODULE= to "Sxu". If you want to synthesize the whole XSet design (Table 4), change it to "XsetAccelerator".

# 1. Setup environment
> cd vlsi
# Modify env.sh if desired
> . env.sh
# 2. Generate SRAM macros using TSMC SRAM compiler and Synopsys Library Compiler
> python auto-sram.py $VERILOG_DIR/metadata/seq_mems.json $SRAM_GEN_DIR ../tsmc-mc
# 3. Run synthesis
> run_synth.sh

The vlsi/reports/<Config> directory now contains area.rpt, power.rpt and timing.rpt, which hold the area, power and timing reports of the synthesized design. The same directory also contains log files, intermediate files and the generated post-synthesis netlist.

Note that you only need to run the SRAM compiler once, as long as you do not change the SRAM or Scratchpad configuration. For subsequent runs, change env.sh, re-source it and execute run_synth.sh.
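
When sweeping several configurations, the two settings in env.sh can be rewritten programmatically. This is a minimal sketch, assuming env.sh contains exactly one `export CHIPYARD_TARGET=` line and one `export TOP_MODULE=` line with the value on the same line; the function `set_export` and the example fragment are hypothetical, and the real env.sh contains more settings.

```python
import re


def set_export(env_text, name, value):
    """Replace the value on the `export NAME=` line, leaving other lines untouched."""
    pattern = re.compile(rf"^(export {re.escape(name)}=).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f'{m.group(1)}"{value}"', env_text)
    if count != 1:
        raise ValueError(f"expected exactly one 'export {name}=' line, found {count}")
    return new_text


if __name__ == "__main__":
    # Hypothetical env.sh fragment used only for illustration.
    example = 'export CHIPYARD_TARGET="XsetDefault"\nexport TOP_MODULE="XsetAccelerator"\n'
    updated = set_export(example, "CHIPYARD_TARGET", "XsetB8")
    updated = set_export(updated, "TOP_MODULE", "Sxu")
    print(updated)
```

After writing the updated text back to env.sh, re-source it and run the synthesis step as described above.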

End-to-End Performance Evaluation

We built a cycle-accurate SystemC simulator for XSet, which is used for end-to-end performance evaluation on various patterns and real-world graphs. The figures that can be reproduced with the simulator are:

(Figure 12, 13) Full system performance on graph-pattern combinations.
(Figure 14) Performance of simple merge-based SIU and SMA normalized to order-aware SIU
(Figure 16) Ablation analysis of different SIU and scheduler combinations.
(Figure 17a) Scalability analysis of different numbers of PEs.
(Figure 17b) Scalability analysis of different numbers of SIUs per PE.
(Figure 18a) Sensitivity analysis of private cache size.
(Figure 18b) Sensitivity analysis of shared cache size.
(Figure 19) Sensitivity analysis of bitmap width of BitmapCSR format.

Run the simulator

First, please download the graph datasets from Google Drive into the ./graphs directory and decompress them. Then run a Docker container with the graph datasets mounted at /opt/graphs and enter the container's shell:

> docker run -it -v ./graphs/:/opt/graphs/ --name xset-sim micro58-xset-ae-sim:v1.0 bash
> docker exec -it xset-sim bash

/opt/xset contains the source code and a prebuilt binary of the XSet simulator. /opt/xset/benchmarks contains all the benchmark suites, organized by the figure they reproduce. Each fig<id>.sh script defines the graphs and patterns to run, searches for configuration files named fig<id>-*.toml, and then uses GNU parallel to run all the benchmark cases in parallel. Additionally, if you want to run multiple benchmark suites at the same time, you can set the MAX_PROCS environment variable to limit the number of parallel processes, e.g. MAX_PROCS=8 ./fig12.sh.
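
To estimate how many cases a suite will launch, the enumeration described above can be mimicked as follows. This is only a sketch of the idea, not the logic of the actual fig<id>.sh scripts: the function `benchmark_cases` is hypothetical, the file, graph and pattern names in the example are placeholders, and the real scripts may combine cases differently.

```python
from fnmatch import fnmatch
from itertools import product


def benchmark_cases(fig_id, filenames, graphs, patterns):
    """Return every (config file, graph, pattern) combination for one figure."""
    # Keep only the configuration files that match fig<id>-*.toml.
    configs = sorted(f for f in filenames if fnmatch(f, f"fig{fig_id}-*.toml"))
    return list(product(configs, graphs, patterns))


if __name__ == "__main__":
    # Placeholder names for illustration only.
    files = ["fig14-a.toml", "fig14-b.toml", "fig16-ablation-00.toml", "fig14.sh"]
    for case in benchmark_cases(14, files, ["graph1", "graph2"], ["pattern1"]):
        print(case)
```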

Note the naming of the Figure 16 configuration files, fig16-ablation-xy.toml: x is the SIU type (0: Ours, 1: SMA, 2: Merge) and y is the scheduler type (0: Ours, 1: PDFS, 2: DFS).
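
The naming scheme can be decoded mechanically when labelling results. The following is a small sketch, assuming x and y are single digits written next to each other as in fig16-ablation-xy.toml; the function `decode_ablation` is a hypothetical helper and not part of the simulator scripts.

```python
import re

# Mappings taken from the naming scheme described above.
SIU_TYPES = {0: "Ours", 1: "SMA", 2: "Merge"}
SCHEDULER_TYPES = {0: "Ours", 1: "PDFS", 2: "DFS"}


def decode_ablation(filename):
    """Map 'fig16-ablation-xy.toml' to an (SIU type, scheduler type) pair."""
    match = re.fullmatch(r"fig16-ablation-([0-2])([0-2])\.toml", filename)
    if match is None:
        raise ValueError(f"not an ablation config name: {filename!r}")
    x, y = int(match.group(1)), int(match.group(2))
    return SIU_TYPES[x], SCHEDULER_TYPES[y]


if __name__ == "__main__":
    print(decode_ablation("fig16-ablation-12.toml"))
```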

Results collection and plotting

After all benchmark cases have finished, you can use ./script/extractor.ipynb and ./script/extractor.py to extract the elapsed time or other metrics from the log files. The extraction code looks like this:

from extractor import *

data = read_benchmarks("fig14")
d = extract_exps(data, ['siu-ours', 'siu-simple', 'siu-sma'], columns=['time'])
d.to_csv("fig14.csv", index=False)

Then use ./script/draw.ipynb (most figures) and ./script/draw_scale.ipynb (Fig 14, 15, 17a) to draw the figures. Normally, we import the CSV file into Excel and perform post-processing such as calculating speedup and normalization, then copy the data directly from Excel into the raw string of the corresponding notebook code cell. The notebook parses the pasted data and draws the figures automatically.
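
The normalization step can also be done in Python instead of Excel. This is a minimal sketch, assuming performance is the inverse of elapsed time, so the normalized performance of a design is the baseline time divided by its own time. The function `normalize_to_baseline` is hypothetical, and the times in the example are illustrative placeholders, not measured results.

```python
def normalize_to_baseline(times, baseline="siu-ours"):
    """Return performance relative to `baseline`, where performance = 1 / time."""
    base_time = times[baseline]
    return {name: base_time / t for name, t in times.items()}


if __name__ == "__main__":
    # Placeholder elapsed times for one graph-pattern case (not real data).
    example_times = {"siu-ours": 2.0, "siu-simple": 4.0, "siu-sma": 2.5}
    print(normalize_to_baseline(example_times))
```

A value below 1.0 means the design is slower than the baseline for that case.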

CLab-HKUST-GZ/micro58-xset
