Benchmarking

The Traverse Bench tool (traverse-bench) downloads datasets, imports data, and runs standardized benchmarks against any Bolt-compatible graph database.

Quick Start

# Download the medium Pokec dataset
traverse-bench --download medium

# Import into a running server
traverse-bench --import medium --port 7690

# Run benchmarks
traverse-bench --variant medium --port 7690 --duration 10 --warmup 3

Dataset Variants

Variant	Nodes	Edges	Use Case
`small`	~10K	~100K	Quick smoke tests
`medium`	~100K	~1M	Standard benchmarks
`large`	~1.6M	~30M	Production-scale tests

All variants use the Pokec social network dataset, with `:User` nodes connected by `:Friend` relationships.

CLI Options

Flag	Description	Default
`--variant <NAME>`	Dataset variant: `small`, `medium`, `large`	`small`
`--duration <SECS>`	Measurement time per query (seconds)	`10`
`--warmup <N>`	Warmup iterations before measurement	`3`
`--concurrency <N>`	Number of parallel workers	`1`
`--groups <LIST>`	Comma-separated query groups or individual query names	All groups
`--list-queries`	List all available queries and groups, then exit	—
`--host <ADDR>`	Bolt server hostname	`127.0.0.1`
`--port <PORT>`	Bolt port	`7690`
`--auth <USER:PASS>`	Authentication credentials	—
`--server-pid <PID>`	Server PID for resource profiling (auto-detected if omitted)	—
`--output, -o <FILE>`	JSON output file path	—
`--import <VARIANT|FILE>`	Import data and exit (variant name or `.cypher` file path)	—
`--download <VARIANT>`	Download dataset and exit	—
`--format <FMT>`	Download format: `cypher` or `csv`	`cypher`
`--download-dir <DIR>`	Output directory for downloads	`benchmarks/data/dataset_cache`
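
The `--duration`, `--warmup`, and `--concurrency` options fit a measurement loop along the following lines. This is a minimal sketch, not traverse-bench's actual implementation: `run_query` is a hypothetical stand-in for executing one benchmark query over Bolt, and the sketch assumes the warmup iterations run sequentially before the timed window, after which each of the `concurrency` workers issues queries back to back until the deadline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def measure(run_query: Callable[[], None], duration: float = 10.0,
            warmup: int = 3, concurrency: int = 1) -> List[float]:
    """Return per-query latencies in milliseconds from the timed window.

    run_query is a hypothetical placeholder for one Bolt query execution;
    it is not part of traverse-bench.
    """
    # Warmup iterations execute the query but are never recorded.
    for _ in range(warmup):
        run_query()

    deadline = time.perf_counter() + duration

    def worker() -> List[float]:
        latencies: List[float] = []
        # Issue queries back to back until the shared deadline has passed.
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            run_query()
            latencies.append((time.perf_counter() - start) * 1000.0)
        return latencies

    # One thread per worker; collect every worker's samples into one list.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        return [lat for f in futures for lat in f.result()]
```

For example, `measure(run_query, duration=10, warmup=3, concurrency=1)` mirrors the defaults listed in the table.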

Download

# Download as Cypher (for pipelined Bolt import)
traverse-bench --download medium

# Download as CSV (for Neo4j native import)
traverse-bench --download large --format csv

# Custom output directory
traverse-bench --download medium --download-dir /tmp/datasets

Import

Import a downloaded dataset (or a custom `.cypher` file) into a running server over pipelined Bolt, sending 512 statements per batch:

# Import a variant
traverse-bench --import medium --port 7690

# Import a custom file
traverse-bench --import /path/to/data.cypher --port 7690

Indexes are created automatically before data import.
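
Pipelining means writing a whole batch of statements to the connection before reading any of the server's responses, so the client does not wait a full network round trip per statement. The batching half of that can be sketched as follows. `read_statements` and `batches` are hypothetical helpers, and the assumption that a `.cypher` file holds one statement per line is ours, not a documented format.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_statements(path: str) -> Iterator[str]:
    """Yield non-empty lines of a .cypher file (assumes one statement per line)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


def batches(statements: Iterable[str], size: int = 512) -> Iterator[List[str]]:
    """Group statements into lists of at most `size` items, preserving order."""
    it = iter(statements)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

With 1,200 statements, `batches` yields lists of 512, 512, and 176 statements; each list would then be sent as one pipeline.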

Importing into Neo4j

For Neo4j, the pipelined Bolt import can be very slow on larger datasets (hours for the large variant). Instead, download the CSV files and use `neo4j-admin database import` for a much faster bulk load:

# Download as CSV files prepared for neo4j-admin
traverse-bench --download large --format csv

# Use neo4j-admin for fast import
neo4j-admin database import full \
  --nodes=User=benchmarks/data/dataset_cache/pokec_large_nodes.csv \
  --relationships=Friend=benchmarks/data/dataset_cache/pokec_large_edges.csv \
  neo4j

The CSV files include the header fields required by `neo4j-admin` (`:ID`, `:START_ID`, `:END_ID`, etc.).
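
Before starting a long `neo4j-admin` run, checking that each file's header row carries the expected fields can catch a wrong path or format early. The helpers below are a hypothetical sketch: they assume comma-delimited files (the `neo4j-admin` default) and treat a field such as `userId:ID(User)` as providing `:ID`.

```python
import csv
from typing import Iterable, Set


def header_types(header: Iterable[str]) -> Set[str]:
    """Collect type suffixes of neo4j-admin header fields, e.g. 'userId:ID(User)' -> ':ID'."""
    types = set()
    for field in header:
        if ":" not in field:
            continue  # plain property column with no type suffix
        kind = field.rsplit(":", 1)[1]
        kind = kind.split("(", 1)[0]  # drop an optional ID space such as (User)
        types.add(":" + kind)
    return types


def missing_fields(path: str, required: Set[str]) -> Set[str]:
    """Return the required header fields absent from the CSV file's first row."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    return set(required) - header_types(header)
```

For files that carry the headers described above, `missing_fields("benchmarks/data/dataset_cache/pokec_large_edges.csv", {":START_ID", ":END_ID"})` should return an empty set.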

Query Groups

Queries are organized into groups. Use `--list-queries` to print every available query and group.

Group	Queries	Description
`read`	5	Point lookups, property reads, short pattern matches
`expansion`	13	Multi-hop traversals (1–4 hops) with and without filters
`scan`	8	Full scans, scan+expand, scan+aggregate
`aggregate`	4	count, min/max/avg, group by
`shortest_path`	2	shortestPath and allShortestPaths
`update`	1	SET property on matched node
`write`	2	CREATE node and CREATE edge

# Run only read and expansion groups
traverse-bench --variant medium --groups read,expansion

# Run a single query by name
traverse-bench --variant medium --groups single_vertex_read

Output Metrics

Each query produces:

Iterations: total number of queries executed during the measurement window
Errors: number of failed queries
Throughput: queries per second (QPS)
Latency: avg, p50, p95, p99, and max, in milliseconds
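
A minimal sketch of one common way to derive these metrics from raw per-query timings follows; it is not necessarily how traverse-bench computes them. It assumes one latency sample per executed query (failed ones included), throughput computed as iterations divided by the measurement duration, and the nearest-rank percentile definition. Tools that interpolate between samples can report slightly different p95/p99 values. The dictionary keys are illustrative and do not reflect the `--output` JSON schema.

```python
import math
from statistics import mean
from typing import Dict, List


def percentile(sorted_ms: List[float], p: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize(latencies_ms: List[float], errors: int, duration_s: float) -> Dict[str, float]:
    """Summarize one query's timed window (assumes at least one sample)."""
    data = sorted(latencies_ms)
    return {
        "iterations": len(data),        # every query executed in the window
        "errors": errors,               # how many of those queries failed
        "qps": len(data) / duration_s,  # throughput over the window
        "avg_ms": mean(data),
        "p50_ms": percentile(data, 50),
        "p95_ms": percentile(data, 95),
        "p99_ms": percentile(data, 99),
        "max_ms": data[-1],
    }
```

For 100 samples of 1 to 100 ms collected over 10 seconds, this gives 10 QPS, an average of 50.5 ms, and p50/p95/p99 of 50, 95, and 99 ms.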

When `--server-pid` is provided (or the PID is auto-detected), resource profiling tracks the server's CPU and memory usage per query.
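
On Linux, per-process CPU time and resident memory can be read from `/proc`. The sketch below samples both for a given PID; it assumes a Linux host, and `sample_process` is a hypothetical helper rather than traverse-bench's own profiling code. Sampling before and after a query and subtracting the CPU values gives the CPU time the server spent in between.

```python
import os
from typing import Tuple


def sample_process(pid: int) -> Tuple[float, int]:
    """Return (cpu_seconds, rss_kib) for a Linux process, read from /proc."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name (field 2) may contain spaces, so split after its closing ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15, in clock ticks.
    utime, stime = int(fields[11]), int(fields[12])
    cpu_seconds = (utime + stime) / os.sysconf("SC_CLK_TCK")

    rss_kib = 0
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss_kib = int(line.split()[1])  # value is reported in kB
                break
    return cpu_seconds, rss_kib
```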

Use `--output` to save results as JSON for later comparison:

traverse-bench --variant medium --output results.json

Cross-Database Comparison

Because traverse-bench uses the Bolt protocol, it can benchmark any Bolt-compatible database. Import the same variant into each server, then run the same benchmark against each one by changing `--port`:

# Benchmark Traverse (port 7690)
traverse-bench --variant medium --port 7690 -o traverse.json

# Benchmark Neo4j (port 7687)
traverse-bench --variant medium --port 7687 -o neo4j.json

# Benchmark Memgraph (port 7689)
traverse-bench --variant medium --port 7689 -o memgraph.json
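
The three runs can also be scripted. The sketch below is hypothetical and uses only the flags documented on this page; the server names and ports mirror the examples and should be adjusted to your setup. It assumes every server is already running with the `medium` dataset imported.

```python
import subprocess
from typing import List

# Names and ports mirror the examples above; adjust them to your deployment.
SERVERS = {"traverse": 7690, "neo4j": 7687, "memgraph": 7689}


def bench_command(name: str, port: int, variant: str = "medium") -> List[str]:
    """Build the traverse-bench argument list for one server, writing <name>.json."""
    return ["traverse-bench", "--variant", variant, "--port", str(port),
            "-o", f"{name}.json"]


def run_all(variant: str = "medium") -> None:
    """Benchmark each server in turn so client load from one run never overlaps another."""
    for name, port in SERVERS.items():
        # check=True raises CalledProcessError if traverse-bench exits non-zero.
        subprocess.run(bench_command(name, port, variant), check=True)
```

Calling `run_all()` produces `traverse.json`, `neo4j.json`, and `memgraph.json`, matching the manual commands, and stops at the first failing run.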