Graph Data Science
Traverse ships with a built-in library of 35+ graph algorithms covering centrality, community detection, pathfinding, similarity, link prediction, and node embeddings. Every algorithm is callable from Cypher with a single CALL statement.
How to call an algorithm
CALL traverse.<algorithm>.<mode>(<config>) YIELD ...
Three pieces:
<algorithm>– the algorithm name (camelCase), e.g.pageRank,dijkstra,louvain.<mode>– how the result is delivered (see Modes).<config>– a map of options the algorithm understands.
Worked example: PageRank
CALL traverse.pageRank.stream({maxIterations: 20})
YIELD nodeId, score
RETURN nodeId, score
ORDER BY score DESC
LIMIT 10
Returns one row per node with its computed score. The ORDER BY ... LIMIT is a regular Cypher tail; the algorithm just produces the stream.
Modes
| Mode | Returns | Use when |
|---|---|---|
stream | One row per result entity (node, pair, path). | You want to inspect or post-process individual scores. |
stats | A single summary row. | You want aggregate measures (mean, max, community count) without materializing every score. |
write | A single summary row, plus a write-back of the result to a node property. | You want the algorithm output to be queryable as part of the graph afterwards (e.g. n.pagerank). |
Not every algorithm supports every mode. The catalog tables below show which are available per algorithm.
Stream
CALL traverse.pageRank.stream({maxIterations: 20})
YIELD nodeId, score
RETURN nodeId, score LIMIT 10
Stats
CALL traverse.pageRank.stats({maxIterations: 20})
YIELD nodeCount, min, max, mean, computeMillis
RETURN nodeCount, min, max, mean, computeMillis
Write
CALL traverse.pageRank.write({maxIterations: 20, writeProperty: 'pagerank'})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten
After running the write mode, the score is queryable like any other property:
MATCH (n:User) RETURN n.name, n.pagerank ORDER BY n.pagerank DESC LIMIT 10
Common config
Most algorithms accept this set of generic options in addition to their algorithm-specific config:
| Option | Type | Description |
|---|---|---|
nodeLabels | list of strings | Restrict the algorithm to nodes carrying any of these labels. Empty / omitted = all nodes. |
relationshipTypes | list of strings | Restrict the algorithm to edges of these types. Empty / omitted = all edges. |
orientation | NATURAL / REVERSE / UNDIRECTED | How edge direction is treated. Default NATURAL. |
writeProperty | string | Required in write mode. Name of the property to set on each affected node. |
Algorithm catalog
Centrality
Score nodes by how "important" they are in the graph.
| Name | Display | What it computes | Modes |
|---|---|---|---|
pageRank | PageRank | Rank by inbound influence following a damped random walk. | stream / stats / write |
articleRank | Article Rank | PageRank variant penalizing high-out-degree neighbors. | stream / stats / write |
eigenvector | Eigenvector Centrality | Power-iteration on the adjacency matrix. | stream / stats / write |
closeness | Closeness Centrality | Inverse of mean shortest-path distance to all other nodes. | stream / stats / write |
harmonicCloseness | Harmonic Closeness | Closeness variant that handles disconnected graphs. | stream / stats / write |
betweenness | Betweenness Centrality | Fraction of shortest paths each node lies on (Brandes). | stream / stats / write |
hits | HITS (Hubs and Authorities) | Two scores per node: hub and authority. | stream / stats / write |
degree | Degree Centrality | Count of incident edges per node. | stream / stats / write |
celf | CELF Influence Maximization | Greedy seed-set selection under the linear-threshold model. | stream / stats |
Community detection
Group nodes into communities based on connectivity.
| Name | Display | What it computes | Modes |
|---|---|---|---|
louvain | Louvain | Modularity-maximizing community detection (multilevel). | stream / stats / write |
leiden | Leiden | Louvain refinement that guarantees well-connected communities. | stream / stats / write |
labelPropagation | Label Propagation | Iterative neighbor-majority label assignment. | stream / stats / write |
wcc | Weakly Connected Components | Component id ignoring edge direction. | stream / stats / write |
scc | Strongly Connected Components | Component id respecting edge direction (Tarjan). | stream / stats / write |
kCore | K-Core Decomposition | Per-node k-core number. | stream / stats / write |
triangleCount | Triangle Count + LCC | Triangle count per node and local clustering coefficient. | stream / stats / write |
conductance | Conductance | Per-community conductance, given an existing community assignment. | stats |
modularity | Modularity Score | Single modularity number, given an existing community assignment. | stats |
Paths and traversal
Find shortest paths, multi-hop traversals, and tree structures.
| Name | Display | What it computes | Modes |
|---|---|---|---|
dijkstra | Dijkstra Shortest Path | Single-source shortest paths with non-negative weights. | stream / stats |
aStar | A* Shortest Path | Heuristic-guided shortest path between two nodes. | stream / stats |
bellmanFord | Bellman-Ford | Single-source shortest paths supporting negative weights. | stream / stats |
yens | Yen's K-Shortest Paths | The k loop-less shortest paths between two nodes. | stream / stats |
allShortestPaths | All-Pairs Shortest Paths | Shortest path between every reachable pair of source-set nodes. | stream / stats |
deltaStepping | Δ-Stepping | Parallel single-source shortest paths. | stream / stats |
bfs | Breadth-First Search | BFS traversal from a source node. | stream / stats |
dfs | Depth-First Search | DFS traversal from a source node. | stream / stats |
randomWalk | Random Walk | Sample weighted random walks of fixed length from each source. | stream / stats |
spanningTree | Spanning Tree (Kruskal) | Minimum spanning tree of the graph. | stream / stats |
steinerTree | Steiner Tree | Minimum-weight tree connecting a set of terminal nodes (2-approx). | stream / stats |
DAG
Algorithms that require a directed acyclic graph.
| Name | Display | What it computes | Modes |
|---|---|---|---|
topologicalSort | Topological Sort | Linear ordering of nodes respecting edge direction. | stream / stats / write |
longestPath | Longest Path (DAG) | Per-node longest path length from any source. | stream / stats / write |
Similarity
Score how similar two nodes are.
| Name | Display | What it computes | Modes |
|---|---|---|---|
nodeSimilarity | Node Similarity (Jaccard) | Pairs of nodes ranked by Jaccard over their neighborhoods. | stream / stats |
cosineSimilarity | Cosine Similarity | Pairs ranked by cosine over neighbor-membership vectors. |N(u)∩N(v)| / sqrt(|N(u)|·|N(v)|). | stream / stats |
knn | k-Nearest Neighbors (structural) | Top-k Jaccard-similar nodes per source, evaluated over the two-hop candidate set. | stream / stats |
knnProperty | k-Nearest Neighbors (by node properties) | Top-k by Euclidean similarity over numeric node-property vectors (e.g. embeddings). NN-Descent random sampling, O(n·K²·d) per iteration. Matches Neo4j's gds.knn semantics. | stream / stats |
Link prediction
Score node pairs by how likely an edge between them is.
| Name | Display | What it computes | Modes |
|---|---|---|---|
commonNeighbors | Common Neighbors | Count of shared neighbors. | stream |
totalNeighbors | Total Neighbors | Union of neighborhoods. | stream |
preferentialAttachment | Preferential Attachment | Product of node degrees. | stream |
adamicAdar | Adamic-Adar | Common Neighbors weighted by inverse log-degree. | stream |
resourceAllocation | Resource Allocation | Common Neighbors weighted by inverse degree. | stream |
sameCommunity | Same Community | 1 if both nodes share a community assignment, else 0. | stream |
Embeddings
Produce a dense or sparse vector per node, useful as input to ML models or downstream similarity / kNN.
| Name | Display | What it computes | Modes |
|---|---|---|---|
fastRP | FastRP | Random-projection embeddings (very fast, surprisingly strong). | stream / stats / write |
node2vec | Node2Vec | Biased random walks + skip-gram (Grover-Leskovec). | stream / stats / write |
hashGnn | HashGNN | Hash-based message passing producing binary embeddings. | stream / stats / write |
graphSage | GraphSAGE | Inductive message-passing embedding (mean aggregator). | stream / stats / write |
Worked examples
1. Top-10 most central airports
Using the OpenFlights sample (airports + routes):
CALL traverse.pageRank.stream({maxIterations: 30})
YIELD nodeId, score
MATCH (a:Airport) WHERE id(a) = nodeId
RETURN a.iata_code, a.name, score
ORDER BY score DESC LIMIT 10
2. Shortest flight path from JFK to LAX
MATCH (src:Airport {iata_code:'JFK'}), (dst:Airport {iata_code:'LAX'})
CALL traverse.dijkstra.stream({
sourceNodeId: id(src),
targetNodeId: id(dst)
})
YIELD nodes, costs
RETURN nodes, costs
3. Detect communities in a social graph and write them back
Using the Pokec sample:
CALL traverse.louvain.write({
writeProperty: 'community',
maxIterations: 10
})
YIELD nodeCount, communityCount, largestCommunitySize, computeMillis
RETURN nodeCount, communityCount, largestCommunitySize, computeMillis
Afterwards the result is just a node property:
MATCH (u:User) RETURN u.community, count(*) AS members
ORDER BY members DESC LIMIT 10
4. Generate FastRP embeddings as a node property
CALL traverse.fastRP.write({
writeProperty: 'embedding',
embeddingDimension: 128
})
YIELD nodeCount, dimension, nodePropertiesWritten
RETURN nodeCount, dimension, nodePropertiesWritten
Afterwards the vector is just another node property:
MATCH (n) RETURN id(n), n.embedding LIMIT 5
5. Find structurally-similar users via kNN
The structural knn ranks each node's neighbors by Jaccard overlap of their two-hop neighborhoods — useful for "people who follow similar accounts" style recommendations:
CALL traverse.knn.stream({topK: 5, similarityCutoff: 0.1})
YIELD node1, node2, similarity
RETURN node1, node2, similarity
ORDER BY similarity DESC LIMIT 20
6. Embedding-space recommendations via FastRP + knnProperty
Step 1 writes a 128-dim FastRP embedding to each node, step 2 finds the top-5 nearest neighbors in that vector space. This matches the canonical Neo4j GDS feature-similarity pipeline:
// 1. Generate 128-dim embeddings
CALL traverse.fastRP.write({
writeProperty: 'embedding',
embeddingDimension: 128
})
YIELD nodeCount, dimension, nodePropertiesWritten
RETURN nodeCount, dimension, nodePropertiesWritten;
// 2. Top-5 similar nodes in embedding space
CALL traverse.knnProperty.stream({
nodeProperties: ['embedding'],
topK: 5,
similarityCutoff: 0.7
})
YIELD node1, node2, similarity
RETURN node1, node2, similarity
ORDER BY similarity DESC LIMIT 20
For tabular features (no embedding step) pass the raw numeric properties instead: nodeProperties: ['age', 'tenure_months', 'avg_session_minutes'].
Stability
Each algorithm carries a stability tag visible in Studio's GDS drawer:
- stable – API and output format are settled; safe to depend on.
- beta – works correctly but the parameter set or output shape may evolve.
- experimental – subject to change without notice.
Studio
The GDS drawer in Studio exposes the same catalog with a form-driven config builder. It writes the same CALL traverse.<algo>.<mode>(...) Cypher you'd write by hand, so anything you do in the drawer is portable to scripts and applications.
See also
- Cypher reference – the language algorithms are invoked from.
- Studio – visual GDS workflow.
- Browser (WASM) – the same algorithms run unchanged in the browser.