Vector Search
Torque supports approximate nearest neighbor (ANN) vector search using HNSW graph indexing with RaBitQ binary quantization. Combine vector search with full-text search for hybrid retrieval via reciprocal rank fusion.
Schema Setup
Add a vector field to your collection using the float[] type with num_dim set to the number of embedding dimensions:
{
  "name": "articles",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "body", "type": "string"},
    {"name": "embedding", "type": "float[]", "num_dim": 768}
  ]
}
Tip: Common embedding dimensions are 384 (MiniLM), 768 (BERT/E5), 1024 (Cohere), 1536 (OpenAI text-embedding-3-small), and 3072 (text-embedding-3-large). Torque handles all sizes.
Distance Metrics
Set the distance metric with the vec_dist field option at schema creation time:
| Value | Metric | Best For |
|---|---|---|
| cosine (default) | Cosine similarity | Normalized embeddings (OpenAI, Cohere, most models). Measures the angle between vectors. |
| ip | Inner product (dot product) | Models where vector magnitude carries meaning. Higher value = more similar. |
| l2 | Euclidean distance (L2) | Spatial data, image features. Measures straight-line distance. Lower value = more similar. |
{"name": "embedding", "type": "float[]", "num_dim": 768, "vec_dist": "cosine"}
Torque automatically normalizes vectors for cosine distance at index time.
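To make the three metrics concrete, here is a small plain-Python sketch (no Torque involved) computing each one, plus the L2 normalization applied for cosine. Note that once vectors are normalized, cosine similarity reduces to a plain dot product, which is why normalizing at index time pays off:

```python
import math

def dot(a, b):
    # Inner product of two vectors (the "ip" metric)
    return sum(x * y for x, y in zip(a, b))

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    # L2-normalize a vector, as done for cosine at index time
    n = l2_norm(v)
    return [x / n for x in v]

def cosine_similarity(a, b):
    # Angle-based similarity: higher = more similar
    return dot(a, b) / (l2_norm(a) * l2_norm(b))

def euclidean(a, b):
    # Straight-line distance (the "l2" metric): lower = more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]
print(cosine_similarity(a, b))   # 0.96
print(dot(a, b))                 # 24.0
print(euclidean(a, b))           # ~1.414

# After normalization, cosine similarity is just the dot product:
na, nb = normalize(a), normalize(b)
print(dot(na, nb))               # 0.96
```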
Indexing Documents
Include the embedding vector when importing documents. The vector must have exactly num_dim elements:
{"id": "1", "title": "Machine Learning Basics", "body": "...", "embedding": [0.12, -0.34, 0.56, ...]}
{"id": "2", "title": "Deep Learning Guide", "body": "...", "embedding": [0.23, 0.45, -0.67, ...]}
Note: Torque does not generate embeddings. Your application must compute embeddings before indexing, using your preferred model (OpenAI, Sentence Transformers, Cohere, etc.). This gives you full control over the embedding pipeline.
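Since a vector with the wrong number of elements will be rejected at import time, it can be worth validating dimensions client-side while building the import payload. A minimal sketch; the `to_import_line` helper is our own illustration, not part of any Torque client library:

```python
import json

NUM_DIM = 768  # must match the schema's num_dim

def to_import_line(doc, dim=NUM_DIM):
    """Serialize one document as a JSONL line, checking the vector length first."""
    vec = doc.get("embedding")
    if vec is None or len(vec) != dim:
        raise ValueError(f"embedding must have exactly {dim} elements, got {len(vec or [])}")
    return json.dumps(doc)

doc = {"id": "1", "title": "Machine Learning Basics", "body": "...",
       "embedding": [0.0] * 768}
line = to_import_line(doc)  # one JSONL line, ready for the import endpoint
```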
Vector Search
Use the vector_query parameter to find the nearest neighbors to a query vector:
GET /collections/articles/documents/search?\
q=*&query_by=title&\
vector_query=embedding:([0.12, -0.34, 0.56, ...], k:10)
X-TORQUE-API-KEY: YOUR_API_KEY
Parameters
All parameters are specified inside the vector_query string:
| Parameter | Default | Description |
|---|---|---|
| k | 100 | Number of nearest neighbors to return. Range: 1–1000. |
| distance_threshold | — | Similarity cutoff. For cosine and inner product, higher scores mean more similar, so results scoring below the threshold are excluded. For L2, lower distances mean more similar, so results above the threshold are excluded. |
| ef | max(k, 100) | HNSW search beam width. Higher values improve recall at the cost of speed. |
| flat_search_cutoff | — | When the pre-filtered candidate set is smaller than this value, bypass HNSW and use brute-force distance computation for exact results. |
| id | — | Document ID for "find similar" queries. The document's vector is looked up from the index and used as the query vector. Replaces the vector array. |
| alpha | — | Hybrid search weight (0.0–1.0). When set, uses score interpolation instead of RRF: alpha × vector + (1 − alpha) × text. Only applies when combined with a text query. |
# Basic vector search
vector_query=embedding:([0.12, -0.34, ...], k:10)
# With distance threshold and increased recall
vector_query=embedding:([0.12, -0.34, ...], k:20, distance_threshold:0.5, ef:200)
# Brute-force for small filtered sets
vector_query=embedding:([0.12, -0.34, ...], k:10, flat_search_cutoff:500)
Set q=* for pure vector search (no text matching). The response includes matching documents ranked by vector similarity, with each hit containing a vector_distance field.
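Because all options are packed into one string, a small client-side helper can reduce formatting mistakes. The `build_vector_query` function below is our own illustration, not part of any Torque client library:

```python
def build_vector_query(field, vector=None, doc_id=None, **opts):
    """Format a vector_query string; pass exactly one of `vector` or `doc_id`."""
    if (vector is None) == (doc_id is None):
        raise ValueError("pass exactly one of vector or doc_id")
    # Either an inline vector array or an id: lookup for "find similar"
    head = f"id:{doc_id}" if doc_id else "[" + ", ".join(str(x) for x in vector) + "]"
    parts = [head] + [f"{key}:{value}" for key, value in opts.items()]
    return f"{field}:({', '.join(parts)})"

print(build_vector_query("embedding", vector=[0.12, -0.34], k=10))
# embedding:([0.12, -0.34], k:10)
print(build_vector_query("embedding", doc_id="doc_123", k=10))
# embedding:(id:doc_123, k:10)
```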
Find Similar Documents
Use id: instead of a vector array to find documents similar to an existing document. Torque looks up the document’s vector from the index:
vector_query=embedding:(id:doc_123, k:10)
This is useful for “more like this” recommendations without needing to compute or send the query vector from your application.
Hybrid Search
Combine full-text BM25F search with vector similarity in a single query. By default, Torque uses reciprocal rank fusion (RRF) to merge both result lists:
GET /collections/articles/documents/search?\
q=machine learning&query_by=title,body&\
vector_query=embedding:([0.12, -0.34, 0.56, ...], k:20)
X-TORQUE-API-KEY: YOUR_API_KEY
RRF works by assigning each document a score based on its rank in each result list, then summing the scores. Documents that rank well in both text and vector results are promoted. This gives you keyword precision and semantic recall in one query without tuning a weight parameter.
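The fusion step can be sketched in a few lines. This is an illustration of the general RRF formula, not Torque's internal implementation; the constant k=60 is the conventional value from the RRF literature and is an assumption here:

```python
def rrf_fuse(text_ids, vector_ids, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in (text_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

text_results   = ["a", "b", "c"]   # ranked by BM25F
vector_results = ["b", "d", "a"]   # ranked by vector distance
print(rrf_fuse(text_results, vector_results))  # ['b', 'a', 'd', 'c']
```

Note how "b" wins: it appears near the top of both lists, while "c" and "d" each appear in only one.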
Tip: Set k higher than your desired per_page so RRF has enough candidates from each side. For example, use k:50 with per_page=10.
Alpha Weighting
For finer control over the text vs. vector balance, set the alpha parameter. When alpha is set, Torque uses score interpolation instead of RRF:
# 70% vector weight, 30% text weight
vector_query=embedding:([0.12, -0.34, ...], k:20, alpha:0.7)
# 30% vector, 70% text
vector_query=embedding:([0.12, -0.34, ...], k:20, alpha:0.3)
Scores from both text search and vector search are normalized to [0, 1] before interpolation, so the alpha value directly controls the balance regardless of the raw score scales.
| alpha | Behavior |
|---|---|
| 0.0 | Pure text search (vector results ignored) |
| 0.5 | Equal weight to text and vector |
| 1.0 | Pure vector search (text results ignored) |
| Not set | Default: reciprocal rank fusion (RRF) |
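The interpolation itself is straightforward. A sketch assuming min-max normalization (the document says scores are normalized to [0, 1] but does not specify the method, so min-max is our assumption):

```python
def min_max(scores):
    # Scale raw scores into [0, 1]; a constant list maps to all ones
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def interpolate(text_scores, vec_scores, alpha):
    """combined = alpha * vector + (1 - alpha) * text, after normalizing each list."""
    t = min_max(text_scores)
    v = min_max(vec_scores)
    return [alpha * vs + (1 - alpha) * ts for ts, vs in zip(t, v)]

# Raw BM25F scores and raw similarity scores live on very different scales;
# normalization makes alpha meaningful regardless:
combined = interpolate([10.0, 4.0, 1.0], [0.9, 0.2, 0.5], alpha=0.7)
print(combined)
```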
Sorting by Vector Distance
Use _vector_distance in sort_by to sort results by their vector distance from the query:
sort_by=_vector_distance:asc
This is useful as a tiebreaker when combining vector search with other sort criteria. Requires vector_query to be set.
Filtering with Vector Search
Combine filter_by with vector_query to pre-filter before vector search:
GET /collections/articles/documents/search?\
q=*&query_by=title&\
filter_by=category:=Technology&\
vector_query=embedding:([0.12, -0.34, ...], k:10)
X-TORQUE-API-KEY: YOUR_API_KEY
Filters are applied using roaring bitmaps before the vector search runs, so only matching documents are considered as candidates.
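Conceptually, pre-filtering works like this sketch, with the roaring bitmap modeled as a plain Python set and brute-force L2 distance standing in for the index (this is the same exact-search path that flat_search_cutoff triggers):

```python
import math

def search_filtered(query_vec, vectors, filtered_ids, k):
    """Brute-force nearest neighbors restricted to a pre-filtered candidate set."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Only documents that survived the filter are ever scored
    candidates = [(l2(query_vec, vectors[i]), i) for i in filtered_ids]
    return [doc_id for _, doc_id in sorted(candidates)[:k]]

vectors = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [0.1, 0.1], 3: [5.0, 5.0]}
# Suppose filter_by=category:=Technology matched docs {1, 2, 3}:
print(search_filtered([0.0, 0.0], vectors, {1, 2, 3}, k=2))  # [2, 1]
```

Doc 0 is the closest vector overall, but it never enters the search because the filter excluded it.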
HNSW Index
Torque uses an HNSW (Hierarchical Navigable Small World) graph for approximate nearest neighbor search. HNSW provides sub-linear search time with high recall.
| Parameter | Value | Description |
|---|---|---|
| M | 16 | Maximum connections per node per layer |
| M0 | 32 | Maximum connections for layer 0 (2×M) |
| efconstruction | 200 | Search beam width during index building (higher = better recall, slower build) |
The HNSW graph is built during index construction and supports incremental updates in realtime mode. The graph is persisted to disk along with the rest of the index.
RaBitQ Quantization
For vector fields with 64 or more dimensions, Torque automatically applies RaBitQ binary quantization. RaBitQ compresses each vector into a compact binary representation using random orthogonal rotation followed by sign-bit encoding.
| Without Quantization | With RaBitQ |
|---|---|
| 4 bytes per dimension (float32) | 1 bit per dimension + compact metadata |
| 3072 bytes for 768-dim vector | ~100 bytes for 768-dim vector |
RaBitQ maintains high search accuracy while significantly reducing memory usage, especially important for large-scale vector collections. Based on research published at SIGMOD 2024.
Note: RaBitQ is applied automatically — no configuration needed. Vectors with fewer than 64 dimensions use full float32 precision.
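A much-simplified sketch of the core idea: keep only the sign of each component (one bit per dimension) and compare codes by Hamming distance. Real RaBitQ also applies a random orthogonal rotation first and stores correction metadata for accurate distance estimation, both omitted here:

```python
def sign_bits(vec):
    """One bit per dimension: 1 if the component is non-negative."""
    bits = 0
    for i, x in enumerate(vec):
        if x >= 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    # Cheap proxy for distance between two quantized vectors
    return bin(a ^ b).count("1")

q  = sign_bits([0.2, -0.1, 0.7, -0.3])
d1 = sign_bits([0.1, -0.2, 0.9, -0.1])   # same sign pattern as q
d2 = sign_bits([-0.5, 0.4, -0.2, 0.6])   # every sign flipped
print(hamming(q, d1), hamming(q, d2))    # 0 4

# Memory math from the table above: 768 dims at 1 bit each is 96 bytes,
# versus 768 * 4 = 3072 bytes at float32 precision.
print(768 // 8, 768 * 4)
```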
GPU Acceleration
When a CUDA-capable NVIDIA GPU is available, Torque accelerates vector distance calculations using cuBLAS. This includes batch matrix operations for scanning quantized vectors and re-ranking with full float32 precision.
GPU acceleration is automatic. Torque detects the GPU at startup and offloads vector operations transparently. No configuration change is needed — the same API and same results, just faster.
Tip: Unlike some search engines that only use GPU for generating embeddings, Torque uses GPU for the search itself — scoring, distance computation, and candidate ranking all run on GPU when available.
Memory Management
Vector indexes are stored in contiguous memory for cache-efficient access. Use --max-vector-memory-gb (default: 16 GB) to limit total vector memory across all collections:
torque-server --api-key KEY --max-vector-memory-gb 32
If vector memory exceeds the limit, new vector fields will be rejected. Existing indexes are not evicted.
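A rough back-of-envelope for sizing the limit, assuming the 1-bit RaBitQ representation for fields with 64 or more dimensions and float32 otherwise. This deliberately ignores HNSW graph links and per-vector metadata, so treat it as a lower bound:

```python
def vector_memory_bytes(num_docs, num_dim, quantized=True):
    """Lower-bound vector memory: 1 bit/dim with RaBitQ (dims >= 64),
    else 4 bytes/dim (float32). Graph links and metadata are excluded."""
    per_vec = num_dim / 8 if (quantized and num_dim >= 64) else num_dim * 4
    return int(num_docs * per_vec)

# 10 million 768-dim vectors:
print(vector_memory_bytes(10_000_000, 768) / 1e9)         # ~0.96 GB quantized
print(vector_memory_bytes(10_000_000, 768, False) / 1e9)  # ~30.7 GB float32
```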
Best Practices
- Use the same embedding model for indexing and querying. Mixing models (e.g., indexing with OpenAI, querying with Cohere) will produce meaningless results.
- Match the distance metric to your model. Most text embedding models (OpenAI, Sentence Transformers, E5) produce normalized vectors, so use cosine.
- Set k higher than per_page for hybrid search to give RRF enough candidates from both text and vector results.
- Pre-filter aggressively. The fewer candidates that enter vector search, the faster it runs.