Vector Search

Torque supports approximate nearest neighbor (ANN) vector search using HNSW graph indexing with RaBitQ binary quantization. Combine vector search with full-text search for hybrid retrieval via reciprocal rank fusion.

Schema Setup

Add a vector field to your collection using the float[] type with num_dim set to the number of embedding dimensions:

{
  "name": "articles",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "body", "type": "string"},
    {"name": "embedding", "type": "float[]", "num_dim": 768}
  ]
}

Tip: Common embedding dimensions are 384 (MiniLM), 768 (BERT/E5), 1024 (Cohere), 1536 (OpenAI text-embedding-3-small), and 3072 (text-embedding-3-large). Torque handles all sizes.

Distance Metrics

Set the distance metric with the vec_dist field option at schema creation time:

Value              Metric                        Best For
cosine (default)   Cosine similarity             Normalized embeddings (OpenAI, Cohere, most models). Measures the angle between vectors.
ip                 Inner product (dot product)   Models where vector magnitude carries meaning. Higher value = more similar.
l2                 Euclidean distance (L2)       Spatial data, image features. Measures straight-line distance. Lower = more similar.

{"name": "embedding", "type": "float[]", "num_dim": 768, "vec_dist": "cosine"}

Torque automatically normalizes vectors for cosine distance at index time.
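As an illustration of what normalization buys you (this is a sketch of the math, not Torque's implementation): once vectors have unit length, cosine distance reduces to 1 minus the dot product, so identical directions score 0 and larger angles score higher.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (L2 norm = 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_distance(a, b):
    """Cosine distance between two unit vectors: 1 - dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
print(round(cosine_distance(a, a), 6))  # 0.0  (same direction)
print(round(cosine_distance(a, b), 6))  # 0.04 (small angle)
```

Because the norms are baked in at index time, only a dot product is needed per comparison at query time.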

Indexing Documents

Include the embedding vector when importing documents. The vector must have exactly num_dim elements:

{"id": "1", "title": "Machine Learning Basics", "body": "...", "embedding": [0.12, -0.34, 0.56, ...]}
{"id": "2", "title": "Deep Learning Guide", "body": "...", "embedding": [0.23, 0.45, -0.67, ...]}

Note: Torque does not generate embeddings. Your application must compute embeddings before indexing, using your preferred model (OpenAI, Sentence Transformers, Cohere, etc.). This gives you full control over the embedding pipeline.
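A minimal client-side sketch of preparing import lines, assuming you already have embeddings from your model of choice. The helper name `make_import_line` and the length check are illustrative, not part of Torque's API; validating against num_dim before sending avoids rejected documents.

```python
import json

NUM_DIM = 768  # must match the schema's num_dim

def make_import_line(doc_id, title, body, embedding):
    """Serialize one document as a JSONL import line, validating the vector length."""
    if len(embedding) != NUM_DIM:
        raise ValueError(f"expected {NUM_DIM} dimensions, got {len(embedding)}")
    return json.dumps({"id": doc_id, "title": title, "body": body,
                       "embedding": embedding})

# Placeholder vector; in practice this comes from your embedding model.
line = make_import_line("1", "Machine Learning Basics", "...", [0.0] * NUM_DIM)
```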

Vector Search

Use the vector_query parameter to find the nearest neighbors to a query vector:

GET /collections/articles/documents/search?\
q=*&query_by=title&\
vector_query=embedding:([0.12, -0.34, 0.56, ...], k:10)
X-TYPESENSE-API-KEY: YOUR_API_KEY

Parameters

All parameters are specified inside the vector_query string:

Parameter            Default       Description
k                    100           Number of nearest neighbors to return. Range: 1–1000.
distance_threshold   (none)        Score cutoff for excluding weak matches. For cosine and inner product, higher = more similar, so results scoring below the threshold are excluded; for L2, lower = more similar, so results above the threshold are excluded.
ef                   max(k, 100)   HNSW search beam width. Higher values improve recall at the cost of speed.
flat_search_cutoff   (none)        When the pre-filtered candidate set is smaller than this value, bypass HNSW and use brute-force distance computation for exact results.
id                   (none)        Document ID for “find similar” queries. The document’s vector is looked up from the index and used as the query vector. Replaces the vector array.
alpha                (none)        Hybrid search weight (0.0–1.0). When set, uses score interpolation instead of RRF: alpha × vector + (1 − alpha) × text. Only applies when combined with a text query.

# Basic vector search
vector_query=embedding:([0.12, -0.34, ...], k:10)

# With distance threshold and increased recall
vector_query=embedding:([0.12, -0.34, ...], k:20, distance_threshold:0.5, ef:200)

# Brute-force for small filtered sets
vector_query=embedding:([0.12, -0.34, ...], k:10, flat_search_cutoff:500)

Set q=* for pure vector search (no text matching). The response includes matching documents ranked by vector similarity, with each hit containing a vector_distance field.
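Assembling the vector_query string by hand is error-prone, so a small client-side helper can help. This is a hypothetical convenience function (build_vector_query is not part of any Torque client library), shown only to make the parameter syntax concrete.

```python
from urllib.parse import urlencode

def build_vector_query(field, vector, k=10, **opts):
    """Format a vector_query string: field:([v1, v2, ...], k:N, opt:val, ...)."""
    vec = ", ".join(f"{v:g}" for v in vector)
    parts = [f"k:{k}"] + [f"{name}:{val}" for name, val in opts.items()]
    return f"{field}:([{vec}], {', '.join(parts)})"

vq = build_vector_query("embedding", [0.12, -0.34, 0.56], k=10)
# embedding:([0.12, -0.34, 0.56], k:10)

# URL-encode the full query string for the GET request.
params = urlencode({"q": "*", "query_by": "title", "vector_query": vq})
```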

Find Similar Documents

Use id: instead of a vector array to find documents similar to an existing document. Torque looks up the document’s vector from the index:

vector_query=embedding:(id:doc_123, k:10)

This is useful for “more like this” recommendations without needing to compute or send the query vector from your application.

Hybrid Search

Combine full-text BM25F search with vector similarity in a single query. By default, Torque uses reciprocal rank fusion (RRF) to merge both result lists:

GET /collections/articles/documents/search?\
q=machine learning&query_by=title,body&\
vector_query=embedding:([0.12, -0.34, 0.56, ...], k:20)
X-TYPESENSE-API-KEY: YOUR_API_KEY

RRF works by assigning each document a score based on its rank in each result list, then summing the scores. Documents that rank well in both text and vector results are promoted. This gives you keyword precision and semantic recall in one query without tuning a weight parameter.
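The fusion step can be sketched as follows. This is the standard RRF formula, score = Σ 1 / (k + rank); the smoothing constant k = 60 is a common choice in the literature, and whether Torque uses that exact value is an assumption here.

```python
def rrf_merge(text_ranked, vector_ranked, k=60):
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank).
    k is a smoothing constant; 60 is a common default (assumed, not Torque-confirmed)."""
    scores = {}
    for ranked in (text_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

text_hits   = ["a", "b", "c", "d"]
vector_hits = ["c", "a", "e", "b"]
merged = rrf_merge(text_hits, vector_hits)
# "a" and "c" appear near the top of both lists, so they lead the merged ranking
```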

Tip: Set k higher than your desired per_page so RRF has enough candidates from each side. For example, use k:50 with per_page=10.

Alpha Weighting

For finer control over the text vs. vector balance, set the alpha parameter. When alpha is set, Torque uses score interpolation instead of RRF:

# 70% vector weight, 30% text weight
vector_query=embedding:([0.12, -0.34, ...], k:20, alpha:0.7)

# 30% vector, 70% text
vector_query=embedding:([0.12, -0.34, ...], k:20, alpha:0.3)

Scores from both text search and vector search are normalized to [0, 1] before interpolation, so the alpha value directly controls the balance regardless of the raw score scales.
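The interpolation can be sketched like this. Min-max normalization is assumed here; the doc only states that scores are normalized to [0, 1], not the exact scheme.

```python
def min_max(scores):
    """Normalize a score dict to [0, 1]; degenerate all-equal input maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def interpolate(text_scores, vector_scores, alpha):
    """Blend normalized scores: alpha * vector + (1 - alpha) * text."""
    t, v = min_max(text_scores), min_max(vector_scores)
    docs = set(t) | set(v)
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * t.get(d, 0.0) for d in docs}

# "a" wins on text (BM25-style score), "b" wins on vector similarity.
fused = interpolate({"a": 9.0, "b": 3.0}, {"a": 0.2, "b": 0.8}, alpha=0.7)
# With alpha = 0.7 the vector side dominates, so "b" scores higher
```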

alpha       Behavior
0.0         Pure text search (vector results ignored)
0.5         Equal weight to text and vector
1.0         Pure vector search (text results ignored)
(not set)   Default: reciprocal rank fusion (RRF)

Sorting by Vector Distance

Use _vector_distance in sort_by to sort results by their vector distance from the query:

sort_by=_vector_distance:asc

This is useful as a tiebreaker when combining vector search with other sort criteria. Requires vector_query to be set.

Filtering with Vector Search

Combine filter_by with vector_query to pre-filter before vector search:

GET /collections/articles/documents/search?\
q=*&query_by=title&\
filter_by=category:=Technology&\
vector_query=embedding:([0.12, -0.34, ...], k:10)
X-TYPESENSE-API-KEY: YOUR_API_KEY

Filters are applied using roaring bitmaps before the vector search runs, so only matching documents are considered as candidates.
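Conceptually, pre-filtering narrows the candidate set before any distance is computed. The sketch below mimics the exact (flat) path over a filtered subset; it is an illustration of the behavior, not Torque's bitmap-based implementation.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_knn(query, docs, predicate, k):
    """Exact k-NN over only the documents passing the filter,
    mirroring the brute-force path used for small candidate sets."""
    candidates = [(doc_id, vec) for doc_id, vec, meta in docs if predicate(meta)]
    candidates.sort(key=lambda item: l2(query, item[1]))
    return [doc_id for doc_id, _ in candidates[:k]]

docs = [
    ("1", [0.0, 0.0], {"category": "Technology"}),
    ("2", [1.0, 1.0], {"category": "Technology"}),
    ("3", [0.1, 0.1], {"category": "Sports"}),
]
top = filtered_knn([0.0, 0.0], docs, lambda m: m["category"] == "Technology", k=2)
# Doc "3" is nearer than "2" but never enters the search: the filter runs first
```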

HNSW Index

Torque uses an HNSW (Hierarchical Navigable Small World) graph for approximate nearest neighbor search. HNSW provides sub-linear search time with high recall.

Parameter         Value   Description
M                 16      Maximum connections per node per layer
M0                32      Maximum connections for layer 0 (2 × M)
ef_construction   200     Search beam width during index building (higher = better recall, slower build)

The HNSW graph is built during index construction and supports incremental updates in realtime mode. The graph is persisted to disk along with the rest of the index.

RaBitQ Quantization

For vector fields with 64 or more dimensions, Torque automatically applies RaBitQ binary quantization. RaBitQ compresses each vector into a compact binary representation using random orthogonal rotation followed by sign-bit encoding.

Without Quantization               With RaBitQ
4 bytes per dimension (float32)    1 bit per dimension + compact metadata
3,072 bytes for a 768-dim vector   ~100 bytes for a 768-dim vector

RaBitQ maintains high search accuracy while significantly reducing memory usage, which is especially important for large-scale vector collections. It is based on research published at SIGMOD 2024.

Note: RaBitQ is applied automatically — no configuration needed. Vectors with fewer than 64 dimensions use full float32 precision.
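A highly simplified sketch of the sign-bit step, to make "1 bit per dimension" concrete. Real RaBitQ first applies a random orthogonal rotation and keeps correction metadata for accurate distance estimation; both are omitted here, so this only illustrates the packing and bit-level comparison.

```python
def sign_bits(vec):
    """Quantize a vector to one bit per dimension (1 if the component is >= 0).
    RaBitQ proper rotates the vector first; that step is omitted in this sketch."""
    bits = 0
    for i, x in enumerate(vec):
        if x >= 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Bit-level distance between two quantized vectors."""
    return bin(a ^ b).count("1")

q1 = sign_bits([0.12, -0.34, 0.56, 0.9])
q2 = sign_bits([0.10, -0.20, 0.40, -0.1])
print(hamming(q1, q2))  # 1 -> the vectors disagree only in the last dimension's sign
```

Comparing packed integers with XOR and a popcount is why the quantized scan is so much cheaper than float32 distance computation.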

GPU Acceleration

When a CUDA-capable NVIDIA GPU is available, Torque accelerates vector distance calculations using cuBLAS. This includes batch matrix operations for scanning quantized vectors and re-ranking with full float32 precision.

GPU acceleration is automatic. Torque detects the GPU at startup and offloads vector operations transparently. No configuration change is needed — the same API and same results, just faster.

Tip: Unlike some search engines that only use GPU for generating embeddings, Torque uses GPU for the search itself — scoring, distance computation, and candidate ranking all run on GPU when available.

Memory Management

Vector indexes are stored in contiguous memory for cache-efficient access. Use --max-vector-memory-gb (default: 16 GB) to limit total vector memory across all collections:

torque-server --api-key KEY --max-vector-memory-gb 32

If vector memory exceeds the limit, new vector fields will be rejected. Existing indexes are not evicted.
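For capacity planning, a rough back-of-the-envelope estimate follows from the quantization table above. The per-vector metadata size below is an assumption (the doc only says "compact metadata"), and HNSW graph overhead is not included.

```python
def vector_memory_bytes(num_docs, num_dim, quantized=True, metadata_bytes=4):
    """Rough per-collection vector storage estimate.
    Quantized (num_dim >= 64): 1 bit per dimension plus assumed per-vector metadata.
    Unquantized (or < 64 dims): 4 bytes per dimension (float32).
    Excludes HNSW graph overhead."""
    if quantized and num_dim >= 64:
        per_vec = num_dim // 8 + metadata_bytes
    else:
        per_vec = num_dim * 4
    return num_docs * per_vec

full   = vector_memory_bytes(1_000_000, 768, quantized=False)  # 3,072,000,000 (~3.1 GB)
packed = vector_memory_bytes(1_000_000, 768)                   # 100,000,000 (~100 MB)
```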

Best Practices

  • Use the same embedding model for indexing and querying. Mixing models (e.g., indexing with OpenAI, querying with Cohere) will produce meaningless results.
  • Match the distance metric to your model. Most text embedding models (OpenAI, Sentence Transformers, E5) produce normalized vectors — use cosine.
  • Set k higher than per_page for hybrid search to give RRF enough candidates from both text and vector results.
  • Pre-filter aggressively. The fewer candidates that enter vector search, the faster it runs.