Introducing Torque: A Search Engine That Restores 4.64M Documents in 1.8 Seconds

Today we're releasing Torque 0.5.0 - a high-performance, in-memory search engine built from scratch in Rust. It's a drop-in replacement for Typesense with 6x faster search, 7.5x faster ingestion, and sub-2-second restarts at multi-million document scale.

Why we built Torque

We needed a search engine for Traverse that could handle millions of documents with sub-millisecond latency. Existing options forced a trade-off: Elasticsearch is powerful but operationally heavy (JVM tuning, shard management, segment merges). Typesense is simple but limited at scale - no sharding, no BM25 scoring, and restart times measured in minutes to hours for large datasets. Algolia is fast but closed-source and SaaS-only.

We wanted something different: an engine that serves queries from memory with zero-copy data structures, persists everything to disk for durability, and restores instantly. So we built one.

Architecture

Torque holds all data in memory for serving. Inverted index posting lists and term frequencies are memory-mapped from disk using CRoaring's Frozen bitmap format - no deserialization, no heap allocation. The file on disk IS the serving data structure. Other components (filters, sort indexes, column store) are deserialized from compressed binary on startup.

Every collection's index is held behind an ArcSwap - a lock-free atomic pointer. Search queries load an Arc<IndexBundle> with a single atomic read. Index rebuilds happen in the background and swap the pointer atomically when complete. Queries never block on writes. Writes never block on queries.

Interface	Port	Purpose
HTTP API	8108	Search, CRUD, Studio, admin, TLS
TCP Binary	8109	High-throughput streaming ingest
Studio	8108	Embedded web UI
MCP	8108	Model Context Protocol for AI agents

Scoring: BM25F with SIMD

Torque uses BM25F - the standard information retrieval scoring function with inverse document frequency and document-length normalization. Rare terms are weighted higher. Long documents aren't penalized for containing more words. This matters for long-form content (articles, documentation, legal text) where term rarity is a strong relevance signal.

Typesense and Meilisearch use custom tie-breaking algorithms without IDF. Algolia uses a similar tie-breaking approach. Elasticsearch uses BM25 via Lucene.

Torque's BM25F has three hot-path implementations depending on query shape: SIMD AVX2 (single-field, 8 candidates scored per instruction), 4x unrolled (multi-field), and scalar fallback. On AMD EPYC with AVX-512, bitmap intersections process 512 document IDs per instruction via CRoaring's SIMD-optimized C library.

Persistence: 1.8-second restore

Torque persists everything to disk automatically. On restart, it memory-maps the inverted index files and deserializes the rest in parallel. The 4.64-million-document English Wikipedia index restores from disk in 1.8 seconds.

Engine	Restart strategy	Cost at scale
Torque	mmap frozen bitmaps (zero-copy)	1.8s for 4.64M docs
Typesense	Rebuild ART from RocksDB	78 minutes reported for 28M docs
Elasticsearch	mmap Lucene segments	Seconds (but cold queries hit disk)
Meilisearch	LMDB mmap	Fast

The design principle: the build pipeline produces data structures that are simultaneously the in-memory serving format AND the on-disk persistence format. There is no separate “load” step. Restart means mmap the files and start serving.

Three ingestion paths

Torque is push-based - it never connects to your database. Clients handle data fusion and stream documents to Torque.

TCP Binary Protocol. Custom binary wire format with schema-aware encoding. No JSON parsing overhead. Measured at 69K docs/s on a 309K-document benchmark - 7.5x faster than Typesense's HTTP JSONL ingest.
TQBF Binary File Upload. Pre-encode documents to a compressed binary file, upload in one HTTP POST. A single request imports the entire dataset. Measured at 244K docs/s on 4.64M Wikipedia documents.
HTTP JSONL. Typesense-compatible REST API. Best for small ad-hoc updates and existing Typesense client compatibility.

All five client SDKs implement all three ingestion methods.

GPU acceleration

Torque is the only search engine we're aware of that uses GPU acceleration for the core search operations themselves - not just for embedding generation or vector index building.

Ten CUDA kernels accelerate bitmap intersection, numeric range filtering, BM25F scoring, facet counting, vector distance computation, and geo filtering. When no GPU is available, Torque falls back to CPU transparently with no configuration change.

Elasticsearch and Typesense use GPUs only for ML model inference. Vespa uses GPUs for ONNX model evaluation during ranking. None of them accelerate the inverted index search path itself.

Benchmark: Torque vs Typesense vs Quickwit

Test setup: 309K Swedish bankruptcy records, 67 fields with nested objects, geo coordinates, and financial data. All three engines on the same machine, same dataset, same query patterns. 20 iterations, 5 warmup, CPU-only, HTTP round-trip latency on localhost. Quickwit is backed by Tantivy (the Rust equivalent of Lucene).

Pattern	Torque p50	Typesense p50	Quickwit p50	vs Typesense	vs Quickwit
text_search	2.7 ms	2.8 ms	2.9 ms	1.1x	1.1x
filter_only	0.2 ms	1.7 ms	3.5 ms	7.5x	15x
faceted	1.2 ms	13.1 ms	2.9 ms	11.2x	2.4x
sort_only	0.2 ms	8.0 ms	7.9 ms	53x	52x
group_by	0.5 ms	75.5 ms	1.4 ms	149x	2.8x
geo_sort	0.3 ms	13.0 ms	-	46x	-
prefix_search	0.4 ms	1.1 ms	7.5 ms	2.5x	17x
multi_filter	0.6 ms	18.8 ms	3.2 ms	29x	5x
Geometric mean				6.1x	4.2x

The largest advantages are in operations dominated by bitmap and column-store operations: filtering, sorting, grouping, and geo queries. Text search is similar across all engines because they're all fundamentally bounded by the same inverted index lookups.

Ingestion

Engine	Method	Time (309K docs)	Throughput
Torque	TCP binary	4.5s	69K docs/s
Quickwit	HTTP NDJSON	7.6s	41K docs/s
Typesense	HTTP JSONL	33.8s	9K docs/s

Wikipedia scale (4.64M documents)

Metric	Value
Import (TQBF binary)	19s (244K docs/s)
Index build	~3 minutes (streaming, disk-backed text)
Persist to disk	2.5 GB
Restore from disk	1.8s
Serving RSS	3.1 GB (inverted index mmap'd, not on heap)
Search “barack obama”	8,181 results in 3 ms

Typesense v30.1 API compatible

Torque implements the Typesense v30.1 REST API for all search and CRUD operations. If you have an existing Typesense deployment, you can point your clients at Torque and they work without code changes.

Fully compatible: Collections, Documents, Search, Multi-search, Aliases, API Keys, Synonyms, Presets, Stopwords.

Torque-only extensions: TCP binary ingest, TQBF binary file import, collection suspend/resume, compaction, GPU acceleration, decay sorting, nested element-level AND filters, MCP server.

Client SDKs

Five official SDKs, each with the full HTTP API, the TCP binary ingest protocol, and the TQBF binary file writer:

Language	Package	Runtime
.NET 10+	`Torque.Http`	Zero NuGet deps, source-generated JSON
Python 3.10+	`torque-http`	Zero deps, stdlib only
Node.js 20+	`@truespar/torque-http`	Zero deps, native fetch, TypeScript
Go 1.23+	`torque-sdk-go`	Zero deps, stdlib only
Java 21+	`com.truespar:torque-http`	Jackson 3.1, typed models

Torque Studio

Torque ships with an embedded web UI - no separate install, no npm, no build step. Open http://localhost:8108 and you have a search editor with syntax highlighting, a schema browser, a data import panel, a benchmark dashboard, API key management, and full documentation.

MCP server

Torque includes a built-in MCP (Model Context Protocol) server at /api/mcp. AI assistants like Claude can search your collections, inspect schemas, and manage documents directly. Nine tools are available: search, multi-search, list collections, get schema, get document, create collection, upsert document, delete document, and delete collection.

// claude_desktop_config.json
{
  "mcpServers": {
    "torque": {
      "url": "http://localhost:8108/api/mcp",
      "headers": {
        "X-TYPESENSE-API-KEY": "your-api-key"
      }
    }
  }
}

What's in the box

Search. Full-text with BM25F, multi-field, prefix, typo tolerance (SymSpell), synonyms, presets, stopwords, highlighting.
Filtering. Exact match, numeric range, boolean, geo radius, nested object conditions, array element-level AND.
Sorting. Field, text relevance, geo distance, decay functions (Gaussian, linear, exponential).
Faceting. Per-field counts with min/max/sum/average stats.
Grouping. Group results by field with configurable per-group limits.
Vector search. HNSW with RaBitQ quantization (1-bit, no training). Hybrid text+vector via RRF or alpha-weighted interpolation.
Stemming. 17 languages: English, German, French, Spanish, Portuguese, Italian, Dutch, Swedish, Norwegian, Danish, Finnish, Russian, Arabic, Greek, Hungarian, Romanian, Turkish.
Auth. Bootstrap, admin, search-only, and HMAC-scoped API keys.
Persistence. Automatic, atomic, zero-config. Collections, indexes, overlays, schemas, keys, synonyms, presets, stopwords - all restored on restart.
Realtime mode. Instant upsert/delete with overlay persistence, auto-compaction, backpressure.
Operations. TLS/HTTPS, Prometheus metrics, IP allowlist, slow query logging, TOML config, env vars, Windows Service, Docker.

Getting started

# Download and run
torque-server --api-key YOUR_KEY --license-key YOUR_LICENSE

# Or Docker
docker run -p 8108:8108 -p 8109:8109 -v torque-data:/data truespar/torque

Open http://localhost:8108 for Studio. Read the docs for the full setup guide, or reach out if you want a demo.

What's next

We're working on Raft-based clustering for multi-node replication, JOINs across collections, conversational search, and continuing to push on raw query performance. Torque already runs as a unikernel on Silicon - the same 84 ms boot time, the same hardware isolation, applied to search.