RAG-as-a-Service · MCP-native · Open source

Hosted RAG your AI agent can actually use.

Hybrid retrieval, audit-first, MCP-native. Run it on Cloudflare or self-host on your own infra — same packages, same contracts.

No credit card. Free during beta.

A Claude Code terminal showing an MCP install for Textral, a query about the Library of Alexandria, hybrid retrieval results, and cited source documents.

What you get

MCP-native by default

Your AI agent calls Textral directly — no glue code, no wrappers. One install, multi-profile addressing for stage and prod, every tool surfaced.

Hybrid retrieval, in the box

Dense + BM25 + reranker out of the box. Per-namespace dimension locking catches the silent embedding mismatches other services let you ship into production.

Audit-first

Every query is a forensically replayable event with retrieval lineage, citation integrity, dropped-citation tracking, and per-arm error surfacing.

Run it your way

Hosted on Cloudflare, or self-hosted on your own infra. Same packages. Same contracts. No fork, no migration story.

The audit story

Receipts. For every query.

When retrieval quality silently degrades, most teams find out from a user complaint. Textral surfaces it the second it happens.

Every query returns a query_event_id you can replay weeks later. Full retrieval lineage: which arm fired, what scored, what was reranked, what got cited, what got dropped. Per-arm error messages when an arm degrades. Citation integrity validated against the chunks you actually retrieved.

We shipped a real production fix on day two of building Textral because the audit caught a silent dimension-mismatch bug that nothing else would have caught. The kind of bug that turns into "our retrieval just got worse, no one knows why" three months later.

response.json (audit.retrieval_status: full)
{
  "query_event_id": "qev_01KR39M0YHK6KJ14S7QXKVWWQW",
  "answer": "The library's legacy endures through the texts that were copied and disseminated…",
  "citations": [
    { "n": 1, "chunk_id": "chk_…", "section_path": "/chapter-six-what-survived" },
    { "n": 2, "chunk_id": "chk_…", "section_path": "/chapter-three-the-founding" },
    { "n": 3, "chunk_id": "chk_…", "section_path": "/chapter-four-the-scholars" },
    { "n": 4, "chunk_id": "chk_…", "section_path": "/chapter-five-the-decline" }
  ],
  "audit": {
    "retrieval_status": "full",
    "dense_candidates_returned": 4,
    "sparse_candidates_returned": 8,
    "candidates_returned": 8,
    "reranker": { "provider": "voyage", "model": "rerank-2", "executed": true },
    "citation_integrity": "valid",
    "dropped_citations": []
  }
}
Diagram showing a query flowing through parallel dense and sparse retrieval, joining at RRF fusion, then reranker, then synthesis, with an audit trail recording every step.
retrieval_lineage / fully replayable
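
Replay itself is one call with that query_event_id. Here is a rough sketch of what it could look like through the SDK; the Textral client and getQueryEvent names are illustrative placeholders rather than the published @textral/sdk surface, so check the package docs for the real one:

import { Textral } from "@textral/sdk"; // hypothetical import shape; see the package docs

const client = new Textral({ apiKey: process.env.TEXTRAL_API_KEY! });

// Re-open the audit for a query you ran weeks ago, by its query_event_id.
const event = await client.getQueryEvent("qev_01KR39M0YHK6KJ14S7QXKVWWQW");

console.log(event.audit.retrieval_status);   // "full", or which arm degraded
console.log(event.audit.dropped_citations);  // any citations dropped by the tracking above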

Code-first proof

One command from "I just heard about you" to "my agent is querying my docs."

One command. Your agent gets retrieval as a tool.
claude mcp add textral --scope user -- npx -y @textral/mcp
Then ask: "What survived the Library of Alexandria?" — your agent calls query, gets cited chunks, and answers.
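
If you would rather call it from code than from an agent, the same flow fits in a few lines of TypeScript. This is a sketch of the shape, not the published @textral/sdk API; the constructor, ingest, and query names are assumptions, and the namespace and file path are made up:

import { Textral } from "@textral/sdk"; // hypothetical import shape; see the package docs

const client = new Textral({ apiKey: process.env.TEXTRAL_API_KEY! });

// Ingest a document, then ask a question against it.
await client.ingest({ namespace: "alexandria", file: "./library-of-alexandria.md" });

const result = await client.query({
  namespace: "alexandria",
  question: "What survived the Library of Alexandria?",
});

console.log(result.answer);
console.log(result.citations);       // chunk ids + section paths, as in the JSON above
console.log(result.query_event_id);  // keep this if you want to replay the audit later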

How it works

What's actually happening

Ingest splits documents into section-aware chunks, embeds them with the provider you chose, and indexes them in both a vector store (Cloudflare Vectorize or Pinecone) and a BM25 sparse index. Query runs both arms in parallel, fuses them with reciprocal-rank fusion, reranks with Voyage, and synthesizes with the inference model you chose. Every step writes an audit row. Every audit row is replayable.

Ingest

section-aware chunks

Embed

OpenAI / Voyage / Cohere

Index

Vectorize or Pinecone + FTS5

Retrieve

dense + sparse, in parallel

Fuse · Rerank · Synthesize

RRF → Voyage → inference

An architecture diagram with the full ingest + query paths is on the roadmap. In the meantime: every step above appears in the audit object.
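
The fusion step is plain reciprocal-rank fusion: each arm contributes 1 / (k + rank) for every chunk it returned, and the summed scores decide the order handed to the reranker. A standalone TypeScript sketch of that scoring (illustrative only, not the @textral/api internals; k = 60 is the conventional constant, not necessarily the one Textral uses):

// Reciprocal-rank fusion: each arm contributes 1 / (k + rank) per chunk it returned.
// Standalone illustration, not the @textral/api implementation.
type RankedList = string[]; // chunk ids, best first

function rrfFuse(arms: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const arm of arms) {
    arm.forEach((chunkId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Chunks that both arms rank well float to the top even when neither arm put them first.
const dense = ["chk_a", "chk_b", "chk_c"];
const sparse = ["chk_b", "chk_d", "chk_a"];
const fused = [...rrfFuse([dense, sparse]).entries()].sort((a, b) => b[1] - a[1]);
console.log(fused); // chk_b and chk_a outrank chk_d and chk_c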

Deploy anywhere

Hosted, self-hosted, or both at once.

The MCP profile model lets one Claude Code session talk to a hosted dev instance and a self-hosted prod instance in the same conversation. We did not bolt this on as a v2 — it's how the product was built.
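
The simplest way to picture it: two server entries in the same Claude Code config, one pointing at the hosted sandbox and one at your own deployment. The --env flag is a real Claude Code MCP option, but the TEXTRAL_BASE_URL and TEXTRAL_API_KEY variable names below are illustrative; use whatever the @textral/mcp README specifies.

claude mcp add textral-dev --scope user --env TEXTRAL_BASE_URL=https://sandbox.textral.example --env TEXTRAL_API_KEY=sk_dev_example -- npx -y @textral/mcp
claude mcp add textral-prod --scope user --env TEXTRAL_BASE_URL=https://rag.internal.example --env TEXTRAL_API_KEY=sk_prod_example -- npx -y @textral/mcp

Your agent can then call the dev instance and the prod instance as separate tools in the same conversation.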

A mirrored architecture diagram with hosted Cloudflare deployment on the left and self-hosted infrastructure on the right, both connected through a shared layer of @textral/contracts, @textral/sdk, and @textral/mcp that produces identical outcomes.

Hosted

Hosted on Cloudflare

Spin up a tenant in the sandbox, get an API key, ingest your first document. We run the workers, the database, the vector store. You bring your provider keys (OpenAI, Anthropic, Voyage, Cohere) — or use ours.

Start in the sandbox

Self-hosted

Self-hosted on your infra

Same packages. Same contracts. Deploy @textral/api to your own Cloudflare account or your own runtime. The MCP server, SDKs, and audit trail work identically. No fork, no separate codebase to maintain.

Self-hosting guide
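
For the Cloudflare path specifically, the shape is the usual Workers workflow. The commands below are standard wrangler commands, but the resource names, dimension value, and secret names are assumptions; follow the self-hosting guide for the authoritative steps.

# Sketch of the Cloudflare flow only; the self-hosting guide has the real steps.
npx wrangler d1 create textral-db                  # SQLite database for the FTS5 sparse index and audit rows
npx wrangler vectorize create textral-dense --dimensions=1024 --metric=cosine   # match your embedding model's dimensions
npx wrangler secret put VOYAGE_API_KEY             # bring your own provider keys
npx wrangler deploy                                # deploy the worker running @textral/api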

How it stacks up

Honestly compared

If a competitor matches us on a row, we say so. If we're behind, we don't hide it. Reviewed 2026-05-08.

Feature Textral Pinecone Assistant Vectara OpenAI File Search
MCP-native
Hybrid retrieval (dense + sparse) partial
Reranker in the box
Full audit lineage full partial partial minimal
Self-hostable (same packages)
Multi-tenant API per account per account per account
Pluggable vector backend locked to Pinecone locked locked to OpenAI
BYO provider keys partial partial
Open-source SDK + MCP

"MCP-native" means we ship a maintained MCP server as a first-class npm package, not that the competitor cannot be wrapped. Wrapping any REST API in an MCP server is possible; doing it well is not.

A semantic-fabric diagram with the Textral core node connected to GitHub, Notion, Slack, Google Drive, Confluence, S3, Markdown, PDFs, APIs, PostgreSQL, SharePoint, and the local filesystem.
Sources today: files (PDF, Markdown, plain text). Connectors (GitHub, Notion, Slack, Drive, Confluence, S3, SharePoint, Postgres) on the roadmap.

Beta · No credit card

Free during beta.

We're in private beta. Sandbox tenants are free; bring-your-own provider keys (OpenAI, Anthropic, Voyage, Cohere) cover inference and reranking costs.

When pricing lands, you'll see it here first — and beta tenants get advance notice and a grandfathered tier.

Put your docs to work.

Spin up a tenant. Ingest your first document. Watch your agent cite it. All in under three minutes.

Free during beta. No credit card.