RAG-as-a-Service · MCP-native · Open source

Hosted RAG your AI agent can actually use.

Hybrid retrieval, audit-first, MCP-native. Run it on Cloudflare or self-host on your own infra — same packages, same contracts.

No credit card. Free during beta.

A Claude Code terminal showing an MCP install for Textral, a query about the Library of Alexandria, hybrid retrieval results, and cited source documents.

What you get

MCP-native by default

Your AI agent calls Textral directly — no glue code, no wrappers. One install, multi-profile addressing for stage and prod, every tool surfaced.

Hybrid retrieval, in the box

Dense + BM25 + reranker out of the box. Per-namespace dimension locking catches the silent embedding mismatches other services let you ship into production.

Audit-first

Every query is a forensically replayable event with retrieval lineage, citation integrity, dropped-citation tracking, and per-arm error surfacing.

Run it your way

Hosted on Cloudflare, or self-hosted on your own infra. Same packages. Same contracts. No fork, no migration story.

The audit story

Receipts. For every query.

When retrieval quality silently degrades, most teams find out from a user complaint. Textral surfaces it the second it happens.

Every query returns a query_event_id you can replay weeks later. Full retrieval lineage: which arm fired, what scored, what was reranked, what got cited, what got dropped. Per-arm error messages when an arm degrades. Citation integrity validated against the chunks you actually retrieved.

We shipped a real production fix on day two of building Textral because the audit caught a silent dimension-mismatch bug that nothing else would have caught. The kind of bug that turns into "our retrieval just got worse, no one knows why" three months later.

response.json (audit.retrieval_status: full)
{
  "query_event_id": "qev_01KR39M0YHK6KJ14S7QXKVWWQW",
  "answer": "The library's legacy endures through the texts that were copied and disseminated…",
  "citations": [
    { "n": 1, "chunk_id": "chk_…", "section_path": "/chapter-six-what-survived" },
    { "n": 2, "chunk_id": "chk_…", "section_path": "/chapter-three-the-founding" },
    { "n": 3, "chunk_id": "chk_…", "section_path": "/chapter-four-the-scholars" },
    { "n": 4, "chunk_id": "chk_…", "section_path": "/chapter-five-the-decline" }
  ],
  "audit": {
    "retrieval_status": "full",
    "dense_candidates_returned": 4,
    "sparse_candidates_returned": 8,
    "candidates_returned": 8,
    "reranker": { "provider": "voyage", "model": "rerank-2", "executed": true },
    "citation_integrity": "valid",
    "dropped_citations": []
  }
}
Diagram showing a query flowing through parallel dense and sparse retrieval, joining at RRF fusion, then reranker, then synthesis, with an audit trail recording every step.
retrieval_lineage / fully replayable
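
Replay itself is one call with that query_event_id. Here is a rough sketch of what it could look like through the SDK; the Textral client and getQueryEvent names are illustrative placeholders rather than the published @textral/sdk surface, so check the package docs for the real one:

import { Textral } from "@textral/sdk"; // hypothetical import shape; see the package docs

const client = new Textral({ apiKey: process.env.TEXTRAL_API_KEY! });

// Re-open the audit for a query you ran weeks ago, by its query_event_id.
const event = await client.getQueryEvent("qev_01KR39M0YHK6KJ14S7QXKVWWQW");

console.log(event.audit.retrieval_status);   // "full", or which arm degraded
console.log(event.audit.dropped_citations);  // any citations dropped by the tracking above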

Code-first proof

One command from "I just heard about you" to "my agent is querying my docs."

One command. Your agent gets retrieval as a tool.
claude mcp add textral --scope user -- npx -y @textral/mcp
Then ask: "What survived the Library of Alexandria?" — your agent calls query, gets cited chunks, and answers.
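
If you would rather call it from code than from an agent, the same flow fits in a few lines of TypeScript. This is a sketch of the shape, not the published @textral/sdk API; the constructor, ingest, and query names are assumptions, and the namespace and file path are made up:

import { Textral } from "@textral/sdk"; // hypothetical import shape; see the package docs

const client = new Textral({ apiKey: process.env.TEXTRAL_API_KEY! });

// Ingest a document, then ask a question against it.
await client.ingest({ namespace: "alexandria", file: "./library-of-alexandria.md" });

const result = await client.query({
  namespace: "alexandria",
  question: "What survived the Library of Alexandria?",
});

console.log(result.answer);
console.log(result.citations);       // chunk ids + section paths, as in the JSON above
console.log(result.query_event_id);  // keep this if you want to replay the audit later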

How it works

What's actually happening

Ingest splits documents into section-aware chunks, embeds them with the provider you chose, and indexes them in both a vector store (Cloudflare Vectorize or Pinecone) and a BM25 sparse index. Query runs both arms in parallel, fuses them with reciprocal-rank fusion, reranks with Voyage, and synthesizes with the inference model you chose. Every step writes an audit row. Every audit row is replayable.

Ingest

section-aware chunks

Embed

OpenAI / Voyage / Cohere

Index

Vectorize or Pinecone + FTS5

Retrieve

dense + sparse, in parallel

Fuse · Rerank · Synthesize

RRF → Voyage → inference

An architecture diagram with the full ingest + query paths is on the roadmap. In the meantime: every step above appears in the audit object.
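
The fusion step is plain reciprocal-rank fusion: each arm contributes 1 / (k + rank) for every chunk it returned, and the summed scores decide the order handed to the reranker. A standalone TypeScript sketch of that scoring (illustrative only, not the @textral/api internals; k = 60 is the conventional constant, not necessarily the one Textral uses):

// Reciprocal-rank fusion: each arm contributes 1 / (k + rank) per chunk it returned.
// Standalone illustration, not the @textral/api implementation.
type RankedList = string[]; // chunk ids, best first

function rrfFuse(arms: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const arm of arms) {
    arm.forEach((chunkId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Chunks that both arms rank well float to the top even when neither arm put them first.
const dense = ["chk_a", "chk_b", "chk_c"];
const sparse = ["chk_b", "chk_d", "chk_a"];
const fused = [...rrfFuse([dense, sparse]).entries()].sort((a, b) => b[1] - a[1]);
console.log(fused); // chk_b and chk_a outrank chk_d and chk_c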

Deploy anywhere

Hosted, self-hosted, or both at once.

The MCP profile model lets one Claude Code session talk to a hosted dev instance and a self-hosted prod instance in the same conversation. We did not bolt this on as a v2 — it's how the product was built.
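
The simplest way to picture it: two server entries in the same Claude Code config, one pointing at the hosted sandbox and one at your own deployment. The --env flag is a real Claude Code MCP option, but the TEXTRAL_BASE_URL and TEXTRAL_API_KEY variable names below are illustrative; use whatever the @textral/mcp README specifies.

claude mcp add textral-dev --scope user --env TEXTRAL_BASE_URL=https://sandbox.textral.example --env TEXTRAL_API_KEY=sk_dev_example -- npx -y @textral/mcp
claude mcp add textral-prod --scope user --env TEXTRAL_BASE_URL=https://rag.internal.example --env TEXTRAL_API_KEY=sk_prod_example -- npx -y @textral/mcp

Your agent can then call the dev instance and the prod instance as separate tools in the same conversation.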

A mirrored architecture diagram with hosted Cloudflare deployment on the left and self-hosted infrastructure on the right, both connected through a shared layer of @textral/contracts, @textral/sdk, and @textral/mcp that produces identical outcomes.

Hosted

Hosted on Cloudflare

Spin up a tenant in the sandbox, get an API key, ingest your first document. We run the workers, the database, the vector store. You bring your provider keys (OpenAI, Anthropic, Voyage, Cohere) — or use ours.

Start in the sandbox

Self-hosted

Self-hosted on your infra

Same packages. Same contracts. Deploy @textral/api to your own Cloudflare account or your own runtime. The MCP server, SDKs, and audit trail work identically. No fork, no separate codebase to maintain.

Self-hosting guide
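
For the Cloudflare path specifically, the shape is the usual Workers workflow. The commands below are standard wrangler commands, but the resource names, dimension value, and secret names are assumptions; follow the self-hosting guide for the authoritative steps.

# Sketch of the Cloudflare flow only; the self-hosting guide has the real steps.
npx wrangler d1 create textral-db                  # SQLite database for the FTS5 sparse index and audit rows
npx wrangler vectorize create textral-dense --dimensions=1024 --metric=cosine   # match your embedding model's dimensions
npx wrangler secret put VOYAGE_API_KEY             # bring your own provider keys
npx wrangler deploy                                # deploy the worker running @textral/api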

How it stacks up

Honestly compared

If a competitor matches us on a row, we say so. If we're behind, we don't hide it. Reviewed 2026-05-08.

Feature Textral Pinecone Assistant Vectara OpenAI File Search
MCP-native
Hybrid retrieval (dense + sparse) partial
Reranker in the box
Full audit lineage full partial partial minimal
Self-hostable (same packages)
Multi-tenant API per account per account per account
Pluggable vector backend locked to Pinecone locked locked to OpenAI
BYO provider keys partial partial
Open-source SDK + MCP

"MCP-native" means we ship a maintained MCP server as a first-class npm package, not that the competitor cannot be wrapped. Wrapping any REST API in an MCP server is possible; doing it well is not.

A semantic-fabric diagram with the Textral core node connected to GitHub, Notion, Slack, Google Drive, Confluence, S3, Markdown, PDFs, APIs, PostgreSQL, SharePoint, and the local filesystem.
Sources today: files (PDF, Markdown, plain text). Connectors (GitHub, Notion, Slack, Drive, Confluence, S3, SharePoint, Postgres) on the roadmap.

Beta · No credit card

Free during beta.

We're in private beta. Sandbox tenants are free; bring-your-own provider keys (OpenAI, Anthropic, Voyage, Cohere) cover inference and reranking costs.

When pricing lands, you'll see it here first — and beta tenants get advance notice and a grandfathered tier.

Put your docs to work.

Spin up a tenant. Ingest your first document. Watch your agent cite it. All in under three minutes.

Free during beta. No credit card.