Product
The RAG stack, already assembled.
Dewey replaces the document parser, vector database, and embedding pipeline you'd otherwise assemble and maintain yourself, and ships features no other managed service offers.
How Dewey stacks up
Compared against OpenAI File Search, Vectara, and a DIY stack built on LlamaIndex, Dewey covers:

- **Document parsing** (PDF, DOCX, PPTX, HTML, Markdown).
- **Chunking, embedding & hybrid search.** OpenAI File Search uses vector search only; the BM25 keyword component requires additional setup outside the managed service.
- **Semantic reranking (cross-encoder, on by default).** Dewey re-scores RRF candidates with a cross-encoder model before returning results, improving precision without requiring query tuning. Runs in-process with no external API call.
- **Multi-step research (quick → exhaustive).**
- **Section-aware structure + lightweight section scan.** Documents are parsed into their natural heading hierarchy. A dedicated /sections/scan endpoint lets agents scan section summaries cheaply before deciding which chunks to retrieve. No other managed RAG service offers this.
- **AI captioning (figures, diagrams & tables become searchable).** Dewey uses a vision model to caption images and an LLM to summarize tables, then indexes both as searchable chunks. Tabular data and embedded figures are fully retrievable by semantic search.
- **MCP server (Claude, Cursor, agents).**
- **Hosted agents (saved configurations + full run trace).** Save a system prompt, tool set, collection scope, and model as a named agent. Invoke from the dashboard, REST API, CLI, or SDKs. Every run captures the full tool-call trace with inline citations and clickable source rows.
- **Official CLI (one-line install, JSON-pipeable).** Single Go binary, no Python or Node runtime required. Upload, search, stream cited research, and invoke hosted agents from any shell. Stable `--json` contract on every command for piping into other tools.
- **Corpus quality analysis (deduplication + contradiction detection).** Dewey clusters near-duplicate documents by content overlap and picks a canonical per group, then analyzes extracted claims across the corpus to surface conflicting statements by severity with suggested resolution instructions. No other managed RAG service offers either.
Comparison based on publicly available documentation as of early 2026. Features and pricing change; verify with each vendor.
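The reranking feature above mentions re-scoring RRF candidates. Reciprocal rank fusion is the standard way to merge a vector ranking and a keyword ranking before a cross-encoder pass; here is a minimal sketch of the general technique (not Dewey's internals; the constant `k = 60` is the conventional default, assumed here):

```python
def rrf(rankings, k=60):
    """Merge several ranked lists with reciprocal rank fusion.

    rankings: list of ranked lists of document ids, best first.
    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so agreement between rankers outweighs a single high placement.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a vector-search ranking with a keyword (BM25-style) ranking.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]
fused = rrf([vector_hits, keyword_hits])
# doc_a appears near the top of both lists, so it outranks doc_c.
```

A cross-encoder reranker would then re-score only the top few fused candidates, which is why the fusion step can stay this simple.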
FinanceBench accuracy at exhaustive depth
We ran Dewey's /research endpoint on all 150 questions in FinanceBench, a benchmark of financial Q&A drawn from real SEC filings. At exhaustive depth with Claude Opus 4.6, Dewey achieves 87.3% accuracy, surpassing the full-document-in-context baseline and well above the 19% typical of standard vector RAG. Exhaustive depth requires a Pro plan and a BYOK key.
Why Dewey
Six things you won't find anywhere else
Sections, not just chunks
Most RAG systems split documents into fixed-size token windows and call them chunks. Dewey parses the document's actual heading hierarchy (title, section, subsection) and indexes each section as a first-class entity. Sections with generic titles like "Introduction" or "Chapter 3" automatically get AI-generated summaries so they're just as findable as sections with descriptive headings. Search results and cited answers reference the exact section by name, not an anonymous text fragment at offset 4,096.
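The idea of indexing a heading hierarchy rather than fixed token windows can be sketched in a few lines. This is a toy Markdown version, not Dewey's actual parser: it walks the document once and emits one section per heading, carrying the heading level so a tree can be rebuilt from it.

```python
import re

def parse_sections(markdown_text):
    """Split a Markdown document into sections along its heading
    hierarchy. Returns dicts with the heading level (1-6), the section
    title, and the body lines that follow it."""
    sections = []
    current = None
    for line in markdown_text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            current = {"level": len(m.group(1)), "title": m.group(2), "body": []}
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    return sections

doc = "# Report\nIntro text.\n## Methods\nWe measured things."
secs = parse_sections(doc)
# Two sections: "Report" (level 1) and "Methods" (level 2).
```

Because each section keeps its title, a search hit can cite "Methods" by name instead of an anonymous fragment at a byte offset.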
Your AI bill, not ours
OpenAI File Search and Vectara bundle generation costs into their pricing: you pay their margin on every query. Dewey is different. Bring your own OpenAI, Anthropic, or Google Gemini key and pay the provider directly, at cost. No markup, no proprietary model requirement, and credit metering is bypassed entirely for BYOK requests. You get full visibility into what you're spending and why.
Event-driven from the start
Dewey treats document processing as a first-class event stream. Webhooks fire when a document becomes ready or fails, so your downstream systems, agents, and workflows react instantly without polling. Pair that with real-time SSE events in the dashboard and you always know exactly where every document is in the pipeline.
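A webhook consumer can be as small as a handler that dispatches on the event type. The sketch below assumes a hypothetical payload shape with `type`, `document_id`, and `error` fields; Dewey's actual field names may differ, so treat this as the pattern rather than the contract.

```python
import json

def handle_webhook(raw_body):
    """Dispatch a (hypothetical) document-lifecycle webhook payload.
    Returns an action string so downstream code reacts without polling."""
    event = json.loads(raw_body)
    doc_id = event.get("document_id")
    if event.get("type") == "document.ready":
        return f"index {doc_id} in downstream search"
    if event.get("type") == "document.failed":
        return f"alert on {doc_id}: {event.get('error', 'unknown error')}"
    return "ignore"

action = handle_webhook('{"type": "document.ready", "document_id": "doc_42"}')
# action == "index doc_42 in downstream search"
```

The same dispatch function works whether the payload arrives via an HTTP endpoint, a queue consumer, or a serverless function.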
Real-time, not polling
Document processing is async, and Dewey makes it feel instant. Server-sent events push ingestion status directly to your client as each file moves through the pipeline. A document doesn't have to be fully processed to be useful: the section manifest is queryable the moment sectioning completes, before a single embedding is written. No polling loops, no waiting for the whole pipeline to finish.
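Consuming an SSE stream needs no special client: events are `event:` and `data:` lines terminated by a blank line. A minimal parser of that wire format follows; the event names `sectioned` and `ready` in the example are illustrative assumptions, not Dewey's documented event types.

```python
def parse_sse(stream_text):
    """Parse a raw Server-Sent Events stream into (event, data) pairs.
    Per the SSE format, a blank line dispatches the accumulated event."""
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # blank line terminates one event
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
    return events

stream = (
    'event: sectioned\ndata: {"document_id": "doc_1"}\n\n'
    'event: ready\ndata: {"document_id": "doc_1"}\n\n'
)
events = parse_sse(stream)
# [("sectioned", ...), ("ready", ...)]
```

A client watching for a `sectioned`-style event could start querying the section manifest immediately, before the rest of the pipeline finishes.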
Know when your corpus contradicts itself
Claim extraction atomizes every document into discrete, importance-scored facts. Contradiction detection then clusters conflicting claims across your entire corpus, rates them by severity, and generates a suggested resolution instruction you can apply in one click. No other managed RAG service does this — it turns your document library from a search index into a quality-aware knowledge base.
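The core mechanic is simple to picture: group atomized claims by what they are about, and flag groups whose values disagree. This is a toy stand-in for cross-corpus contradiction detection, not Dewey's model-driven implementation; real claims are not neat (subject, value) tuples.

```python
def find_contradictions(claims):
    """Group claims by subject and flag groups whose values disagree.
    claims: list of (doc, subject, value, importance) tuples, with
    importance in [0, 1]. Conflicts are returned highest-severity first,
    severity being the max importance of any claim involved."""
    by_subject = {}
    for doc, subject, value, importance in claims:
        by_subject.setdefault(subject, []).append((doc, value, importance))
    conflicts = []
    for subject, entries in by_subject.items():
        if len({value for _, value, _ in entries}) > 1:
            severity = max(imp for _, _, imp in entries)
            conflicts.append({"subject": subject, "entries": entries,
                              "severity": severity})
    return sorted(conflicts, key=lambda c: c["severity"], reverse=True)

claims = [
    ("q1.pdf", "2024 revenue", "$10M", 0.9),
    ("q2.pdf", "2024 revenue", "$12M", 0.8),
    ("hr.pdf", "PTO days", "20", 0.3),
]
conflicts = find_contradictions(claims)
# One conflict: "2024 revenue" is stated as both $10M and $12M.
```

Ranking by severity is what makes the output actionable: the revenue discrepancy surfaces first, low-stakes noise last.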
One canonical copy, even when you've uploaded three
Research corpora accumulate duplicates — the same PDF from a preprint server, a journal, and a mirrored archive. Dewey clusters near-duplicates by measuring how much content they share, picks a canonical copy per cluster, and silently excludes the rest from retrieval and contradiction detection. Research answers stop citing the same content under three filenames, and you can promote a different canonical or disband the group at any time.
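Content-overlap dedup can be sketched with word shingles and Jaccard similarity: documents sharing most of their shingles land in one cluster, and the longest copy becomes canonical. This is a greedy toy version under assumed defaults (trigram shingles, 0.8 threshold), not Dewey's actual algorithm.

```python
def shingles(text, n=3):
    """Word n-gram shingles of a document, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def cluster_duplicates(docs, threshold=0.8):
    """Greedily cluster documents whose shingle Jaccard overlap meets
    the threshold, keeping the longest copy as the cluster canonical.
    docs: mapping of name -> text."""
    clusters = []  # each: {"canonical", "members", "sh" (seed shingles)}
    for name, text in docs.items():
        sh = shingles(text)
        for c in clusters:
            overlap = len(sh & c["sh"]) / len(sh | c["sh"])
            if overlap >= threshold:
                c["members"].append(name)
                if len(text) > len(docs[c["canonical"]]):
                    c["canonical"] = name  # prefer the fullest copy
                break
        else:
            clusters.append({"canonical": name, "members": [name], "sh": sh})
    return clusters

docs = {
    "preprint.pdf": "deep nets generalize surprisingly well on held out data",
    "journal.pdf": "deep nets generalize surprisingly well on held out data too",
    "unrelated.pdf": "quarterly revenue grew in the emerging markets segment",
}
clusters = cluster_duplicates(docs)
# Two clusters: {preprint, journal} with journal.pdf canonical, and unrelated alone.
```

Retrieval then draws only from canonicals, which is what stops an answer from citing the same content under three filenames.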
Ready to stop maintaining a pipeline?
Free tier, no credit card required. First search result in under a minute.