Product
The RAG stack, already assembled.
Dewey replaces the document parser, vector database, and embedding pipeline you'd otherwise assemble and maintain yourself, and ships features no other managed service offers.
How Dewey stacks up
Compared against OpenAI File Search, Vectara, and a DIY stack built on LlamaIndex, Dewey covers:

- **Document parsing** (PDF, DOCX, PPTX, HTML, Markdown).
- **Chunking, embedding & hybrid search.** OpenAI File Search uses vector search only; the BM25 keyword component requires additional setup outside the managed service.
- **Semantic reranking (cross-encoder, on by default).** Dewey re-scores RRF candidates with a cross-encoder model before returning results, improving precision without requiring query tuning. Runs in-process with no external API call.
- **Multi-step research (quick → exhaustive).**
- **Section-aware structure + lightweight section scan.** Documents are parsed into their natural heading hierarchy. A dedicated /sections/scan endpoint lets agents scan section summaries cheaply before deciding which chunks to retrieve. No other managed RAG service offers this.
- **AI captioning (figures, diagrams & tables become searchable).** Dewey uses a vision model to caption images and an LLM to summarize tables, then indexes both as searchable chunks. Tabular data and embedded figures are fully retrievable by semantic search.
- **MCP server (Claude, Cursor, agents).**
- **Hosted agents (saved configurations + full run trace).** Save a system prompt, tool set, collection scope, and model as a named agent. Invoke from the dashboard, REST API, CLI, or SDKs. Every run captures the full tool-call trace with inline citations and clickable source rows.
- **Official CLI (one-line install, JSON-pipeable).** Single Go binary, no Python or Node runtime required. Upload, search, stream cited research, and invoke hosted agents from any shell. Stable `--json` contract on every command for piping into other tools.
- **Corpus quality analysis (deduplication + contradiction detection).** Dewey clusters near-duplicate documents by content overlap and picks a canonical per group, then analyzes extracted claims across the corpus to surface conflicting statements by severity with suggested resolution instructions. No other managed RAG service offers either.
Comparison based on publicly available documentation as of early 2026. Features and pricing change; verify with each vendor.
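The reranking feature above mentions re-scoring RRF candidates. Reciprocal rank fusion is the standard way to merge a vector ranking and a keyword ranking before a cross-encoder pass; here is a minimal sketch of the general technique (not Dewey's internals; the constant `k = 60` is the conventional default, assumed here):

```python
def rrf(rankings, k=60):
    """Merge several ranked lists with reciprocal rank fusion.

    rankings: list of ranked lists of document ids, best first.
    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so agreement between rankers outweighs a single high placement.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a vector-search ranking with a keyword (BM25-style) ranking.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]
fused = rrf([vector_hits, keyword_hits])
# doc_a appears near the top of both lists, so it outranks doc_c.
```

A cross-encoder reranker would then re-score only the top few fused candidates, which is why the fusion step can stay this simple.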
FinanceBench accuracy at exhaustive depth
We ran Dewey's /research endpoint on all 150 questions in FinanceBench, a benchmark of financial Q&A drawn from real SEC filings. At exhaustive depth with Claude Opus 4.6, Dewey achieves 87.3% accuracy, surpassing the full-document-in-context baseline and well above the 19% typical of standard vector RAG. Exhaustive depth requires a Pro plan and a BYOK key.
Why Dewey
Six things you won't find anywhere else
Sections, not just chunks
Most RAG systems split documents into fixed-size token windows and call them chunks. Dewey parses the document's actual heading hierarchy (title, section, subsection) and indexes each section as a first-class entity. Sections with generic titles like "Introduction" or "Chapter 3" automatically get AI-generated summaries so they're just as findable as sections with descriptive headings. Search results and cited answers reference the exact section by name, not an anonymous text fragment at offset 4,096.
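The idea of indexing a heading hierarchy rather than fixed token windows can be sketched in a few lines. This is a toy Markdown version, not Dewey's actual parser: it walks the document once and emits one section per heading, carrying the heading level so a tree can be rebuilt from it.

```python
import re

def parse_sections(markdown_text):
    """Split a Markdown document into sections along its heading
    hierarchy. Returns dicts with the heading level (1-6), the section
    title, and the body lines that follow it."""
    sections = []
    current = None
    for line in markdown_text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            current = {"level": len(m.group(1)), "title": m.group(2), "body": []}
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    return sections

doc = "# Report\nIntro text.\n## Methods\nWe measured things."
secs = parse_sections(doc)
# Two sections: "Report" (level 1) and "Methods" (level 2).
```

Because each section keeps its title, a search hit can cite "Methods" by name instead of an anonymous fragment at a byte offset.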
Your AI bill, not ours
OpenAI File Search and Vectara bundle generation costs into their pricing: you pay their margin on every query. Dewey is different. Bring your own OpenAI, Anthropic, or Google Gemini key and pay the provider directly, at cost. No markup, no proprietary model requirement, and credit metering is bypassed entirely for BYOK requests. You get full visibility into what you're spending and why.
Event-driven from the start
Dewey treats document processing as a first-class event stream. Webhooks fire when a document becomes ready or fails, so your downstream systems, agents, and workflows react instantly without polling. Pair that with real-time SSE events in the dashboard and you always know exactly where every document is in the pipeline.
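A webhook consumer can be as small as a handler that dispatches on the event type. The sketch below assumes a hypothetical payload shape with `type`, `document_id`, and `error` fields; Dewey's actual field names may differ, so treat this as the pattern rather than the contract.

```python
import json

def handle_webhook(raw_body):
    """Dispatch a (hypothetical) document-lifecycle webhook payload.
    Returns an action string so downstream code reacts without polling."""
    event = json.loads(raw_body)
    doc_id = event.get("document_id")
    if event.get("type") == "document.ready":
        return f"index {doc_id} in downstream search"
    if event.get("type") == "document.failed":
        return f"alert on {doc_id}: {event.get('error', 'unknown error')}"
    return "ignore"

action = handle_webhook('{"type": "document.ready", "document_id": "doc_42"}')
# action == "index doc_42 in downstream search"
```

The same dispatch function works whether the payload arrives via an HTTP endpoint, a queue consumer, or a serverless function.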
Real-time, not polling
Document processing is async, and Dewey makes it feel instant. Server-sent events push ingestion status directly to your client as each file moves through the pipeline. A document doesn't have to be fully processed to be useful: the section manifest is queryable the moment sectioning completes, before a single embedding is written. No polling loops, no waiting for the whole pipeline to finish.
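Consuming an SSE stream needs no special client: events are `event:` and `data:` lines terminated by a blank line. A minimal parser of that wire format follows; the event names `sectioned` and `ready` in the example are illustrative assumptions, not Dewey's documented event types.

```python
def parse_sse(stream_text):
    """Parse a raw Server-Sent Events stream into (event, data) pairs.
    Per the SSE format, a blank line dispatches the accumulated event."""
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # blank line terminates one event
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
    return events

stream = (
    'event: sectioned\ndata: {"document_id": "doc_1"}\n\n'
    'event: ready\ndata: {"document_id": "doc_1"}\n\n'
)
events = parse_sse(stream)
# [("sectioned", ...), ("ready", ...)]
```

A client watching for a `sectioned`-style event could start querying the section manifest immediately, before the rest of the pipeline finishes.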
Know when your corpus contradicts itself
Claim extraction atomizes every document into discrete, importance-scored facts. Contradiction detection then clusters conflicting claims across your entire corpus, rates them by severity, and generates a suggested resolution instruction you can apply in one click. No other managed RAG service does this — it turns your document library from a search index into a quality-aware knowledge base.
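The core mechanic is simple to picture: group atomized claims by what they are about, and flag groups whose values disagree. This is a toy stand-in for cross-corpus contradiction detection, not Dewey's model-driven implementation; real claims are not neat (subject, value) tuples.

```python
def find_contradictions(claims):
    """Group claims by subject and flag groups whose values disagree.
    claims: list of (doc, subject, value, importance) tuples, with
    importance in [0, 1]. Conflicts are returned highest-severity first,
    severity being the max importance of any claim involved."""
    by_subject = {}
    for doc, subject, value, importance in claims:
        by_subject.setdefault(subject, []).append((doc, value, importance))
    conflicts = []
    for subject, entries in by_subject.items():
        if len({value for _, value, _ in entries}) > 1:
            severity = max(imp for _, _, imp in entries)
            conflicts.append({"subject": subject, "entries": entries,
                              "severity": severity})
    return sorted(conflicts, key=lambda c: c["severity"], reverse=True)

claims = [
    ("q1.pdf", "2024 revenue", "$10M", 0.9),
    ("q2.pdf", "2024 revenue", "$12M", 0.8),
    ("hr.pdf", "PTO days", "20", 0.3),
]
conflicts = find_contradictions(claims)
# One conflict: "2024 revenue" is stated as both $10M and $12M.
```

Ranking by severity is what makes the output actionable: the revenue discrepancy surfaces first, low-stakes noise last.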
One canonical copy, even when you've uploaded three
Research corpora accumulate duplicates — the same PDF from a preprint server, a journal, and a mirrored archive. Dewey clusters near-duplicates by measuring how much content they share, picks a canonical copy per cluster, and silently excludes the rest from retrieval and contradiction detection. Research answers stop citing the same content under three filenames, and you can promote a different canonical or disband the group at any time.
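Content-overlap dedup can be sketched with word shingles and Jaccard similarity: documents sharing most of their shingles land in one cluster, and the longest copy becomes canonical. This is a greedy toy version under assumed defaults (trigram shingles, 0.8 threshold), not Dewey's actual algorithm.

```python
def shingles(text, n=3):
    """Word n-gram shingles of a document, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def cluster_duplicates(docs, threshold=0.8):
    """Greedily cluster documents whose shingle Jaccard overlap meets
    the threshold, keeping the longest copy as the cluster canonical.
    docs: mapping of name -> text."""
    clusters = []  # each: {"canonical", "members", "sh" (seed shingles)}
    for name, text in docs.items():
        sh = shingles(text)
        for c in clusters:
            overlap = len(sh & c["sh"]) / len(sh | c["sh"])
            if overlap >= threshold:
                c["members"].append(name)
                if len(text) > len(docs[c["canonical"]]):
                    c["canonical"] = name  # prefer the fullest copy
                break
        else:
            clusters.append({"canonical": name, "members": [name], "sh": sh})
    return clusters

docs = {
    "preprint.pdf": "deep nets generalize surprisingly well on held out data",
    "journal.pdf": "deep nets generalize surprisingly well on held out data too",
    "unrelated.pdf": "quarterly revenue grew in the emerging markets segment",
}
clusters = cluster_duplicates(docs)
# Two clusters: {preprint, journal} with journal.pdf canonical, and unrelated alone.
```

Retrieval then draws only from canonicals, which is what stops an answer from citing the same content under three filenames.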
Ready to stop maintaining a pipeline?
Free tier, no credit card required. First search result in under a minute.