Upload documents

Get files into a collection one at a time or in bulk, and follow each document through ingestion.

Upload a single file

Push one PDF, Word doc, HTML page, or plain-text file and get a Document back.

Uploads return immediately. Conversion, chunking, and embedding happen in the background — the Document starts in pending status and becomes queryable when it reaches ready. Pass tags or metadata at upload time to make later filtering and bulk updates cheaper.

dewey upload ./report.pdf -c my-docs

# Uploading 1 file to my-docs
#   [ok] report.pdf  uploaded  doc_a1b2c3

Response

{
  "id": "doc_a1b2c3",
  "collectionId": "col_xyz",
  "filename": "report.pdf",
  "status": "pending",
  "fileSizeBytes": 482133,
  "tags": ["q3-2026", "finance"],
  "metadata": {},
  "createdAt": "2026-05-14T18:21:09.412Z"
}
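
As a rough illustration of consuming this response in a script, the sketch below parses the JSON into a small typed record and checks whether the document is queryable yet. The `Document` dataclass, `parse_document`, and the `queryable` property are hypothetical names made up for this example and are not part of any Dewey SDK; only the field names and the pending/ready statuses come from the response above.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Sample payload copied from the response shown above.
SAMPLE = """{
  "id": "doc_a1b2c3",
  "collectionId": "col_xyz",
  "filename": "report.pdf",
  "status": "pending",
  "fileSizeBytes": 482133,
  "tags": ["q3-2026", "finance"],
  "metadata": {},
  "createdAt": "2026-05-14T18:21:09.412Z"
}"""


@dataclass(frozen=True)
class Document:
    """Hypothetical local record for an upload response (not an SDK type)."""
    id: str
    collection_id: str
    filename: str
    status: str
    file_size_bytes: int
    tags: list
    metadata: dict
    created_at: datetime

    @property
    def queryable(self) -> bool:
        # Per the text above, a document becomes queryable once it is "ready".
        return self.status == "ready"


def parse_document(payload: str) -> Document:
    raw = json.loads(payload)
    return Document(
        id=raw["id"],
        collection_id=raw["collectionId"],
        filename=raw["filename"],
        status=raw["status"],
        file_size_bytes=raw["fileSizeBytes"],
        tags=list(raw.get("tags", [])),
        metadata=dict(raw.get("metadata", {})),
        # Swap the trailing "Z" for an explicit UTC offset so that older
        # Python versions can parse the timestamp too.
        created_at=datetime.fromisoformat(raw["createdAt"].replace("Z", "+00:00")),
    )


doc = parse_document(SAMPLE)
print(doc.id, doc.status, doc.queryable)
```

A freshly uploaded document is still pending, so `queryable` stays false until a later fetch reports ready.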

Upload many files

Upload a folder or list of files concurrently with progress reporting.

The CLI accepts glob patterns directly; the SDKs take an array of files with a progress callback. Concurrency defaults to 4 in the CLI and 5 in the SDKs. Raise it on fast links, and lower it if you start getting HTTP 429 (rate limit) responses.

dewey upload ./packets/*.pdf -c my-docs --concurrency 8 --watch

# Uploading 12 files to my-docs
#   [ok] q1.pdf  doc_aa
#   [ok] q2.pdf  doc_ab
#   ...
# [ok] 12/12 ready in 1m24s.
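
If you are scripting bulk uploads yourself, the concurrency and 429 advice above could be sketched roughly as follows: a bounded worker pool, a progress callback, and exponential backoff when an upload is rate limited. Everything here is an assumption for illustration: `upload_all`, `RateLimited`, and the `upload_one` callable are hypothetical and are not Dewey SDK names. Pass in whatever function actually performs a single upload.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class RateLimited(Exception):
    """Hypothetical: raised by upload_one when the server answers HTTP 429."""


def upload_all(paths, upload_one, concurrency=5, max_retries=3,
               base_delay=0.5, on_progress=None, sleep=time.sleep):
    """Upload paths with at most `concurrency` uploads in flight.

    upload_one(path) is a caller-supplied function that uploads a single
    file and returns a document id. Returns a {path: document id} dict.
    """
    def attempt(path):
        for n in range(max_retries + 1):
            try:
                return upload_one(path)
            except RateLimited:
                if n == max_retries:
                    raise
                # Exponential backoff: 0.5s, 1s, 2s, ... by default.
                sleep(base_delay * 2 ** n)

    results = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(attempt, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            results[path] = future.result()  # re-raises a failed upload
            if on_progress:
                on_progress(done, len(futures), path)
    return results
```

Backoff only slows the retrying worker. If most uploads end up retrying, lowering `concurrency` is the more direct fix.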

Wait for processing to finish

Block until a document reaches a terminal state: ready or error.

For one-off scripts, the SDK exposes `waitForReady` (long-poll, ~5.5 minute timeout). For CI or batch flows, the CLI `--wait` flag blocks silently until processing finishes, while `--watch` streams live events to the terminal. For a real-time UI, subscribe to the SSE stream, covered in the next recipe.

# Silent: block until every uploaded doc is terminal
dewey upload ./report.pdf -c my-docs --wait

# Streaming: print each status transition
dewey upload ./report.pdf -c my-docs --watch

# Wait on a doc that's already been uploaded
dewey docs wait doc_a1b2c3
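
Where neither the CLI flags nor `waitForReady` fit, the same "block until terminal" behaviour can be approximated with a plain polling loop. Note that this is simple interval polling, not the long-poll the SDK is described as using. `wait_for_terminal` and the `fetch_status` callable are hypothetical names for this sketch; the 330-second default only mirrors the ~5.5 minute timeout mentioned above, and the 2-second interval is an arbitrary choice.

```python
import time

# Terminal states named in this recipe.
TERMINAL = {"ready", "error"}


def wait_for_terminal(fetch_status, timeout_s=330.0, interval_s=2.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status.

    fetch_status is a caller-supplied function that returns the document's
    current status string. Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"document still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

The function returns on error as well as ready, so callers should check the returned status rather than assume success.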

Subscribe to live processing events

Stream a collection's documents/events feed to drive a real-time UI.

Server-Sent Events (SSE) stream every status transition for every document in a collection, which is useful for upload progress bars and processing dashboards. EventSource reconnects automatically, but you will miss any events emitted while the connection was down, so re-fetch the current document list after a reconnect if a gap matters.

# Tail every status event for the collection
dewey watch my-docs

# Or filter to one document
dewey watch my-docs --doc doc_a1b2c3

Response

{
  "type": "status",
  "documentId": "doc_a1b2c3",
  "status": "sectioned",
  "filename": "report.pdf",
  "timestamp": "2026-05-14T18:22:11.003Z"
}
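
For clients without a built-in EventSource, the sketch below shows one way to decode a standard SSE text stream and fold status events into a per-document map, plus a resync step for the reconnect gap described above. It assumes each event's `data:` field carries a JSON object shaped like the sample above, which this page does not confirm. `parse_sse`, `apply_event`, and `resync` are hypothetical helpers, not part of any Dewey SDK.

```python
import json


def parse_sse(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines.

    Handles only `data:` fields and `:` comment lines. An event is
    dispatched on a blank line; a trailing event with no blank line
    after it is discarded, as the SSE format specifies.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


def apply_event(statuses, event):
    """Record the latest status for the document named in a status event."""
    if event.get("type") == "status":
        statuses[event["documentId"]] = event["status"]
    return statuses


def resync(statuses, documents):
    """After a reconnect, overwrite local state from a re-fetched document list."""
    for doc in documents:
        statuses[doc["id"]] = doc["status"]
    return statuses
```

Calling `resync` with a freshly fetched document list after every reconnect keeps a progress bar from getting stuck on a transition that was emitted during the gap.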
