May 13, 2026
We ran Dewey's /research endpoint on FinanceBench, a benchmark of 150 financial analysis questions drawn from SEC filings. Claude Opus 4.6 achieved 87.3% accuracy, surpassing the full-document-in-context baseline from the original paper. We also ran ablation studies on document enrichment features and found some counterintuitive results.
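As a rough sanity check on the headline figure, the short sketch below converts the reported accuracy into an implied number of correct answers. It assumes each of the 150 questions is scored as simply correct or incorrect, which is an assumption made for illustration rather than a description of the actual grading; the variable names are likewise illustrative.

```python
TOTAL_QUESTIONS = 150      # size of the benchmark, as stated above
REPORTED_ACCURACY = 0.873  # headline accuracy, as stated above

# Assumption: every question is graded as simply correct or incorrect,
# so accuracy is (number correct) / (number of questions).
implied_correct = round(REPORTED_ACCURACY * TOTAL_QUESTIONS)
implied_accuracy = implied_correct / TOTAL_QUESTIONS

print(f"implied correct answers: {implied_correct} of {TOTAL_QUESTIONS}")
print(f"accuracy at that count: {implied_accuracy:.1%}")
```

Under that assumption, a single question is worth about 0.67 percentage points, which is worth keeping in mind when comparing configurations that differ by only a point or two.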