Document Intelligence: Semantic Search Now Processes 10+ File Formats
AI Labs

Document Intelligence now indexes and searches across scanned PDFs, handwritten notes, spreadsheets, presentations, and seven other file formats. Find what you need regardless of where or how it was originally created.

Wael Salem

Author

March 15, 2026

Technologies Used

Document Processing, Semantic Search, OCR, Multi-Format

Document Intelligence: Search Everything, Find Anything

Every organization we work with has the same problem: critical information is trapped in documents that their search tools cannot read. The signed contract is a scanned PDF. The meeting notes are handwritten. The financial model is a spreadsheet. The board presentation is a slide deck. None of these talk to each other. Finding a specific piece of information means opening files one by one and hoping you remember where it lives.

Document Intelligence now processes and semantically indexes more than ten file formats. Ask a question in plain language and it finds the answer, whether that answer lives in a scanned contract, a handwritten note, an Excel model, or a PowerPoint deck. The feature is live in production.

What We Shipped

The system handles native and scanned PDFs, handwritten documents, Word files (including tracked changes), Excel spreadsheets, PowerPoint presentations, Google Workspace files, images of documents and whiteboards, email files with attachments, and CSV data.

The core capability is unified cross-format search. A single query like "renewal terms for the Acme contract" searches across the scanned PDF of the signed agreement, the Word document with the latest draft, the email thread discussing amendments, and the spreadsheet tracking contract milestones. Results are ranked by relevance regardless of source format.
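As a rough illustration of ranking results "regardless of source format", the sketch below scores text chunks from different file types against one query in a single index. It is not the production implementation: the `Chunk` type, the file names, and the toy word-count `embed` function are all assumptions made for the example, and a real system would use a semantic embedding model after format-specific text extraction.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical file name
    fmt: str     # source format label, e.g. "scanned_pdf"
    text: str    # text assumed to be already extracted (OCR, parser, ...)


def embed(text: str) -> Counter:
    # Toy stand-in for a semantic embedding: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, index: list, top_k: int = 3) -> list:
    # Score every chunk the same way; the format label plays no part in ranking.
    q = embed(query)
    scored = [(cosine(q, embed(c.text)), c) for c in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical index mixing four source formats.
INDEX = [
    Chunk("acme_signed.pdf", "scanned_pdf", "acme contract renewal terms auto renew for 12 months"),
    Chunk("acme_draft.docx", "word", "draft acme contract with updated renewal terms"),
    Chunk("milestones.xlsx", "spreadsheet", "contract milestones acme renewal date"),
    Chunk("offsite.pptx", "presentation", "team offsite agenda and travel plan"),
]

results = search("renewal terms for the Acme contract", INDEX)
for score, chunk in results:
    print(f"{score:.2f}  {chunk.fmt:12s} {chunk.source}")
```

The point of the design is that every format is reduced to the same representation before scoring, so a scanned PDF and a spreadsheet compete in one ranked list rather than in separate per-format searches.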

For spreadsheets specifically, the system understands tabular structure. You can ask a plain-language question about your data and it will parse the relevant spreadsheet, identify the correct columns, and return the answer.
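The idea of identifying the correct columns can be sketched in a few lines. The example below is a simplified assumption, not how Document Intelligence works internally: it picks the column whose header shares the most words with the question, then returns that column's value from the first row that has a cell named in the question. The sheet contents and the `answer` helper are hypothetical.

```python
import csv
import io
import re
from typing import Optional

# Hypothetical spreadsheet exported as CSV text.
SHEET = """Customer,Renewal Date,Annual Value
Acme,2026-09-01,120000
Globex,2027-01-15,85000
"""


def words(text: str) -> set:
    # Lowercased alphanumeric tokens, punctuation ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer(question: str, sheet_text: str) -> Optional[str]:
    q = words(question)
    rows = list(csv.DictReader(io.StringIO(sheet_text)))
    if not rows:
        return None
    # Choose the column whose header overlaps most with the question.
    column = max(rows[0].keys(), key=lambda h: len(words(h) & q))
    if not words(column) & q:
        return None  # no header matches the question at all
    # Choose the first row that has a cell fully mentioned in the question.
    for row in rows:
        if any(words(cell) and words(cell) <= q for cell in row.values()):
            return row[column]
    return None


print(answer("What is the renewal date for Acme?", SHEET))
print(answer("What is the annual value for Globex?", SHEET))
```

Word overlap is only a placeholder for the matching step; a production system would need to handle synonyms, merged cells, multiple sheets, and aggregate questions such as totals or averages.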

Handwriting recognition supports English, Arabic, Hindi, and Mandarin scripts. It handles both cursive and print handwriting on lined, unlined, and grid paper.

Why This Matters

This release automates a task that currently consumes hours every week in every department: locating information scattered across files in different formats.

Legal and compliance teams can now search their entire contract library regardless of how the documents were originally created. Finding a specific clause across hundreds of agreements takes seconds instead of hours of manual review.

For finance teams, financial models and reports are searchable alongside memos, presentations, and correspondence. Audit preparation becomes faster when all supporting documents are indexed and queryable from one place.

Operations teams dealing with handwritten inspection forms, field notes, or legacy paper records can, for the first time, make these documents part of their searchable knowledge base.

For any executive who has spent half an hour looking for a number they know exists somewhere in their files, that problem is solved: one search, one answer, regardless of file format.

Get Started

To see Document Intelligence search across your actual documents, contact info@salem.ventures. We will run a pilot with a sample of your files so you can evaluate search quality on your own data.

Product Update, Document Intelligence, Semantic Search, File Processing
