Case study · Software Factory · Aurora Insurance

Pinecone-Powered Semantic Search for Policy Docs

Employees find the right clause in seconds. Embeddings refresh on every policy publish, and access control is enforced at query time.

Project overview

Problem

Keyword search failed across PDFs and wiki pages; legal waited on ops to forward the correct attachment.

Solution

Chunked ingestion with deterministic IDs, nightly embedding refresh pipelines, and hybrid retrieval reranked for policy freshness.
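
As a rough illustration of chunked ingestion with deterministic IDs, the sketch below derives each chunk ID from the source document ID and the chunk position, so re-ingesting a republished policy overwrites its existing records instead of duplicating them. The chunk size, overlap, ID scheme, and function names are assumptions made for this example, not details of the Aurora pipeline.

```python
import hashlib


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def chunk_id(doc_id: str, index: int) -> str:
    """Same document and position always produce the same ID (hypothetical scheme)."""
    key = f"{doc_id}|{index}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:32]


def build_records(doc_id: str, text: str) -> list[dict]:
    """Build upsert-ready records; the checksum lets a job skip unchanged chunks."""
    return [
        {
            "id": chunk_id(doc_id, i),
            "checksum": hashlib.sha256(chunk.encode("utf-8")).hexdigest(),
            "text": chunk,
        }
        for i, chunk in enumerate(chunk_text(text))
    ]
```

Because the ID depends only on the document and position, a refresh job can compare the stored checksum with the new one and re-embed only the chunks whose text changed.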

Key metrics

  • 4.2M chunks indexed

  • <350 ms p95 query latency

  • 11 regional namespaces

  • RBAC enforced per query

System architecture

Pinecone namespaces per division; ACL metadata filters applied server-side before LLM summarization.
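
A minimal sketch of how a per-division namespace and an ACL metadata filter might be combined into one query. The metadata field name `allowed_groups` and the helper `build_query` are hypothetical. The keys mirror the arguments that the Pinecone Python client's `query` method accepts (`vector`, `top_k`, `namespace`, `filter`, `include_metadata`), but the sketch only builds the request and does not call the service.

```python
from typing import Any


def build_query(
    vector: list[float],
    division: str,
    user_groups: list[str],
    top_k: int = 10,
) -> dict[str, Any]:
    """Build query arguments scoped to one division and the caller's groups."""
    if not user_groups:
        # Fail closed: never issue an unfiltered query for a user without groups.
        raise PermissionError("refusing to query without an ACL filter")
    return {
        "vector": vector,
        "top_k": top_k,
        "namespace": division,  # one namespace per division
        # Assumed metadata layout: each chunk lists the groups allowed to read it.
        "filter": {"allowed_groups": {"$in": sorted(set(user_groups))}},
        "include_metadata": True,
    }
```

With a client index object in hand, the result could be passed as `index.query(**build_query(vec, "claims", groups))`, so chunks the user cannot read are excluded before any text reaches the summarization step.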

Workflow

  • Ingest

    Connectors for SharePoint + Confluence with checksum dedupe.

  • Embed

    Batch jobs with backoff; poison documents quarantined automatically.

  • Serve

    Answer cards show clause IDs for audit-friendly references.

  • Feedback

    Thumbs-down routes to content owners with suggested edits.
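
The Embed step above could look roughly like the following: transient failures are retried with exponential backoff and jitter, while documents that fail for any other reason, or that exhaust their retries, are set aside so the rest of the job continues. `TransientError`, `embed_fn`, and the retry parameters are placeholders for this sketch; a real job would map its embedding provider's rate-limit and timeout errors onto them and would likely send larger batches.

```python
import random
import time
from typing import Callable, Iterable


class TransientError(Exception):
    """Stand-in for retryable failures such as rate limits or timeouts."""


def embed_with_backoff(
    texts: list[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> list[list[float]]:
    """Call embed_fn, retrying transient errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return embed_fn(texts)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; let the caller quarantine the document
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")


def run_embedding_job(
    docs: Iterable[tuple[str, str]],
    embed_fn: Callable[[list[str]], list[list[float]]],
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict[str, list[float]], list[tuple[str, str]]]:
    """Embed (doc_id, text) pairs one at a time; quarantine the ones that fail."""
    embedded: dict[str, list[float]] = {}
    quarantined: list[tuple[str, str]] = []
    for doc_id, text in docs:
        try:
            embedded[doc_id] = embed_with_backoff([text], embed_fn, sleep=sleep)[0]
        except Exception as exc:  # poison document: record it and keep going
            quarantined.append((doc_id, repr(exc)))
    return embedded, quarantined
```

Keeping the quarantine list separate from the successful output means one unparseable file cannot block a refresh, and the failures can be reviewed later.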

Results & impact

Underwriters stopped emailing PDFs to each other; finding a clause became as reliable as looking up numbers in the ERP.

Deeper dive

Embedding version bumps shipped behind a flag, so teams could A/B-test retrieval quality on the new index against the old one without user-visible downtime.
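
One way such a flag could work, sketched under assumptions: each user is hashed into a stable bucket from 0 to 99, and a rollout percentage decides whether their queries go to the old or the new index. The index names `policies-v1` and `policies-v2` and the function `index_for_user` are hypothetical and do not come from the project.

```python
import hashlib


def index_for_user(
    user_id: str,
    rollout_percent: int,
    old_index: str = "policies-v1",
    new_index: str = "policies-v2",
) -> str:
    """Route a user to the old or new index using a stable hash bucket."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
    return new_index if bucket < rollout_percent else old_index
```

Because the bucket depends only on the user ID, a user sees the same index for the whole experiment, and setting the percentage back to 0 returns everyone to the old index without a redeploy.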