Case studySoftware FactoryMeridian Capital

AWS Lakehouse Pipeline for Finance Analytics

Bronze–silver–gold on S3 and Athena — analysts query curated tables while raw lands immutable for regulators.

Project overview

Problem

Spreadsheet pivots and one-off Python jobs meant numbers in board decks disagreed with ops dashboards.

Solution

Lakehouse layers with dbt-style transforms in Spark on EMR, orchestrated schedules, and BI hitting only gold tables.

Key metrics

18TB
Curated lake storage
6h
Freshness SLA
100%
Lineage columns tracked
0
Manual CSV hops

System architecture

EventBridge triggers; Iceberg tables for time travel; column-level encryption for PII; cross-account read roles for auditors.

Workflow

  • Land raw

    Immutable bronze with ingestion timestamps and source hashes.

  • Conform

    Silver merges ERP and CRM keys with survivorship rules documented.

  • Aggregate

    Gold exposes KPIs with semantic naming agreed by finance + eng.

  • Serve

    Athena views + controlled extracts to approved BI workspaces.

Results & impact

Finance stopped reconciling three spreadsheets—the factory delivered one metric definition and provable lineage.

Deeper dive

Deeper dive

Partition pruning and bucketed writes kept query spend flat even when raw ingest doubled during acquisition integration.