Deeper dive
This engagement leaned on repeatable evaluation sets and shadow mode before any agent gained write access to production labels.
A hub-and-spoke labeling system in which orchestrator agents route edge cases to human review: accuracy up, throughput steady, and the factory never sleeps.
NorthGrid needed to scale review of edge-case labels without exploding headcount. Rules-based automation missed nuance; fully manual review could not keep pace with new SKUs and seasonal imagery.
We designed an agentic pipeline: lightweight classifiers for the bulk path, orchestrator agents that escalate uncertain items, and a human-in-the-loop queue only where confidence bands demand it. The AI Factory shipped iterative prompts, evaluation harnesses, and rollback-safe deploys.
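To make the confidence-band idea concrete, here is a minimal Python sketch of the routing decision. The band edges (0.95 and 0.50), the `Route` names, and the choice to send the middle band to the orchestrator are illustrative assumptions, not values from the engagement; real thresholds would be tuned per tenant against an evaluation set.

```python
from enum import Enum


class Route(Enum):
    AUTO_ACCEPT = "auto_accept"      # bulk path: classifier output is trusted
    ORCHESTRATOR = "orchestrator"    # uncertain: escalate to orchestrator agents
    HUMAN_REVIEW = "human_review"    # low confidence: human-in-the-loop queue


# Hypothetical band edges, chosen only for illustration.
AUTO_ACCEPT_MIN = 0.95
HUMAN_REVIEW_MAX = 0.50


def route_by_confidence(confidence: float) -> Route:
    """Map a classifier confidence in [0, 1] to a pipeline route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= AUTO_ACCEPT_MIN:
        return Route.AUTO_ACCEPT
    if confidence <= HUMAN_REVIEW_MAX:
        return Route.HUMAN_REVIEW
    return Route.ORCHESTRATOR
```

Keeping the bands as named constants rather than burying them in model code is one way to keep threshold changes small and easy to roll back.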
A hub orchestrator fans work out to specialist agents (blur detection, boundary disagreement, taxonomy drift). Each agent emits a structured verdict and a confidence score. Disagreements route to a merge policy rather than to a single model, so the system stays observable and tunable per client tenant.
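One way such a merge policy could look is a confidence-weighted vote with a review flag when the winning margin is thin. The sketch below is an assumption-laden illustration: the `Verdict` fields, the per-agent `weights` mapping, and the `min_margin` default are hypothetical, and the engagement's actual policy is not described in detail here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    agent: str         # hypothetical agent name, e.g. "blur_detection"
    label: str         # label the agent proposes for the item
    confidence: float  # agent's self-reported score in [0, 1]


def merge_verdicts(verdicts, weights, min_margin=0.2):
    """Confidence-weighted vote over agent verdicts.

    Returns (label, needs_review). `weights` maps agent name to a
    per-tenant weight; agents missing from the mapping count as 1.0.
    """
    if not verdicts:
        raise ValueError("no verdicts to merge")
    scores = defaultdict(float)
    for v in verdicts:
        scores[v.label] += weights.get(v.agent, 1.0) * v.confidence
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    total = sum(scores.values())
    # Normalized gap between the best and second-best label.
    margin = (top_score - runner_up) / total if total > 0 else 0.0
    return top_label, margin < min_margin
```

Because the policy is plain data plus a small function, the weights and margin can be logged and adjusted per tenant without retraining any model, which is the observability point made above.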
Batches land in object storage; metadata normalizes camera and SKU context before any model runs.
High-confidence slices auto-accept; metrics stream to Grafana for drift alerts.
The orchestrator runs two specialist passes; on a mismatch, a third arbiter agent proposes a resolution or flags the item for human review.
Accepted labels feed retraining windows; rejected items become few-shot gold for the next sprint.
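The two-pass-plus-arbiter step in the flow above can be sketched as follows. The callables stand in for agents and are purely hypothetical; in this sketch an arbiter signals "needs human review" by returning `None`, which is an assumed convention and not a documented interface.

```python
from typing import Callable, Optional, Tuple

# Hypothetical agent signatures, used only for this illustration.
Specialist = Callable[[dict], str]
Arbiter = Callable[[dict, str, str], Optional[str]]


def resolve_item(
    item: dict,
    first_pass: Specialist,
    second_pass: Specialist,
    arbiter: Arbiter,
) -> Tuple[Optional[str], str]:
    """Run two specialist passes and arbitrate only on mismatch.

    Returns (label, status), where status is "agreed", "arbitrated",
    or "human_review". The label is None when a human must decide.
    """
    a = first_pass(item)
    b = second_pass(item)
    if a == b:
        return a, "agreed"
    proposal = arbiter(item, a, b)
    if proposal is None:
        return None, "human_review"
    return proposal, "arbitrated"
```

Returning the status alongside the label gives downstream steps a simple field to count, for example when tracking how often arbitration or human review is triggered.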
The client moved from brittle scripts to a monitored factory floor: fewer escalations, faster onboarding for new label taxonomies, and a clear audit trail for compliance. Throughput targets held while accuracy climbed to meet production SLAs.