Which vertical-specific business processes are genuinely automatable today, and which are not?
The conversation about industry automation tends toward extremes: either AI will automate everything, or it cannot be trusted with any consequential task. The reality is more nuanced and more useful: there are specific, identifiable characteristics of business processes that determine whether automation is reliable today, not reliable but improving, or fundamentally unsuited to current AI capabilities.
This research stream investigates automation feasibility at the process level across four primary verticals: fintech compliance, legal document processing, operations orchestration, and customer communication. We pick hard problems in specific industries, build automation prototypes, test them against real workloads, and publish what we find — including where we expected automation to work and it did not.
The most consistent finding across all verticals is that the technical problems of automation are usually easier than anticipated, and the surrounding problems — liability, explainability, data quality, change management — are usually harder. An AI that processes documents with 94% accuracy on a clean corpus processes them with 61% accuracy on the messy, non-standardised documents that make up the majority of real business archives.
We work with clients in Studios verticals to run these experiments on real data with appropriate safeguards, then publish generalised findings without client-identifiable information. The goal is to give other practitioners an honest map of the terrain before they invest in automation projects.
We tested document extraction across a corpus of 2,400 real business documents in 4 categories (invoices, contracts, reports, correspondence). On standardised, templated documents, extraction accuracy averaged 94%. On free-form documents without consistent structure, accuracy dropped to 61% using the same models and prompts. The gap was driven by missing field inference on non-standard layouts, not by OCR or parsing errors.
Method: Documents were classified by human reviewers as structured (templated), semi-structured, or free-form. Extraction accuracy was measured by field-level match against human-extracted ground truth, and results were broken out by layout type to surface the distribution, not just the average.
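As an illustration, field-level accuracy broken out by layout type takes only a few lines to compute. The record shape here (`layout`, `extracted`, `truth`) is a hypothetical stand-in for the real annotation format:

```python
from collections import defaultdict

def accuracy_by_layout(records):
    """Field-level extraction accuracy, grouped by layout type.

    `records` is a list of dicts with illustrative keys:
      layout    -- "structured" | "semi-structured" | "free-form"
      extracted -- {field_name: value} produced by the model
      truth     -- {field_name: value} from human extraction
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        # Score every ground-truth field; a missing extracted field counts as a miss.
        for field, expected in rec["truth"].items():
            totals[rec["layout"]] += 1
            if rec["extracted"].get(field) == expected:
                hits[rec["layout"]] += 1
    return {layout: hits[layout] / totals[layout] for layout in totals}
```

Scoring against the ground-truth fields (rather than the extracted ones) is what makes missed fields on non-standard layouts show up in the metric rather than silently disappearing.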
We analysed two client deployments of customer communication automation: one with mandatory human escalation for cases the AI flagged as uncertain, and one where the AI handled all cases without escalation. The no-escalation system handled 85% of inquiries correctly and 15% incorrectly. The customer satisfaction impact of the 15% errors was 4x the positive impact of the 85% successes — negative experiences in customer communication have asymmetric weight. Complaint volume tripled. The mandatory escalation system maintained flat customer satisfaction scores.
Method: Customer satisfaction survey data from both deployments over 60 days, normalised by inquiry volume, with complaint rate as a secondary metric. Results were consistent across two separate client contexts.
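A minimal sketch of the mandatory-escalation pattern, assuming the model exposes a self-reported confidence score; the threshold value here is illustrative, not a tuned number:

```python
from dataclasses import dataclass

# Illustrative cut-off: anything below this goes to a human, not the model.
ESCALATION_THRESHOLD = 0.8

@dataclass
class Inquiry:
    text: str
    confidence: float  # model's self-reported confidence (hypothetical field)

def route(inquiry: Inquiry) -> str:
    """Mandatory-escalation routing: uncertain cases are never
    answered autonomously, regardless of the headline accuracy."""
    if inquiry.confidence < ESCALATION_THRESHOLD:
        return "human_queue"
    return "auto_reply"
```

The point of the pattern is that the routing decision is structural, not discretionary: there is no code path where a low-confidence case reaches the customer without a human in the loop.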
"The AI decided" is not an accepted audit trail in any regulated industry we have worked in. Compliance automation that does not produce an auditable decision log — traceable to specific rules, specific document sections, and specific model outputs — was rejected by compliance officers in all 4 fintech clients we worked with, regardless of accuracy. Explainability is a prerequisite, not a nice-to-have.
Method: Structured interviews with compliance officers and legal counsel across 4 client organisations after presenting automation prototypes with varying levels of decision explainability. The decision to adopt was the dependent variable; explainability quality was the primary independent variable.
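One way to satisfy that requirement is an append-only decision log in which every record traces back to specific rules, document sections, and raw model output. The field names below are illustrative, not a schema we ship:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """One auditable compliance decision (illustrative field names)."""
    case_id: str
    decision: str        # e.g. "approve" / "escalate" / "reject"
    rule_ids: list       # the specific rules the decision traces to
    source_spans: list   # the document sections that were consulted
    model_output: str    # the raw model output backing the decision
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def append_to_log(record: DecisionRecord, path: str) -> None:
    # Append-only JSONL so each decision remains individually traceable.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

The design choice that matters is that the record is written at decision time, with the evidence attached, rather than reconstructed later from model logs.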
Across 6 operations automation projects, the automation technology itself was the primary bottleneck in only 1 case. In 5 cases, the primary blockers were data quality issues (inputs not in the expected format or quality range), integration problems (APIs that did not behave as documented), or change management resistance (employees routing around the automation). Technical readiness is necessary but not sufficient.
Method: Post-project retrospectives on 6 operations automation engagements, classifying the primary cause of each project delay or failure. Classification was performed by two analysts independently, with disagreements resolved by discussion.
We run automation prototypes on real business data from consenting Studio clients, with all client-identifying information anonymised before publication. Real data is essential — synthetic datasets do not capture the messiness that determines whether automation is actually viable.
We measure automation rate (cases fully handled without human intervention), error rate, error impact, explainability rating, and time-to-process against a manual baseline. We publish the full distribution, not just the average: a 90% automation rate that fails on the 10% most important cases is not a useful system.
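The distribution point can be made concrete with a reporting helper that surfaces tail percentiles alongside the headline rates. The keys and the record shape here are hypothetical:

```python
import statistics

def automation_report(cases):
    """Summarise automation outcomes without hiding the tails.

    `cases` is a list of dicts with illustrative keys:
      automated -- True if fully handled without human intervention
      error     -- True if the outcome was wrong
      minutes   -- processing time for the case
    """
    n = len(cases)
    times = sorted(c["minutes"] for c in cases)
    return {
        "automation_rate": sum(c["automated"] for c in cases) / n,
        "error_rate": sum(c["error"] for c in cases) / n,
        # Report the tail, not just the centre, of the time distribution.
        "time_p50": statistics.median(times),
        "time_p95": times[min(n - 1, int(0.95 * n))],
    }
```

A long p95 against a fast median is exactly the shape that an average would hide, and it is usually where the hard, consequential cases live.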
We identify clearly when automation is not viable today and explain why. We do not publish cherry-picked results from ideal conditions. Every finding includes the conditions under which it holds and the conditions under which it breaks down.
Hypothesis: A customer service AI with high accuracy (>85% on test set) can operate without human escalation without degrading customer satisfaction.
Result: The system handled 85% of inquiries correctly. Complaint volume tripled and customer satisfaction fell sharply.
Lesson: Error rate and customer satisfaction impact are not linearly related in customer communication contexts. The damage from a wrongly handled customer complaint is asymmetrically high. Human escalation paths are now mandatory in all customer communication automation we deploy, regardless of accuracy metrics.
Hypothesis: AI can automate end-to-end invoice processing (receipt → extraction → validation → approval routing) for standard invoice formats with >95% accuracy.
Result: 96.2% accuracy on standard invoice formats. Processing time fell from 4 hours to 8 minutes per batch, and the human review queue shrank by 78%.
Lesson: Invoice processing automation is highly effective for standardised formats. The 3.8% that required human review were consistently non-standard layouts, damaged documents, or invoices with disputed amounts — exactly the cases that benefit most from human judgment.
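The routing logic behind that split can be sketched as a simple pipeline in which anything non-standard, damaged, or disputed falls through to human review; `extract` and `validate` are placeholders for the real model and business-rule steps:

```python
def process_invoice(invoice, extract, validate):
    """Receipt -> extraction -> validation -> routing, as a sketch.

    `invoice` is a dict with illustrative keys ("layout", "damaged");
    `extract` and `validate` are injected stand-ins for the model
    call and the business-rule check.
    """
    # Route the known-hard cases to a human before spending model time.
    if invoice.get("layout") != "standard" or invoice.get("damaged"):
        return "human_review"
    fields = extract(invoice)
    # Failed validation or a disputed amount also goes to a human.
    if not validate(fields) or fields.get("disputed"):
        return "human_review"
    return "approval_routing"
```

The shape to notice is that human review is a first-class output of the pipeline, not an exception path: the cases the model cannot handle are routed deliberately rather than force-approved.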
Hypothesis: AI can extract standard contract clauses and flag non-standard terms with accuracy sufficient to reduce preliminary legal review time by 50%.
Result: Standard clause extraction reached 88% accuracy. Non-standard term flagging reached only 71%, too low for legal use without human verification of every flagged item.
Lesson: Contract AI is useful as a first-pass filter that reduces the volume of material requiring human attention, but not as a replacement for human review on consequential determinations. We now position it as a human-acceleration tool, not an autonomous review system.
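The first-pass-filter positioning can be sketched as a triage step that only auto-passes clauses the model matched to a known standard template and sends everything else to a human; the template set and input shape are hypothetical:

```python
# Illustrative set of templates the reviewing team treats as standard.
STANDARD_CLAUSES = {"governing_law", "confidentiality", "termination"}

def triage_clauses(clauses):
    """First-pass filter for contract review.

    `clauses` maps clause id -> matched template name, or None if the
    model found no standard match (illustrative shape). Everything that
    is not a confident standard match goes to the human queue -- this
    accelerates review, it does not replace it.
    """
    auto_pass, needs_human = [], []
    for clause_id, template in clauses.items():
        if template in STANDARD_CLAUSES:
            auto_pass.append(clause_id)
        else:
            needs_human.append(clause_id)
    return auto_pass, needs_human
```

Because unmatched and non-standard clauses default to the human queue, the 71% flagging accuracy reduces workload without ever being the last word on a consequential term.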