Which vertical-specific business processes are genuinely automatable today, and which are not?
The conversation about industry automation tends toward extremes: either AI will automate everything, or it cannot be trusted with any consequential task. The reality is more nuanced and more useful: there are specific, identifiable characteristics of business processes that determine whether automation is reliable today, not reliable but improving, or fundamentally unsuited to current AI capabilities.
This research stream investigates automation feasibility at the process level across four primary verticals: fintech compliance, legal document processing, operations orchestration, and customer communication. We pick hard problems in specific industries, build automation prototypes, test them against real workloads, and publish what we find — including where we expected automation to work and it did not.
The most consistent finding across all verticals is that the technical problems of automation are usually easier than anticipated, and the surrounding problems — liability, explainability, data quality, change management — are usually harder. An AI that processes documents with 94% accuracy on a clean corpus processes them with 61% accuracy on the messy, non-standardised documents that make up the majority of real business archives.
We work with clients in Studios verticals to run these experiments on real data with appropriate safeguards, then publish generalised findings without client-identifiable information. The goal is to give other practitioners an honest map of the terrain before they invest in automation projects.
We tested document extraction across a corpus of 2,400 real business documents in 4 categories (invoices, contracts, reports, correspondence). On standardised, templated documents, extraction accuracy averaged 94%. On free-form documents without consistent structure, accuracy dropped to 61% using the same models and prompts. The gap was driven by missing field inference on non-standard layouts, not by OCR or parsing errors.
Method: Documents were classified by human reviewers as structured (templated), semi-structured, or free-form. Extraction accuracy was measured by field-level match against human-extracted ground truth, and results were broken out by layout type to surface the distribution, not just the average.
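As an illustration, field-level accuracy broken out by layout type takes only a few lines to compute. The record shape here (`layout`, `extracted`, `truth`) is a hypothetical stand-in for the real annotation format:

```python
from collections import defaultdict

def accuracy_by_layout(records):
    """Field-level extraction accuracy, grouped by layout type.

    `records` is a list of dicts with illustrative keys:
      layout    -- "structured" | "semi-structured" | "free-form"
      extracted -- {field_name: value} produced by the model
      truth     -- {field_name: value} from human extraction
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        # Score every ground-truth field; a missing extracted field counts as a miss.
        for field, expected in rec["truth"].items():
            totals[rec["layout"]] += 1
            if rec["extracted"].get(field) == expected:
                hits[rec["layout"]] += 1
    return {layout: hits[layout] / totals[layout] for layout in totals}
```

Scoring against the ground-truth fields (rather than the extracted ones) is what makes missed fields on non-standard layouts show up in the metric rather than silently disappearing.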
We analysed two client deployments of customer communication automation: one with mandatory human escalation for cases the AI flagged as uncertain, and one where the AI handled all cases without escalation. The no-escalation system handled 85% of inquiries correctly and 15% incorrectly. The customer satisfaction impact of the 15% errors was 4x the positive impact of the 85% successes — negative experiences in customer communication have asymmetric weight. Complaint volume tripled. The mandatory escalation system maintained flat customer satisfaction scores.
Method: Customer satisfaction survey data from both deployments over 60 days, normalised by inquiry volume, with complaint rate as a secondary metric. Results were consistent across two separate client contexts.
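A minimal sketch of the mandatory-escalation pattern, assuming the model exposes a self-reported confidence score; the threshold value here is illustrative, not a tuned number:

```python
from dataclasses import dataclass

# Illustrative cut-off: anything below this goes to a human, not the model.
ESCALATION_THRESHOLD = 0.8

@dataclass
class Inquiry:
    text: str
    confidence: float  # model's self-reported confidence (hypothetical field)

def route(inquiry: Inquiry) -> str:
    """Mandatory-escalation routing: uncertain cases are never
    answered autonomously, regardless of the headline accuracy."""
    if inquiry.confidence < ESCALATION_THRESHOLD:
        return "human_queue"
    return "auto_reply"
```

The point of the pattern is that the routing decision is structural, not discretionary: there is no code path where a low-confidence case reaches the customer without a human in the loop.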
"The AI decided" is not an accepted audit trail in any regulated industry we have worked in. Compliance automation that does not produce an auditable decision log — traceable to specific rules, specific document sections, and specific model outputs — was rejected by compliance officers in all 4 fintech clients we worked with, regardless of accuracy. Explainability is a prerequisite, not a nice-to-have.
Method: Structured interviews with compliance officers and legal counsel across 4 client organisations after presenting automation prototypes with varying levels of decision explainability. The decision to adopt was the dependent variable; explainability quality was the primary independent variable.
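One way to satisfy that requirement is an append-only decision log in which every record traces back to specific rules, document sections, and raw model output. The field names below are illustrative, not a schema we ship:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """One auditable compliance decision (illustrative field names)."""
    case_id: str
    decision: str        # e.g. "approve" / "escalate" / "reject"
    rule_ids: list       # the specific rules the decision traces to
    source_spans: list   # the document sections that were consulted
    model_output: str    # the raw model output backing the decision
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def append_to_log(record: DecisionRecord, path: str) -> None:
    # Append-only JSONL so each decision remains individually traceable.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

The design choice that matters is that the record is written at decision time, with the evidence attached, rather than reconstructed later from model logs.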
Across 6 operations automation projects, the automation technology itself was the primary bottleneck in only 1 case. In 5 cases, the primary blockers were data quality issues (inputs not in the expected format or quality range), integration problems (APIs that did not behave as documented), or change management resistance (employees routing around the automation). Technical readiness is necessary but not sufficient.
Method: Post-project retrospectives on 6 operations automation engagements, classifying the primary cause of each project delay or failure. Classification was performed by two analysts independently, with disagreements resolved by discussion.
We run automation prototypes on real business data from consenting Studio clients, with all client-identifying information anonymised before publication. Real data is essential — synthetic datasets do not capture the messiness that determines whether automation is actually viable.
We measure automation rate (cases fully handled without human intervention), error rate, error impact, explainability rating, and time-to-process against a manual baseline. We publish the full distribution, not just the average: a 90% automation rate that fails on the 10% most important cases is not a useful system.
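The distribution point can be made concrete with a reporting helper that surfaces tail percentiles alongside the headline rates. The keys and the record shape here are hypothetical:

```python
import statistics

def automation_report(cases):
    """Summarise automation outcomes without hiding the tails.

    `cases` is a list of dicts with illustrative keys:
      automated -- True if fully handled without human intervention
      error     -- True if the outcome was wrong
      minutes   -- processing time for the case
    """
    n = len(cases)
    times = sorted(c["minutes"] for c in cases)
    return {
        "automation_rate": sum(c["automated"] for c in cases) / n,
        "error_rate": sum(c["error"] for c in cases) / n,
        # Report the tail, not just the centre, of the time distribution.
        "time_p50": statistics.median(times),
        "time_p95": times[min(n - 1, int(0.95 * n))],
    }
```

A long p95 against a fast median is exactly the shape that an average would hide, and it is usually where the hard, consequential cases live.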
We identify clearly when automation is not viable today and explain why. We do not publish cherry-picked results from ideal conditions. Every finding includes the conditions under which it holds and the conditions under which it breaks down.
Hypothesis: A customer service AI with high accuracy (>85% on test set) can operate without human escalation without degrading customer satisfaction.
Result: The system handled 85% of inquiries correctly. Complaint volume tripled and customer satisfaction fell sharply.
Lesson: Error rate and customer satisfaction impact are not linearly related in customer communication contexts. The damage from a wrongly handled customer complaint is asymmetrically high. Human escalation paths are now mandatory in all customer communication automation we deploy, regardless of accuracy metrics.
Hypothesis: AI can automate end-to-end invoice processing (receipt → extraction → validation → approval routing) for standard invoice formats with >95% accuracy.
Result: 96.2% accuracy on standard invoice formats. Processing time fell from 4 hours to 8 minutes per batch, and the human review queue shrank by 78%.
Lesson: Invoice processing automation is highly effective for standardised formats. The 3.8% that required human review were consistently non-standard layouts, damaged documents, or invoices with disputed amounts — exactly the cases that benefit most from human judgment.
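The routing logic behind that split can be sketched as a simple pipeline in which anything non-standard, damaged, or disputed falls through to human review; `extract` and `validate` are placeholders for the real model and business-rule steps:

```python
def process_invoice(invoice, extract, validate):
    """Receipt -> extraction -> validation -> routing, as a sketch.

    `invoice` is a dict with illustrative keys ("layout", "damaged");
    `extract` and `validate` are injected stand-ins for the model
    call and the business-rule check.
    """
    # Route the known-hard cases to a human before spending model time.
    if invoice.get("layout") != "standard" or invoice.get("damaged"):
        return "human_review"
    fields = extract(invoice)
    # Failed validation or a disputed amount also goes to a human.
    if not validate(fields) or fields.get("disputed"):
        return "human_review"
    return "approval_routing"
```

The shape to notice is that human review is a first-class output of the pipeline, not an exception path: the cases the model cannot handle are routed deliberately rather than force-approved.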
Hypothesis: AI can extract standard contract clauses and flag non-standard terms with accuracy sufficient to reduce preliminary legal review time by 50%.
Result: Standard clause extraction reached 88% accuracy. Non-standard term flagging reached only 71%, too low for legal use without human verification of every flagged item.
Lesson: Contract AI is useful as a first-pass filter that reduces the volume of material requiring human attention, but not as a replacement for human review on consequential determinations. We now position it as a human-acceleration tool, not an autonomous review system.
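The first-pass-filter positioning can be sketched as a triage step that only auto-passes clauses the model matched to a known standard template and sends everything else to a human; the template set and input shape are hypothetical:

```python
# Illustrative set of templates the reviewing team treats as standard.
STANDARD_CLAUSES = {"governing_law", "confidentiality", "termination"}

def triage_clauses(clauses):
    """First-pass filter for contract review.

    `clauses` maps clause id -> matched template name, or None if the
    model found no standard match (illustrative shape). Everything that
    is not a confident standard match goes to the human queue -- this
    accelerates review, it does not replace it.
    """
    auto_pass, needs_human = [], []
    for clause_id, template in clauses.items():
        if template in STANDARD_CLAUSES:
            auto_pass.append(clause_id)
        else:
            needs_human.append(clause_id)
    return auto_pass, needs_human
```

Because unmatched and non-standard clauses default to the human queue, the 71% flagging accuracy reduces workload without ever being the last word on a consequential term.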