From ADB Shell to AI Agent: The Quiet Revolution in Mobile Automation

Mobile test automation has a longer history than most developers realize, and the AI-driven approach I'm exploring with Drengr sits at the end of a progression that started with raw ADB shell commands in 2009. Understanding that progression matters — not because history is inherently interesting (though I think it is), but because each generation solved real problems while creating new ones. Every mobile automation tool, including mine, is a response to the limitations of what came before. Knowing those limitations helps evaluate what's genuinely new and what's just repackaging.

The ADB Era (2009-2012)

Android Debug Bridge shipped with the Android SDK, and it included a deceptively simple capability: adb shell input. You could inject taps, swipes, and key events from a terminal.

adb shell input tap 500 300
adb shell input text "hello"
adb shell input swipe 500 1500 500 500 300

Developers wrote bash scripts that chained these commands together. Open the app, wait 2 seconds, tap the login button at coordinates (340, 780), type the username, tap the next field at (340, 860), type the password.

The problems were immediate and severe. Coordinates were absolute pixels. A script written for a 1080p phone broke on a 720p phone. A script written for one app version broke when the developer moved a button 50 pixels down. There was no way to query the UI state — you sent commands blind and hoped for the best.
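To make the resolution problem concrete, here is a minimal Python sketch of the proportional rescaling a script author would have needed for every tap. The helper names `scale_tap` and `tap_command` and the two screen sizes are illustrative assumptions, not part of ADB. Even this only helps when the layout scales uniformly, which real apps rarely guarantee.

```python
def scale_tap(x, y, src_size, dst_size):
    """Map a tap recorded on one screen size to the same relative spot on another."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return round(x * dst_w / src_w), round(y * dst_h / src_h)


def tap_command(x, y):
    """Build the adb command string for a tap at absolute pixel coordinates."""
    return f"adb shell input tap {x} {y}"


# The login button coordinates from the example, assumed recorded on 1080x1920.
recorded = (340, 780)
scaled = scale_tap(*recorded, src_size=(1080, 1920), dst_size=(720, 1280))

print(tap_command(*recorded))  # what the original script sends
print(tap_command(*scaled))    # what a 720x1280 device would need instead
```

On the smaller screen the same button sits at (227, 520), so replaying the recorded (340, 780) taps somewhere else entirely.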

But ADB shell automation proved something important: developers wanted to automate mobile testing, even with terrible tools. The demand was real.

UIAutomator and Espresso (2012-2015)

Google responded with proper frameworks. UIAutomator provided black-box testing — you could find elements by resource ID, text, or description, rather than coordinates. Espresso provided white-box testing for Android — fast, deterministic tests that ran inside the app process.
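The shift from coordinates to selectors can be sketched in a few lines of Python. The XML below is a hand-written, trimmed sample in the general shape of what `adb shell uiautomator dump` produces. The resource IDs, bounds, and the `find_center` helper are assumptions made up for illustration, not output from a real app.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample shaped like a UI hierarchy dump.
# Resource IDs and bounds are invented for this example.
SAMPLE_DUMP = """
<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" resource-id="" text="" bounds="[0,0][1080,1920]">
    <node class="android.widget.EditText" resource-id="com.example.app:id/username" text="" bounds="[90,740][990,820]" />
    <node class="android.widget.Button" resource-id="com.example.app:id/login" text="Log in" bounds="[90,900][990,1020]" />
  </node>
</hierarchy>
"""

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_center(dump_xml, resource_id):
    """Return the center (x, y) of the first node with this resource-id, or None."""
    root = ET.fromstring(dump_xml.strip())
    for node in root.iter("node"):
        if node.get("resource-id") == resource_id:
            match = BOUNDS_RE.match(node.get("bounds", ""))
            if match is None:
                continue  # skip nodes whose bounds cannot be parsed
            left, top, right, bottom = map(int, match.groups())
            return (left + right) // 2, (top + bottom) // 2
    return None


print(find_center(SAMPLE_DUMP, "com.example.app:id/login"))
```

The test names the element and the position is looked up at run time, so moving the button 50 pixels down no longer breaks the script.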

These were real, production-quality tools. Espresso, in particular, is excellent. Its automatic synchronization with the UI thread eliminates an entire category of flaky tests. If you're doing Android-only testing with access to the source code, Espresso remains hard to beat in 2026.

The limitations: both are Android-only and language-locked to Java or Kotlin, and Espresso tests must be built against the app under test. You can't use Espresso to test someone else's app. You can't use UIAutomator for iOS. And for teams building cross-platform products, maintaining separate test suites for each platform is expensive.

Appium's Universal Vision (2013-2020)

Appium had an ambitious idea: apply the WebDriver protocol — the same standard that powered Selenium for web testing — to mobile devices. Write tests in any language. Run them against any platform. One API to rule them all.

The vision was compelling, and Appium built a real foundation. It proved that cross-platform mobile testing was possible. Major companies adopted it. A huge ecosystem of plugins, drivers, and integrations grew around it.

But the architecture carried inherent weight. Appium runs a Node.js server that translates WebDriver commands into platform-specific actions through a chain of drivers. Setting up Appium meant installing Node.js, Java (for the Android driver), the appropriate SDKs, and getting all the versions to align. Session management was fragile. Tests that passed on one Appium version broke on the next. "Flaky tests" became almost synonymous with mobile automation in many teams.
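For a sense of what travels over that chain, here is a rough Python sketch of the two HTTP requests behind "find the Sign In button and click it" in the WebDriver protocol. It only builds the requests and sends nothing. The session and element IDs are placeholders (a real server hands them back when a session is created and when an element is found), and the helper names are my own.

```python
import json


def find_element_request(session_id, strategy, value):
    """Build the WebDriver 'Find Element' request as (method, path, JSON body)."""
    body = {"using": strategy, "value": value}
    return "POST", f"/session/{session_id}/element", json.dumps(body)


def click_request(session_id, element_id):
    """Build the WebDriver 'Element Click' request for a previously found element."""
    return "POST", f"/session/{session_id}/element/{element_id}/click", json.dumps({})


# Placeholder IDs for illustration only.
# "accessibility id" is one of the locator strategies Appium adds on top of WebDriver.
print(*find_element_request("SESSION", "accessibility id", "Sign In"))
print(*click_request("SESSION", "ELEMENT"))
```

Every one of these calls passes through the Node.js server and a platform driver before anything happens on the device, which is where much of the setup and versioning burden comes from.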

Appium built the foundation. I want to be clear about that — a lot of what exists today in mobile automation stands on Appium's groundwork.

Maestro's Simplification (2022-2024)

Maestro, from mobile.dev, asked a sharp question: what if mobile testing was actually simple? Their answer was YAML-based test flows that you could write in minutes.

appId: com.example.app
---
- launchApp
- tapOn: "Sign In"
- tapOn:
    id: "email_field"
- inputText: "user@example.com"
- tapOn: "Continue"
- assertVisible: "Welcome back"

Five-minute setup. No WebDriver server. No driver management. Just a CLI that talked directly to the device. Maestro proved that developer UX matters in testing tools — that a tool people actually enjoy using gets adopted, even if it has fewer features than the heavyweight alternative.

What Maestro didn't change: you still wrote every test manually. Every flow, every assertion, every edge case had to be authored by a human who understood the app. The tool was simpler, but the work was the same.

The AI Shift (2024-2026)

Two things happened in 2024-2025 that opened a genuinely new direction for mobile automation.

First, multimodal LLMs became good enough to reliably interpret screenshots. Not perfectly — I've written about the limitations — but well enough to identify buttons, text fields, navigation elements, and app state from a screenshot alone. The agent could see.

Second, Anthropic published the Model Context Protocol. MCP gave those capable-but-isolated LLMs a standard way to discover and invoke external tools. An AI model could now say "I want to tap element 5 on this screen" and have that intention reliably translated into a device action through a well-defined protocol.
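As a rough illustration of what that translation looks like on the wire: MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool's name and arguments. In the Python sketch below, the tool name `tap` and its `element` argument are hypothetical, chosen to match the sentence above. They are not a statement of Drengr's actual tool schema.

```python
import json
from itertools import count

# Simple incrementing request IDs so each request can be matched to its response.
_request_ids = count(1)


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "tap" and "element" are hypothetical names used only for illustration.
request = tool_call("tap", {"element": 5})
print(json.dumps(request, indent=2))
```

The model never touches the device directly. It emits a structured request like this, and the server on the other side decides how to carry it out.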

These two ingredients — vision and tool use — are what make AI-driven mobile testing possible. Not just theoretically possible, but practically achievable by a solo developer building in Rust on weekends.

Where I Think This Is Heading

The progression I see is from imperative to declarative to goal-oriented.

ADB was imperative: tap here, swipe there, type this. Espresso was declarative: find this element, verify this state. Maestro was declarative with better DX.

Drengr is my attempt at goal-oriented: "verify that a user can sign up, log in, and post a message." The agent figures out the how. It adapts to the specific app. It handles UI variations and unexpected states. You describe what should work, not how to test it.
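The shape of a goal-oriented run can be sketched as an observe, decide, act loop. The Python below is a toy under heavy assumptions, not how Drengr is implemented: the "app" is a dictionary of screens, and `untried_first` is a trivial stand-in for the model, which in a real agent would reason over screenshots instead.

```python
# Toy app: each screen maps a visible button label to the screen it leads to.
TOY_APP = {
    "welcome": {"About": "about", "Sign up": "signup_form"},
    "about": {"Back": "welcome"},
    "signup_form": {"Submit": "home"},
    "home": {"New post": "composer"},
    "composer": {"Post": "posted"},
    "posted": {},
}


def untried_first(screen, buttons, history):
    """Stand-in for the model: pick the first button not yet tried on this screen."""
    tried = {action for seen, action in history if seen == screen}
    for label in buttons:
        if label not in tried:
            return label
    return None


def run_agent(app, start, goal, choose, max_steps=20):
    """Loop observe -> decide -> act until the goal screen is reached or steps run out."""
    screen, history = start, []
    for _ in range(max_steps):
        if screen == goal:
            return True, history
        buttons = list(app[screen])                # observe what is on screen
        action = choose(screen, buttons, history)  # decide on the next action
        if action is None:
            break                                  # nothing left to try here
        history.append((screen, action))
        screen = app[screen][action]               # act and land on a new screen
    return screen == goal, history


reached, path = run_agent(TOY_APP, "welcome", "posted", untried_first)
print(reached)
for screen, action in path:
    print(f"{screen}: tap {action!r}")
```

Nothing in the call says which buttons to press. The run wanders into the About screen, backs out, and still reaches the goal, which is the behavior a fixed script cannot offer and also the reason two runs of a real agent may not take the same path.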

I'm not claiming this is solved. Earlier posts on this blog document the limitations in detail. But I do believe the direction is correct: AI agents that explore apps like humans do, finding bugs through curiosity rather than scripts.

Comparison

This is my honest assessment of the current landscape. I've tried to be fair — every tool on this list solves real problems for real teams.

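| Feature               | Appium      | Maestro | Detox             | Espresso / XCUITest    | Drengr            |
|-----------------------|-------------|---------|-------------------|------------------------|-------------------|
| Setup complexity      | High        | Low     | Medium            | Medium                 | Minimal           |
| Cross-platform        | Yes         | Yes     | React Native only | No (platform-specific) | Yes               |
| AI-driven             | No          | No      | No                | No                     | Yes               |
| Script-free testing   | No          | No      | No                | No                     | Yes               |
| Single binary install | No          | Yes     | No                | N/A (built-in)         | Yes               |
| MCP support           | No          | No      | No                | No                     | Native            |
| Deterministic results | Mostly      | Yes     | Yes               | Yes                    | No (AI-dependent) |
| Test authoring effort | High        | Low     | Medium            | Medium-High            | Minimal           |
| Maturity              | Very mature | Mature  | Mature            | Very mature            | Early             |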

I want to call attention to the "Deterministic results" row. Drengr is the only "No" in that row, and that matters. When you run an Espresso test, you get the same result every time. When you run a Drengr exploration, you might get different paths, different findings, different coverage. That's a feature for exploratory testing and a limitation for regression testing. Both are valid use cases; the right tool depends on what you need.

Appium built the foundation. Maestro proved that developer UX matters. I built Drengr because I saw a gap: what if the test itself was intelligent?

Whether that intelligence proves more valuable than determinism in practice is still an open question. I have early evidence that it is, for certain types of testing. But I'd rather present the question honestly than claim to have answered it definitively.