AI × Mobile Research & Tooling

Eyes and Hands for AI Agents on Mobile Devices

Drengr is a research-driven platform that gives AI agents the ability to see, understand, and interact with Android and iOS devices. Built as a single Rust binary with MCP server support, it works with Claude Desktop, Cursor, Windsurf, and any MCP client — but the vision goes far beyond a protocol.

Install: curl -fsSL https://drengr.dev/install.sh | bash
Or via npm: npm install -g drengr
Verify: drengr doctor

Works on macOS (Apple Silicon & Intel) and Linux. Requires ADB (Android) or Xcode (iOS) for device access.

See Drengr in Action

Real demos on a real device. No scripts, no rehearsals. The AI agent navigates YouTube autonomously, adapting to obstacles in real-time.

Self-Evolving Agent (3:17)

The agent hits a wall -- YouTube's custom renderer blocks the UI tree, returning 0 elements. Instead of failing, it evolves: pauses the video with KEYCODE_MEDIA_PAUSE to stop animations, which lets uiautomator dump the full UI tree. 29 elements recovered. It then double-taps to skip forward, expands the description, opens Key Concepts, and taps the first concept card.

Self-Evolution

When standard approaches fail, the agent adapts its strategy in real-time. No hardcoded fallbacks -- genuine problem-solving.

Shorts Navigation (1:32)

The agent navigates to YouTube Shorts and swipes through multiple short-form videos -- tech content, motivation, a Mandalorian clip, comedy -- demonstrating fluid swipe interactions on a fast-scrolling feed.

Gesture Control

Full swipe, scroll, and gesture support. Drengr handles the kinetic interactions that make mobile apps unique.

How It Works

Drengr exposes three MCP tools -- drengr_look, drengr_do, and drengr_query -- plus an automatic Situation Engine that runs between observations. The LLM is the brain; Drengr is the body. Each tool call is a single JSON-RPC message over stdio.
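For context, an MCP tool call on the wire is a standard JSON-RPC 2.0 tools/call request written to the server's stdin; the sketch below follows the MCP spec, with the device argument shown purely as an illustration:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "drengr_look",
    "arguments": { "device": "emulator-5554" }
  }
}
```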

1. OBSERVE -- drengr_look

Captures a screenshot and parses the UI tree. Returns numbered, interactive elements with their text, type, and bounds.

// AI agent calls drengr_look
{
  "screen": "YouTube - Home",
  "elements": [
    "[1] Search (EditText)",
    "[2] Trending (Tab)",
    "[3] 'MCP Server Explained' (Video, 497K views)",
    "[4] IBM Technology (Channel)",
    // ...26 elements
  ]
}

2. ORIENT -- Situation Engine

Tracks what changed since the last action: new elements, disappeared elements, screen transitions, crashes, stuck detection.

// Situation report (automatic)
{
  "screen_changed": true,
  "activity": "com.youtube.app/.HomeActivity",
  "new_elements": ["[3] Video card", "[4] Channel"],
  "disappeared": [],
  "crash_detected": false,
  "stuck": false
}

3. DECIDE + ACT -- drengr_do

The AI decides the next action and Drengr executes it: tap, type, swipe, scroll, press keys, go back, go home.

// AI agent calls drengr_do
{
  "action": "tap",
  "element": 3  // Tap video [3]
}

// Or more complex actions:
{ "action": "type", "element": 1, "text": "mcp server" }
{ "action": "swipe", "direction": "up" }
{ "action": "key", "code": "KEYCODE_MEDIA_PAUSE" }

4. QUERY -- drengr_query

Read-only queries without side effects. Check element state, read text content, verify screen conditions.

// AI agent calls drengr_query
{
  "query": "text_content",
  "element": 5
}
// Returns: "The Mandalorian and Grogu | Official Trailer"

{
  "query": "element_state",
  "element": 2
}
// Returns: { "enabled": true, "selected": false }

Works With Any MCP Client

Drengr uses the Model Context Protocol (MCP) -- the open standard for connecting AI agents to tools. Add one JSON config and every MCP client gains mobile device control.

claude_desktop_config.json
{
  "mcpServers": {
    "drengr": {
      "command": "/usr/local/bin/drengr",
      "args": ["mcp"],
      "env": {
        "ANDROID_HOME": "/your/android/sdk"
      }
    }
  }
}

Run drengr setup --client {name} to auto-generate this config with your actual paths and ANDROID_HOME.

observe → act loop
> drengr_look { "device": "emulator-5554" }
$ Screen: "YouTube - Home" | 26 elements
$ [1] Search [2] Trending [3] Video: "MCP Explained"
> drengr_do { "action": "tap", "element": 3 }
$ OK - tapped element [3]
> drengr_look { "device": "emulator-5554" }
$ Screen: "YouTube - Player" | screen_changed: true
$ [1] Back [2] Like [3] Subscribe [4] Share

Features

Research-driven tools for AI-device interaction. Each capability was born from real experimentation with how AI agents perceive and navigate mobile interfaces.

3 MCP Tools

drengr_look to observe, drengr_do to act, drengr_query to read. Three tools cover the entire mobile interaction surface.

Situation Engine

Tracks screen_changed, activity_changed, crash detection, stuck detection, and element diffs between actions.

Text-Only Mode

A compact text scene costs ~300 tokens versus ~100KB for a screenshot -- roughly 100x cheaper. Drengr escalates to vision only when the text view falls short.

Single Binary

Written in Rust. Compiles to a single static binary with no runtime dependencies. LTO-optimized for size.

OODA Loop

Built-in autonomous agent mode. Observe-Orient-Decide-Act loop works with any LLM provider: OpenAI, Gemini, Anthropic, Groq, and more.
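The loop above can be sketched as a short driver. This is an illustrative outline, not Drengr's built-in agent: call_tool stands in for an MCP client invocation, and decide stands in for the LLM choosing the next action.

```python
def run_ooda(call_tool, decide, goal, max_steps=10):
    """Observe -> Orient -> Decide -> Act until `decide` signals the goal is met."""
    for _ in range(max_steps):
        scene = call_tool("drengr_look", {})       # Observe: screenshot + UI tree
        situation = scene.get("situation", {})     # Orient: automatic diff report
        action = decide(goal, scene, situation)    # Decide: the LLM picks an action
        if action is None:                         # goal reached, stop the loop
            return True
        call_tool("drengr_do", action)             # Act: tap / type / swipe / key
    return False                                   # step budget exhausted
```

Any LLM provider can back decide, which is what makes the loop provider-agnostic.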

Coming Soon

Actively in development. These features are planned but not yet shipped.

Multi-Platform

Android via ADB, iOS via simctl, cloud devices via Appium. One interface for every device type.

Cloud Devices

First-class support for BrowserStack and Sauce Labs via Appium WebDriver. Test on real cloud devices.

SDK

Android (Kotlin) and iOS (Swift) SDKs for in-app network event interception. Correlate API calls with screen actions.

Explore Mode

BFS app exploration that automatically taps every interactive element and builds a navigation graph.
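One level of that exploration could look like the sketch below -- a conceptual illustration of the planned feature, with look, tap, and go_back as hypothetical wrappers around the drengr tools. Repeating this for each newly discovered screen gives the BFS and its navigation graph.

```python
def explore_one_level(look, tap, go_back):
    """From the current screen, tap every element once and record
    (screen, element) -> destination edges of the navigation graph."""
    edges = {}
    screen = look()["screen"]
    for element in list(look()["elements"]):
        tap(element)                               # try the element
        edges[(screen, element)] = look()["screen"]  # note where it led
        go_back()                                  # return and try the next one
    return edges
```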

CI/CD Ready

JUnit XML output via drengr test --format junit. Drop into any CI pipeline — GitHub Actions, GitLab CI, Jenkins.
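As a sketch, wiring this into GitHub Actions could be as small as the step below. The drengr test --format junit flag comes from the description above; everything else (job setup, a device or emulator being available on the runner) is assumed:

```yaml
# Illustrative workflow steps -- assumes drengr is installed
# and an emulator is already running on the runner.
- name: Run Drengr test suite
  run: drengr test --format junit > results.xml
- name: Upload JUnit results
  uses: actions/upload-artifact@v4
  with:
    name: drengr-junit
    path: results.xml
```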

YAML Test Suites

Define test scenarios in YAML. Run them with drengr test. Outputs human-readable or JSON results.
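The YAML schema has not shipped, so the shape below is purely a guess assembled from the tool names and actions documented above -- field names and structure are hypothetical:

```yaml
# Hypothetical suite format -- not the final schema.
name: youtube-search
steps:
  - do: { action: "type", element: 1, text: "mcp server" }
  - do: { action: "key", code: "KEYCODE_ENTER" }
  - query: { query: "text_content", element: 3 }
```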

Drengr Dashboard

Real-time run viewer, API correlation, AI insights, and trends. A Next.js app currently in development.

Real-time Steering

Pause and resume agents from the dashboard. Override prompts mid-run. Full human-in-the-loop control.

Multi-Consumer Tracking

See which MCP client is controlling the device — Claude Desktop, Cursor, Windsurf, or Claude Code.

Network Monitoring

Correlate API calls with screen actions. See network traffic in HAR format alongside the agent's UI interactions.