The OODA Loop: How I Taught an AI Agent to Navigate Apps It Has Never Seen
I spent weeks watching Claude get stuck in loops — tapping the same button, scrolling endlessly, confused by loading screens. The AI was intelligent. It could reason about what it was seeing. But it had no memory of what had changed, no awareness of whether its actions were having any effect, and no framework for deciding when to try something different. The problem with AI agents testing mobile apps wasn't intelligence. It was situational awareness.
The solution came from an unexpected place: a military strategist named John Boyd who never published a single paper. His OODA loop — Observe, Orient, Decide, Act — gave me the architecture I needed for autonomous mobile testing that actually works.
What OODA Is
Colonel John Boyd developed the OODA loop in the 1960s to explain why certain fighter pilots consistently won dogfights. His insight was that combat isn't about speed or firepower alone — it's about decision cycles. The pilot who observes the situation, orients to its meaning, decides on an action, and acts on that decision faster and more accurately than the opponent wins.
The four phases:
- Observe — Gather raw information from the environment
- Orient — Interpret that information in context. What changed? What does it mean? This is the critical phase — Boyd called it the Schwerpunkt of the loop
- Decide — Choose an action based on the oriented understanding
- Act — Execute the chosen action
Most people focus on the speed of the loop. Boyd's real insight was that the Orient phase — the ability to make sense of what you're observing — is what separates effective actors from fast-but-blind ones.
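The four phases translate directly into an agent's main loop. Here's a minimal sketch in Python — the function names and the shape of the situation report are illustrative, not Drengr's actual API:

```python
def run_ooda(observe, orient, decide, act, max_cycles=20):
    """Drive an agent through repeated OODA cycles until the goal is
    reached or the cycle budget runs out."""
    prev = None
    for cycle in range(max_cycles):
        state = observe()                  # Observe: gather raw information
        situation = orient(prev, state)    # Orient: interpret it in context
        if situation.get("goal_reached"):
            return ("done", cycle + 1)
        action = decide(situation, state)  # Decide: choose an action
        act(action)                        # Act: execute it
        prev = state
    return ("gave_up", max_cycles)
```

Note that Orient is a distinct step with its own inputs — the previous state as well as the current one. That two-argument signature is the whole point: orientation is comparison, not just perception.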
Why It Maps Perfectly to AI Agents
When I looked at existing AI agent frameworks, most of them implemented something like: see screen, decide action, take action, repeat. That's an ODA loop — no Orient step. And that's exactly why agents get stuck.
Without orientation, an AI agent can't answer basic questions: Did my last action work? Is the app in a new state or the same state? Am I making progress or going in circles? Is this a loading screen or a dead end?
Drengr's OODA implementation adds the missing Orient phase through what I call the situation engine.
The Situation Engine
After every Observe phase (calling drengr_look), the situation engine compares the current state to the previous state and produces a situation report. Here's what it tracks:
- screen_changed — Did the visual content change at all? A boolean, but a critical one. If the agent tapped a button and the screen didn't change, something went wrong.
- activity_changed — Did the Android Activity or iOS ViewController change? This signals navigation — the agent moved to a new screen, not just scrolled within one.
- new_elements — What UI elements appeared that weren't there before? New elements often mean a new context to interact with.
- disappeared_elements — What elements vanished? A disappeared dialog means it was dismissed. A disappeared list item might mean a delete action worked.
- crash_detected — On Android, Drengr monitors logcat for fatal exceptions. If the app crashes, the agent needs to know immediately rather than staring at an "Unfortunately, app has stopped" dialog trying to interact with it.
- stuck_detected — If the last N screenshots are identical (pixel-compared) despite the agent taking actions, the situation engine flags a stuck state. This is the single most valuable detection — it catches infinite loops before they burn through tokens.
This situation report is the Orient phase. It transforms raw screen data into meaning.
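The comparison itself is straightforward set and hash arithmetic. A minimal sketch — field names follow the report above, but the diffing logic is illustrative, not Drengr's actual implementation, and crash detection via logcat is omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Screen:
    activity: str          # current Activity / ViewController name
    elements: frozenset    # identifiers of UI elements visible on screen
    pixels_hash: int       # hash of the screenshot bitmap

def orient(prev, curr, recent_hashes, stuck_after=3):
    """Compare two observations and produce a situation report."""
    if prev is None:
        return {"initial": True}
    report = {
        "screen_changed": curr.pixels_hash != prev.pixels_hash,
        "activity_changed": curr.activity != prev.activity,
        "new_elements": sorted(curr.elements - prev.elements),
        "disappeared_elements": sorted(prev.elements - curr.elements),
    }
    # Stuck detection: the last N screenshots are identical even though
    # the agent has been taking actions between them.
    recent_hashes.append(curr.pixels_hash)
    window = recent_hashes[-stuck_after:]
    report["stuck_detected"] = (
        len(window) == stuck_after and len(set(window)) == 1
    )
    return report
```

A production version would use a perceptual hash rather than an exact one, so that a blinking cursor doesn't defeat stuck detection — but the structure is the same.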
A Real Session, Step by Step
I asked the agent: "Open YouTube and find a video about MCP." Here's a simplified trace of what happened internally:
Cycle 1:
- Observe: Home screen. 22 elements detected.
- Orient: Initial state. No previous screen to compare.
- Decide: Find and tap YouTube icon.
- Act:
drengr_do tap element:14
Cycle 2:
- Observe: YouTube home feed. 47 elements detected.
- Orient: screen_changed: true, activity_changed: true, new_elements: 38. Navigation successful — new activity, many new elements.
- Decide: Find search functionality.
- Act:
drengr_do tap element:2 (search icon)
Cycle 3:
- Observe: Search screen with keyboard visible. 31 elements.
- Orient: screen_changed: true, activity_changed: false, new_elements: 15. Same activity but new context (search overlay). Keyboard detected.
- Decide: Type search query.
- Act:
drengr_do type "Model Context Protocol MCP"
Cycle 4:
- Observe: Search suggestions appearing. 12 suggestions visible.
- Orient: screen_changed: true, new_elements: 12. Autocomplete suggestions loaded. Text input confirmed.
- Decide: Submit search.
- Act:
drengr_do press enter
Cycle 5:
- Observe: Search results page. 8 video results visible.
- Orient: screen_changed: true, activity_changed: true, new_elements: 34. New screen with results. Goal is within reach.
- Decide: Tap first relevant result.
- Act:
drengr_do tap element:7
Five cycles, about 35 seconds total. The Orient phase added maybe 100ms per cycle in computation but saved the agent from every wrong turn it would have otherwise taken.
When It Breaks
The OODA loop is not magic. It breaks in specific, instructive ways.
Infinite scroll. The agent is asked to "find the oldest post." It scrolls down. New content loads. The situation engine reports screen_changed: true, new_elements: 5. The agent scrolls again. More new content. The agent has no concept of "I've been scrolling for 30 screens and this might never end." I've partially addressed this with a configurable action limit, but the fundamental problem — the agent can't distinguish "more content below" from "I should try a different approach" — remains open.
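The action limit is simple to express — the budget number and function names below are placeholders, not Drengr's real defaults:

```python
def scroll_until(found, scroll_once, max_scrolls=30):
    """Scroll repeatedly, but give up after a fixed budget so an
    infinite feed can't trap the agent forever."""
    for attempt in range(max_scrolls):
        if found():
            return ("found", attempt)
        scroll_once()
    # Budget exhausted: signal the caller to try a different approach
    # (search, sort, filter) instead of scrolling further.
    return ("budget_exhausted", max_scrolls)
```

The limit converts an infinite loop into a bounded failure, which is an improvement — but it doesn't tell the agent *what else* to try, which is the part that remains open.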
Loading spinners. The agent taps a button. A spinner appears. The situation engine sees screen_changed: true because the spinner is new. But the real content hasn't loaded yet. The agent, seeing a changed screen, tries to interact with it. I've added a heuristic — if the only new elements are progress indicators, wait and re-observe — but heuristics are fragile.
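That heuristic amounts to a filter over the new elements. The class names matched below are examples — real apps use countless custom spinner variants, which is exactly why the heuristic is fragile:

```python
# Common platform progress-indicator classes; any custom spinner
# widget will slip past this list.
PROGRESS_CLASSES = {"ProgressBar", "UIActivityIndicatorView", "Spinner"}

def only_loading_indicators(new_elements):
    """True if every newly appeared element looks like a progress
    indicator, i.e. the 'change' is probably just a loading state."""
    return bool(new_elements) and all(
        e["class"] in PROGRESS_CLASSES for e in new_elements
    )

def should_wait_and_reobserve(report):
    # Screen changed, but only spinners appeared: wait, then look again.
    return report["screen_changed"] and only_loading_indicators(
        report["new_elements"]
    )
```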
Time-based animations. Carousels that auto-advance, toast messages that appear and disappear, animated transitions. The screen is constantly changing, so screen_changed is always true, and the situation engine struggles to distinguish meaningful changes from decorative motion.
What This Taught Me About AI Agency
The gap between "intelligent" and "effective" is situation awareness. Claude is remarkably intelligent — it can reason about UI elements, infer app structure from screenshots, and plan multi-step interactions. But without the Orient step, it's powerful and blind.
Intelligence without situational awareness is like a chess grandmaster who can calculate 20 moves ahead but can't see the board.
Boyd understood this in the context of aerial combat. The pilot who can calculate optimal trajectories but doesn't notice the enemy on their six o'clock is dead. The agent that can reason about UI hierarchies but doesn't notice it's been stuck on the same screen for 10 cycles is burning tokens.
I think this insight generalizes beyond mobile testing. Every AI agent framework I've seen would benefit from an explicit Orient phase — a structured comparison between "what I expected to happen" and "what actually happened." The agents that work well in the real world won't just be the smartest ones. They'll be the ones that notice when something isn't working.
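One way to make that expected-vs-actual comparison concrete: each action declares the change it should produce, and the Orient phase verifies it against the situation report. This is a sketch of the idea, not any particular framework's API:

```python
def expectation_met(action, report):
    """Check an action's declared expectation against what the
    situation report says actually happened."""
    checks = {
        # A tap meant to navigate should change the activity.
        "navigate": report.get("activity_changed", False),
        # A dismiss should make something disappear.
        "dismiss": bool(report.get("disappeared_elements")),
        # A generic interaction should at least change the screen.
        "any_change": report.get("screen_changed", False),
    }
    return checks.get(action["expects"], False)
```

An unmet expectation is the trigger for replanning — the structured equivalent of a pilot noticing the maneuver didn't shake the pursuer.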
The Broader Research Question
What I'm really exploring with Drengr is whether the OODA loop is a sufficient general framework for AI agents operating in visual, interactive environments. Mobile apps are a specific case, but the pattern — observe a GUI, orient to changes, decide on an action, act and loop — applies to desktop automation, web testing, game playing, and any other domain where an agent interacts with a visual interface.
I don't have a definitive answer yet. But the early results suggest that adding structured situational awareness to AI agents is a more productive direction than simply making the underlying model more capable. A smarter agent with no situation engine still gets stuck. A less capable agent with a good situation engine often completes the task.
That's a result worth investigating further.