2026-06-07 - Carpe Diem

# Building an Autonomous Pentesting Agent: CAPTCHAs Were Not the Real Problem

> 2026-06-07 · notes from building the autonomous security agent

I have been building the registration layer of my autonomous pentesting agent (continuing from [[2026-06-06]]). Initially I assumed registration would be straightforward: find the signup flow, create an account, solve the CAPTCHA if necessary, continue testing.

![[Pasted image 20260607145654.png]]

That assumption lasted about two days. The deeper I went into real-world applications, the more I realized registration is not a single problem: it is a collection of many smaller problems that vary significantly from one application to another. And surprisingly, CAPTCHAs were not the hardest part.

## The registration problem

I tested registration workflows across multiple Vulnerability Disclosure Programs (VDPs) and bug bounty targets. The goal was simple: let the agent autonomously create accounts whenever a program explicitly permits registration and testing.

On paper, trivial. In practice, I hit: reCAPTCHA v2, Cloudflare Turnstile, hCaptcha, invisible CAPTCHAs, OAuth-only flows, dynamic identity providers, browser-rendered signup forms, email verification requirements, anti-automation protections, and JavaScript-heavy flows.

Every application introduced a new variation. Some challenges appeared on page load; others only after submission. Some delegated registration entirely to external identity providers. The result was an endless stream of edge cases.

## Solving CAPTCHAs

My first instinct was the obvious one: find a CAPTCHA solver, integrate it, move on. After evaluating several providers I integrated **CapSolver**, and the results were impressive: it solved multiple challenge types and significantly increased registration success rates. The agent could reach the registration page, complete forms, solve the challenge, submit, and continue. At first glance, solved.
But after dozens of tests I noticed something: every time I solved one category of challenge, another appeared. **CAPTCHA solving wasn't removing complexity; it was moving it elsewhere.**

## The real problem

Eventually I realized I was asking the wrong question. I was asking *"how do I automate registration?"* The better question was *"how much registration should I automate?"*

Registration workflows are not static. Every app brings different anti-automation mechanisms, business rules, verification flows, identity providers, and assumptions about user behavior. I could keep pouring engineering time into ever-more-complex automation, or recognize that some parts simply benefit from human assistance. Not because automation is impossible, but because the return on investment becomes questionable.

## Designing a layered registration architecture

That led to a redesign. Instead of pursuing fully autonomous registration at all costs, I adopted a **layered architecture**:

1. **API registration**: the agent first attempts direct registration through exposed APIs. When available, this is the most reliable and deterministic path.
2. **Browser registration**: if no API is available, it attempts browser-based registration with Playwright, handling JavaScript-heavy apps and modern frontends.
3. **CAPTCHA solver**: if a challenge appears, it attempts automated solving via CapSolver. For many targets, this is sufficient.
4. **Human-in-the-loop**: if everything above fails, the system transitions into a dedicated human-assisted state. Not as a fallback. Not as an error. As a first-class architectural component.

## Human-in-the-loop as a system state

This was probably the most important design decision. When human intervention becomes necessary, the agent opens a controlled Playwright session, and the heavy lifting is already done: credentials generated, an email address created, the workflow prepared, and the exact point requiring intervention reached.
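
The layered design above can be sketched as a simple fallback chain. This is a minimal illustration, not the agent's actual code: the names `register`, `Outcome`, `Attempt`, and `Result` are hypothetical, each layer is reduced to a callable that returns `True` on success, and the CAPTCHA solver is modeled as its own step in a linear chain, which is a simplification. The point it tries to show is that exhausting the automated layers yields a normal `NEEDS_HUMAN` state rather than an exception.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional, Sequence, Tuple


class Outcome(Enum):
    REGISTERED = auto()   # an automated layer completed registration
    NEEDS_HUMAN = auto()  # a regular state, not an error


@dataclass
class Attempt:
    layer: str
    succeeded: bool
    detail: str = ""


@dataclass
class Result:
    outcome: Outcome
    layer: Optional[str]  # name of the layer that succeeded, if any
    attempts: List[Attempt] = field(default_factory=list)


# Hypothetical layer shape: (name, callable taking the target and returning success).
Layer = Tuple[str, Callable[[str], bool]]


def register(target: str, layers: Sequence[Layer]) -> Result:
    """Try each automated layer in order; hand off to a human if all fail."""
    attempts: List[Attempt] = []
    for name, attempt in layers:
        try:
            ok, detail = bool(attempt(target)), ""
        except Exception as exc:
            # A crashing layer counts as a failed attempt, not a fatal error.
            ok, detail = False, repr(exc)
        attempts.append(Attempt(name, ok, detail))
        if ok:
            return Result(Outcome.REGISTERED, name, attempts)
    # Every automated layer failed: enter the human-assisted state.
    return Result(Outcome.NEEDS_HUMAN, None, attempts)
```

A caller would pass the layers in priority order, for example `[("api", try_api), ("browser", try_browser), ("captcha_solver", try_solver)]` with its own (here hypothetical) functions, and branch on `result.outcome`. Keeping the list of attempts in the result means the human-assisted step can see what was already tried.
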
To support this I also built a dedicated registration **email infrastructure** on a custom domain I control, instead of public temporary-email providers (which are frequently blocked or flagged). This dramatically improves reliability while keeping automation.

At that point the human only performs the specific action requiring judgment: completing a difficult CAPTCHA, handling an unexpected verification flow, approving a login step, or navigating a provider-specific identity challenge. Once done, the agent either auto-detects success (via cookies, redirects, or state transitions) or asks for confirmation, and then execution resumes automatically.

## The architectural lesson

One lesson keeps reappearing: **human-in-the-loop is not a limitation; it is an architectural requirement, at least today.** The best agents aren't the ones that automate everything; they're the ones that understand when automation creates value and when human assistance is simply more efficient. A fully autonomous system that spends ten minutes fighting edge cases is often less useful than one that asks for thirty seconds of human help and continues predictably.

As engineers we often treat human intervention as failure. I increasingly believe that's the wrong mental model. Human intervention is just another tool; the real challenge is deciding *when* to use it. For registration workflows, that decision turned out to matter far more than solving CAPTCHAs.

## What's next

This registration layer is only one component of the larger system. In future posts I plan to share: registration success metrics, CAPTCHA challenge distribution, CapSolver cost analysis, the email-verification architecture, identity-provider handling, lessons from real VDP and bug bounty programs, and more architectural decisions.

Because one thing is clear: building autonomous security agents is far less about choosing the right model, and far more about designing the right architecture.

---

## Concepts

[[Registration]] · [[CAPTCHA]] · [[Authentication]] · [[Browser automation]] · [[Agent architecture]] · [[Human-in-the-loop]] · [[Email verification]] · [[Bug bounty]]

**Previous:** [[2026-06-06]] · **Next:** [[2026-06-10]]