Google’s Threat Intelligence Group disclosed on 11 May 2026 that it had detected and disrupted a planned mass exploitation campaign built around a zero-day vulnerability in a widely deployed open-source administration tool. The exploit — a Python script designed to bypass two-factor authentication — had been written by a large language model. It is the first confirmed case of AI-generated code being weaponised for a live exploitation campaign.
The disclosure marks a meaningful threshold in the industrialisation of offensive cyber capability. Offensive use of LLM-assisted vulnerability research has been a largely theoretical concern for several years. With this disclosure, that concern has become operational reality.
The Exploit
The target was a two-factor authentication implementation in an open-source administrative interface with significant deployment across enterprise environments. The flaw allowed an attacker to bypass the 2FA check entirely by manipulating a specific sequence of authentication requests — a logic flaw rather than a memory corruption issue.
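The disclosure does not describe the specific request sequence involved, so the following is only a generic illustration of how this class of logic flaw is usually prevented, not a reconstruction of the affected tool. It is a minimal sketch in which the server tracks login progress as an explicit state and refuses any step that arrives out of order. The names `AuthFlow`, `Session` and `AuthState` are hypothetical, and the plain-text password and one-time-code lookups are stand-ins for a real credential store and TOTP verifier.

```python
import hmac
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class AuthState(Enum):
    ANONYMOUS = auto()
    PASSWORD_OK = auto()
    FULLY_AUTHENTICATED = auto()


@dataclass
class Session:
    """Server-side session record (hypothetical)."""
    state: AuthState = AuthState.ANONYMOUS
    user: Optional[str] = None


def _equal(a: str, b: str) -> bool:
    # Constant-time comparison; encode so non-ASCII input cannot raise.
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


class AuthFlow:
    """Two-step login that enforces step order on the server."""

    def __init__(self, passwords: Dict[str, str], otp_codes: Dict[str, str]):
        # Assumption: in-memory stand-ins for a hashed credential store
        # and a real one-time-code verifier.
        self._passwords = passwords
        self._otp_codes = otp_codes

    def submit_password(self, session: Session, user: str, password: str) -> bool:
        # Any new password attempt discards earlier progress.
        session.state = AuthState.ANONYMOUS
        session.user = None
        expected = self._passwords.get(user)
        if expected is None or not _equal(expected, password):
            return False
        session.state = AuthState.PASSWORD_OK
        session.user = user
        return True

    def submit_otp(self, session: Session, code: str) -> bool:
        # The second factor is only accepted after the first has passed.
        if session.state is not AuthState.PASSWORD_OK or session.user is None:
            return False
        expected = self._otp_codes.get(session.user)
        if expected is None or not _equal(expected, code):
            return False
        session.state = AuthState.FULLY_AUTHENTICATED
        return True

    def can_access_admin(self, session: Session) -> bool:
        # Access depends on the final state alone, never on which
        # requests the client claims to have made.
        return session.state is AuthState.FULLY_AUTHENTICATED
```

The design choice here is that the authorisation decision reads a single server-held state value. A client that skips, repeats or reorders requests cannot reach `FULLY_AUTHENTICATED` without passing both checks in sequence.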
The exploit code was a clean Python script, well-structured and readable. Google’s analysts identified several characteristics that pointed strongly to LLM authorship rather than a human developer:
Educational docstrings and comments. The code included docstrings and inline comments explaining what each block did, in the instructional style typical of LLM output rather than the terse, idiosyncratic comments a human exploit developer typically writes.
Hallucinated CVSS score. The script included a header comment citing a CVSS score that did not correspond to any published CVE. The score appeared to have been generated by the model as part of its framing rather than sourced from a real advisory.
Uniform code style. Human-authored exploit code typically reflects the author’s habits and shortcuts. This script was conspicuously consistent — indentation, variable naming, error handling all aligned with LLM output patterns rather than organic development.
Functional correctness. Despite the LLM fingerprints, the exploit worked. The 2FA bypass logic was technically sound and had been validated against a live target before the campaign was detected.
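The stylistic markers above lend themselves to simple automated triage. The sketch below is an assumption about how such a check might look, not Google's method. It computes the share of comment lines, flags a CVSS score cited without any CVE identifier, and tests whether indentation is uniformly four spaces. The function name, regular expressions and the 0.3 threshold are all illustrative choices. Confirming that a cited score really matches no published advisory would require a lookup against a vulnerability database, which this sketch does not attempt.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative patterns: a CVSS mention followed by a score on the same
# line, and any CVE identifier anywhere in the file.
CVSS_RE = re.compile(r"CVSS[^\n]{0,40}?\b\d{1,2}\.\d\b", re.IGNORECASE)
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


@dataclass
class FingerprintReport:
    comment_ratio: float        # share of non-blank lines that are comments
    cvss_without_cve: bool      # score cited with no CVE ID to anchor it
    uniform_indentation: bool   # every indent is a multiple of four spaces

    def flags(self, comment_threshold: float = 0.3) -> List[str]:
        # The threshold is an arbitrary starting point, not a tuned value.
        found = []
        if self.comment_ratio >= comment_threshold:
            found.append("high-comment-density")
        if self.cvss_without_cve:
            found.append("cvss-without-cve")
        if self.uniform_indentation:
            # Weak signal on its own: formatters produce the same result.
            found.append("uniform-indentation")
        return found


def analyse_source(source: str) -> FingerprintReport:
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = sum(1 for ln in lines if ln.lstrip().startswith("#"))
    comment_ratio = comment_lines / len(lines) if lines else 0.0

    has_cvss = bool(CVSS_RE.search(source))
    has_cve = bool(CVE_RE.search(source))

    leading = [ln[: len(ln) - len(ln.lstrip())] for ln in lines]
    uniform = all("\t" not in ws and len(ws) % 4 == 0 for ws in leading)

    return FingerprintReport(comment_ratio, has_cvss and not has_cve, uniform)


# Made-up sample for illustration only.
sample = '''# Exploit for 2FA bypass (CVSS 9.8)
# Step 1: build the session
def run():
    # Step 2: send the request
    return True
'''
report = analyse_source(sample)
print(report.flags())
```

None of these signals is conclusive alone. They are better treated as prompts for an analyst to look more closely than as a verdict on authorship.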
The Planned Campaign
Google’s intelligence indicated the threat actor intended to deploy the exploit at scale — not targeted intrusions but automated scanning and exploitation across internet-exposed instances of the admin tool. The number of potentially vulnerable targets in scope ran to tens of thousands of deployments across sectors including healthcare, financial services, and critical infrastructure management.
The detection came through a combination of threat intelligence collection, monitoring of underground forums, and analysis of infrastructure the group had begun staging for the campaign. Google coordinated with the affected software’s maintainers to develop and release a patch before the campaign could launch.
The patch was released as part of an emergency update on 10 May. Exploitation was thwarted, but the margin was narrow: little time separated the group's acquisition of a working exploit from Google's detection of the campaign.
Why This Matters
The significance is not that AI wrote a zero-day exploit. Researchers have demonstrated LLM-assisted vulnerability discovery in controlled settings for several years. The significance is that a threat actor operationalised it: the group used AI tooling to develop a functional exploit and built it into a campaign staged for real-world use.
The barriers to entry for exploit development have historically been meaningful. Writing a reliable 2FA bypass requires understanding authentication flows, being able to test against a live implementation, and having the development skills to produce deployable code. LLMs compress the time and skill required for the first and last of these.
The result is not that every threat actor now has zero-day capability. But the population of actors capable of producing functional exploits — particularly for logic flaws in web applications and authentication systems — has expanded. The labour cost of custom tooling has dropped.
The hallucinated CVSS score is a useful operational detail: it provides a detection marker for LLM-authored code in future triage. Defenders analysing post-incident artefacts should add LLM fingerprint analysis to their tooling assessment process.
Recommended Actions
- Verify the patch is applied. If your organisation runs the affected admin tool, confirm you are on the patched version (released 10 May). Given the scale of the planned campaign, assume active scanning is ongoing.
- Audit 2FA implementations. Review whether any other internally deployed tools have 2FA implementations that have not been independently assessed. Logic flaws in auth flows are common and frequently underscrutinised.
- Update threat models. The assumption that only well-resourced APT groups produce novel exploit code should be retired. LLM-assisted exploit development is now a demonstrated capability for a broader actor set.
- Add LLM fingerprint checks to IR processes. When analysing attacker tooling post-incident, include assessment for LLM authorship characteristics — educational comments, hallucinated citations, unnaturally consistent code style.
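For the first recommendation, patch verification across many hosts is easier to repeat if it is scripted. The sketch below assumes an inventory that maps hostnames to reported version strings and assumes purely numeric dotted versions. The hostnames and version numbers are placeholders, since the affected tool and its fixed release are not named here.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # Assumes purely numeric dotted versions, optionally prefixed with "v".
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def unpatched_hosts(inventory: Dict[str, str], fixed_version: str) -> List[str]:
    """Return hosts whose reported version is older than the fixed release."""
    fixed = parse_version(fixed_version)
    return sorted(
        host for host, version in inventory.items()
        if parse_version(version) < fixed
    )


# Placeholder data: substitute the real inventory and the real fixed version.
inventory = {
    "admin-01": "2.4.0",
    "admin-02": "2.4.1",
    "admin-03": "v2.10.0",
}
print(unpatched_hosts(inventory, fixed_version="2.4.1"))
```

Comparing parsed tuples rather than raw strings avoids the common mistake of ranking "2.10.0" below "2.4.1".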