AI in the Attack Chain: How Threat Actors Are Using Language Models Operationally

Stop asking whether AI will change offensive cyber operations. It already has.

Threat actors are using large language models operationally: across multiple stages of the attack chain, at multiple sophistication levels, with measurable effect on the speed, scale, and accessibility of attacks. The question worth asking now is which parts of your security posture were calibrated to a threat environment that no longer exists.

This analysis maps confirmed incidents, intelligence reporting, and technical indicators to produce an honest picture of where AI adoption by threat actors stands today, and where it’s heading.

Getting the Framing Right

Early commentary on AI in offensive operations divided into two camps that both missed the point. One dismissed LLM-assisted attacks as hype because the underlying techniques weren’t new. The other catastrophised with scenarios treating AI as an autonomous adversarial agent running entire campaigns independently.

Neither framing was useful. AI doesn’t need to do something categorically novel to matter strategically. It needs to make existing attack techniques faster, cheaper, or accessible to a broader actor population, and there is now substantial evidence it’s doing all three.

The useful frame is operational leverage. Where does AI, in its current form, meaningfully reduce the effort, skill, or time required to execute a specific attack stage? The answer maps onto specific parts of the attack lifecycle, and the implications vary by stage.

Reconnaissance: Speed and Synthesis

Reconnaissance is information-intensive, tedious, and historically dependent on human analysts who understood the target environment. LLMs have made this substantially more efficient for actors across the skill spectrum.

OSINT synthesis. Processing large volumes of publicly available information (LinkedIn profiles, company filings, news coverage, job postings) to build a coherent picture of an organisation’s structure, technologies, key personnel, and potential entry points is a task LLMs handle well. What previously required significant analyst time can now be completed in a fraction of it.

Technology identification from job postings. Job advertisements reliably expose technology stacks. An organisation seeking “engineers with Fortinet FortiGate experience” is advertising its perimeter technology. “OT/ICS specialists familiar with Siemens S7” reveals the industrial control environment. LLMs can parse hundreds of postings and extract a technology map of a target organisation with high fidelity. The target has told you what to attack; the LLM just reads the job board.
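Defenders can run the same exercise against their own postings before publication. The following is a minimal sketch under stated assumptions: the `TECH_TERMS` watchlist and the `audit_postings` function are hypothetical names invented for illustration, and a real review would need a far larger term list than the handful shown here.

```python
import re
from collections import defaultdict

# Hypothetical watchlist (an assumption, not a standard taxonomy):
# category -> product terms worth reviewing before a posting goes live.
TECH_TERMS = {
    "perimeter": ["Fortinet", "FortiGate", "Palo Alto"],
    "ot_ics": ["Siemens S7", "SCADA"],
    "identity": ["Active Directory", "Okta"],
}


def audit_postings(postings):
    """Return {category: sorted list of watchlist terms found} across all postings."""
    found = defaultdict(set)
    for text in postings:
        for category, terms in TECH_TERMS.items():
            for term in terms:
                # Whole-term, case-insensitive match on the posting text.
                pattern = r"\b" + re.escape(term) + r"\b"
                if re.search(pattern, text, re.IGNORECASE):
                    found[category].add(term)
    return {category: sorted(terms) for category, terms in found.items()}


if __name__ == "__main__":
    sample = [
        "Seeking network engineers with Fortinet experience.",
        "OT/ICS specialists familiar with Siemens S7 required.",
    ]
    print(audit_postings(sample))
```

Keyword matching is deliberately crude here. The point is only that the disclosure is mechanical to detect, which is why it is equally mechanical for an adversary to harvest.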

Social engineering research. Finding the context needed to make a phishing pretext credible (understanding reporting structures, identifying individuals with specific access, knowing the current organisational context) is a research task that LLMs accelerate substantially.

Confirmed use at this stage is documented across multiple groups. Microsoft’s 2025 Digital Defense Report noted evidence of Russia-linked actors using LLMs for target research. OpenAI’s February 2025 disruption report identified five nation-state actors from China, Iran, Russia, and North Korea using the platform for reconnaissance research before their accounts were terminated.

The primary effect is speed and scale. The threshold for a well-researched, highly contextualised attack has dropped. The assumption that a credible organisation-specific spear-phishing campaign requires significant adversary investment in research should be retired.

Phishing and Social Engineering: Personalisation at Scale

Phishing is where the most visible and measurable AI impact has occurred. Quality, volume, and personalisation of phishing attacks have all increased. LLMs are a significant contributing factor.

For years, security awareness training relied on a practical filter: grammatical errors in phishing emails are a warning sign. Imperfect, certainly (sophisticated actors always produced clean copy), but effective against lower-end campaigns.

LLMs produce grammatically flawless text in any required register. The filter is gone. Organisations that haven’t updated their security awareness training to reflect this are training employees against a threat model that expired years ago.

The more significant development is personalisation at scale. Effective spear-phishing requires understanding the target’s role, relationships, and current context. An email that references a real colleague, a current project, a recent news event, and lands in exactly the right professional register is far more convincing than any generic credential-theft template. That level of personalisation was expensive: it required research time and a skilled author. With LLM-assisted drafting and OSINT synthesis, it isn’t expensive anymore.

An actor can produce hundreds of highly personalised pretext messages targeting employees across an organisation in the time it previously took to produce a handful. The scale has changed. Most security awareness programmes haven’t.

Multiple security firms documented significant increases in personalised phishing volume through 2025 and into 2026 that correlate with LLM adoption patterns. The 2026 Verizon Data Breach Investigations Report (DBIR) found that phishing as an initial access vector remained dominant and that the average quality of phishing content in successful breaches had increased, consistent with AI-assisted authoring.

AI voice synthesis adds another dimension. Threat actors have used cloned executive voices (produced from publicly available audio, conference recordings, earnings calls) in vishing attacks targeting finance teams to authorise fraudulent transfers. Most callback verification procedures were designed against the assumption that a caller claiming to be the CFO would sound like the CFO. That assumption is no longer safe.

Exploit Development: The May 2026 Threshold

The first confirmed AI-built zero-day, disclosed by Google in May 2026, moved AI-assisted exploit development from research capability to operational reality. It warrants careful examination.

What Happened

Google’s Threat Intelligence Group documented an actor using an LLM to write a functional 2FA bypass for a widely deployed open-source admin tool. The exploit was correct, functional, and had been validated against a live target. The actor had planned to use it in a mass exploitation campaign, but the activity was detected before that campaign launched.

This is the first confirmed case of AI-generated exploit code incorporated into an active operation. Prior documented cases (including OpenAI’s 2025 disclosures) involved threat actors using LLMs for research and drafting, not for producing working exploit artefacts. This crossed a line.

The significance is not that the vulnerability was exceptionally complex. It was a logic flaw in an authentication flow: the kind of bug a skilled human developer could find and exploit given time. The significance is that the actor used AI to close the gap between not having that skill and having it.

What the Fingerprints Tell Us

The LLM markers identified in the code are instructive: educational docstrings explaining what each block did in a style consistent with instructional LLM output, a hallucinated CVSS score that doesn’t correspond to any published CVE, and unnaturally consistent code style throughout (indentation, variable naming, error handling all conforming to LLM output patterns rather than organic development).

Despite all of that, the exploit worked. The 2FA bypass logic was technically sound.

These fingerprints are currently a useful detection signal in incident response analysis of attacker tooling. They have a finite shelf life: actors will learn to suppress them, or will fine-tune models on human-authored exploit code to strip the tells. But they’re there now.

The Capability Gap Has Narrowed, Not Closed

Historically, functional exploit development (particularly for non-trivial vulnerabilities) required specific skills that were not uniformly distributed across the threat actor population: an understanding of memory layouts, authentication flows, and protocol internals, and the ability to write code that behaves predictably under adversarial conditions.

LLMs don’t eliminate this constraint entirely. Exploiting a complex modern kernel vulnerability still requires deep expertise that current models cannot substitute for. But for a significant class of vulnerabilities (logic flaws, authentication bypasses, injection issues in web applications) the effective skill threshold has dropped materially.

The practical result: the population of actors capable of producing functional exploits for this vulnerability class has expanded. By how much is unclear. But the May 2026 case is the proof of concept, and it’s now in the operational record.

Lateral Movement and Persistence: Still Largely Human-Driven

Navigating an enterprise network post-access (understanding the environment, avoiding detection, identifying high-value targets) requires adaptive decision-making that current LLMs don’t provide reliably on an autonomous basis. This stage remains more human-dependent than the earlier phases.

That said, LLM assistance is present in specific ways.

Operators use LLMs to generate custom scripts for lateral movement tasks, modifying off-the-shelf tooling to evade specific detection signatures, adapting PowerShell or Python to the specific environment they’re operating in. This reduces the skill cost of customisation substantially.

After initial access, parsing large volumes of log data, Active Directory output, or network topology information to identify the fastest path to high-value targets is an information processing task where LLMs assist effectively. The operator still makes decisions; the LLM processes the raw input faster than a human analyst would.

LLMs have also demonstrated the ability to suggest appropriate living-off-the-land techniques for a given environment when provided with environmental context, advising which built-in tools are less likely to be monitored in a specific configuration. The human in the loop remains present at this stage: operations are increasingly AI-assisted, not autonomous.

Ransomware Operations: AI at Both Ends

Ransomware operations have adopted AI at the production end and the negotiation end simultaneously. The 2026 Verizon Data Breach Investigations Report (DBIR) found a 42% year-over-year increase in ransomware incidents against utilities and critical infrastructure, alongside increased sophistication in initial access quality and negotiation communications.

Ransom demand drafting. Where earlier ransomware groups produced demands with obvious grammatical errors and imprecise legal threats, current groups are producing well-constructed communications that accurately reference specific regulatory obligations the victim faces: GDPR breach notification timelines, sector-specific reporting rules, personal liability implications for executives. The quality shift is measurable and the likely cause is not difficult to identify.

Victim research. AI-assisted analysis of publicly available financial information (company size, sector, insurance market data, recent M&A activity) is being used to calibrate ransom demands more precisely. Demands sized to what the victim can plausibly pay generate better negotiation outcomes than demands based on guesswork.

Negotiation pace. AI automation of communication and analysis tasks has increased the speed and parallelism of negotiations. Groups are running simultaneous negotiations across multiple victims more efficiently than manual operations allowed.

Nation-State Actors: Early and Aggressive Adopters

State actors moved early. The confirmed 2025 cases involved actors from Russia (Fancy Bear, Cozy Bear), China (groups operating under MSS and PLA direction), Iran, and North Korea. Documented uses included:

Spear-phishing drafting: producing targeted content in native-quality language for deployment against foreign targets in languages the actor’s operators may not speak fluently
Vulnerability research assistance: reviewing code and documentation for exploitable flaws
Malware translation: adapting malware components from one language or platform to another
Operational security research: querying for detection avoidance techniques and tradecraft

Whether the May 2026 zero-day case involves a state actor hasn’t been confirmed publicly. But the capability demonstrated is consistent with the direction of travel for both state actors and sophisticated criminal groups. The distinction between those two categories has been blurring for years.

What Defenders Need to Adapt

Update the threat model for phishing. Training built around grammar heuristics, implausible pretexts, and generic templates is defending against last decade’s baseline threat. The new assumption should be: any sufficiently motivated actor can produce highly personalised, contextually accurate, stylistically convincing content. Train employees to treat every unexpected request for credentials, access, or financial action as suspicious however convincing it appears, rather than training them to spot bad writing.

Phishing-resistant authentication is now a priority, not a recommendation. The expansion of 2FA bypass exploits (not just the May 2026 case but the broader availability of adversary-in-the-middle phishing kits) makes the distinction between phishing-resistant and phishing-susceptible 2FA practically significant. Hardware security keys and passkeys are resistant to AiTM attacks and OTP bypass exploits. TOTP and SMS codes are not. The migration path is clear and the urgency has increased.
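One way to size that migration is to inventory enrolled factors per account. The sketch below assumes a hypothetical export that maps usernames to factor labels; the labels, the `audit_factors` function, and the two report buckets are illustrative and do not reflect any identity provider's schema. It also assumes that a phishable factor left enrolled as a fallback remains usable by an attacker, which should be checked against the actual sign-in policy.

```python
# Classification follows the distinction drawn above. The label strings
# themselves are hypothetical, chosen only for this example.
PHISHING_RESISTANT = {"hardware_security_key", "passkey"}
PHISHING_SUSCEPTIBLE = {"totp", "sms"}


def audit_factors(accounts):
    """Bucket users by exposure.

    `accounts` maps a username to an iterable of enrolled factor labels.
    Returns a dict with two sorted lists:
      no_resistant_factor:  no security key or passkey enrolled at all
      susceptible_fallback: resistant factor enrolled, but TOTP/SMS still present
    """
    report = {"no_resistant_factor": [], "susceptible_fallback": []}
    for user in sorted(accounts):
        factors = set(accounts[user])
        if not factors & PHISHING_RESISTANT:
            report["no_resistant_factor"].append(user)
        elif factors & PHISHING_SUSCEPTIBLE:
            report["susceptible_fallback"].append(user)
    return report


if __name__ == "__main__":
    example = {
        "alice": {"passkey"},
        "bob": {"totp", "sms"},
        "carol": {"hardware_security_key", "sms"},
    }
    print(audit_factors(example))
```

The first bucket is the migration backlog. The second is easy to overlook: enrolling a passkey does little if the account can still be signed in to with an SMS code.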

Use LLM fingerprints as a detection signal while they’re reliable. Educational docstrings, hallucinated citations, unnaturally consistent code style: these are currently detectable in attacker tooling. Add LLM authorship analysis to incident response workflows when examining attacker code. The signal will degrade as actors learn to suppress it, but it has investigative value right now.
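As a rough illustration of what such a check might look like for recovered Python tooling, the sketch below computes three heuristic signals loosely matching the markers described above: docstring coverage, comment density, and a CVSS score quoted with no CVE identifier anywhere in the file. The function name, the regular expressions, and the choice of signals are all assumptions made for this example; none of them is a validated detector, and the output should prompt an analyst's review rather than support an attribution.

```python
import ast
import re

# Assumed patterns: a CVSS mention followed shortly by a score such as "9.8",
# and a standard CVE identifier such as "CVE-2024-12345".
CVSS_RE = re.compile(r"CVSS[^\n]{0,20}?\d{1,2}\.\d", re.IGNORECASE)
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def llm_fingerprint_signals(source):
    """Return heuristic signals for a string of Python source code.

    Raises SyntaxError if the source does not parse as Python.
    """
    tree = ast.parse(source)
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    documented = [f for f in funcs if ast.get_docstring(f)]
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = [ln for ln in lines if ln.lstrip().startswith("#")]
    return {
        # Share of functions that carry a docstring.
        "docstring_coverage": len(documented) / len(funcs) if funcs else 0.0,
        # Share of non-blank lines that are whole-line comments.
        "comment_ratio": len(comment_lines) / len(lines) if lines else 0.0,
        # A severity score quoted without any CVE identifier in the file.
        "cvss_without_cve": bool(CVSS_RE.search(source))
        and not CVE_RE.search(source),
    }
```

High docstring coverage on its own proves nothing, since careful human developers document their code too. The signals are more useful in combination and when compared against the same actor's earlier tooling.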

Patch velocity matters more than it did. AI-assisted vulnerability discovery and exploit development compress the window between disclosure and exploitation. The assumption that vulnerability disclosures come with a grace period for patching should be revisited for internet-exposed infrastructure, authentication systems, and administrative interfaces. In targeted sectors, that window may now be measured in days rather than weeks.
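A shorter window is easier to enforce when it is written down as a number. The sketch below flags unpatched findings that have exceeded an assumed deadline, with a tighter limit for internet-exposed assets. The thresholds, the field names, and the `overdue_patches` function are hypothetical and chosen only to illustrate the idea; appropriate limits depend on the sector and the asset.

```python
from datetime import date

# Assumed policy values for illustration only, not a published standard:
# maximum days a finding may stay unpatched after public disclosure.
MAX_DAYS_EXPOSED = 3    # internet-exposed assets
MAX_DAYS_INTERNAL = 14  # everything else


def overdue_patches(findings, today):
    """Return (asset, days_open) for unpatched findings past deadline, oldest first.

    Each finding is a dict with the hypothetical keys:
      asset (str), disclosed_on (date), internet_exposed (bool),
      patched_on (date or None)
    """
    overdue = []
    for finding in findings:
        if finding.get("patched_on") is not None:
            continue  # already remediated
        days_open = (today - finding["disclosed_on"]).days
        limit = MAX_DAYS_EXPOSED if finding["internet_exposed"] else MAX_DAYS_INTERNAL
        if days_open > limit:
            overdue.append((finding["asset"], days_open))
    return sorted(overdue, key=lambda item: item[1], reverse=True)
```

Measuring from the disclosure date rather than the scan date is the design choice that matters here, because the disclosure date is when an attacker's clock starts.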

Track AI offensive capability development as a primary threat driver. The current limitations in LLM-assisted lateral movement and autonomous persistence are real but not permanent. DARPA’s AI Cyber Challenge, concluded in 2025, demonstrated that autonomous systems could identify and patch vulnerabilities with meaningful reliability in controlled environments. The transition to operational deployment isn’t complete, but the direction is established. Defenders who treat AI capability development as a secondary consideration are building security programmes against a threat that’s evolving underneath them.

Human expertise and strategic direction remain central to sophisticated campaigns. AI hasn’t made them obsolete. What it has done is compress the skill and time requirements for specific tasks (reconnaissance, phishing authorship, exploit writing for accessible vulnerability classes) in ways that have lowered the barrier to entry for capable attacks and expanded the population of actors who can conduct them.

The May 2026 disclosure is a threshold event in the public record. AI-generated exploit code is now a demonstrated operational tool. The fingerprints it left are useful and temporary. The underlying capability is real.