
AI in the Attack Chain: How Threat Actors Are Using Language Models Operationally

The question of whether artificial intelligence would change offensive cyber operations has spent several years in the realm of informed speculation. It is no longer speculative. Threat actors are using large language models operationally — across multiple stages of the attack chain, at multiple points on the sophistication spectrum, and with measurable effect on the speed, scale, and accessibility of attacks.

This analysis draws on confirmed incidents, intelligence reporting, and technical indicators to map where AI adoption by threat actors stands today, where it is heading, and what the implications are for organisations defending against it.


The Framing Problem

The security industry has not always approached the AI-in-attacks question clearly. Early commentary tended toward two failure modes: either dismissing LLM-assisted attacks as hype because the underlying techniques were not new, or catastrophising with scenarios that treated AI as an autonomous adversarial agent capable of conducting entire campaigns without human direction.

Both framings miss the point. AI does not need to do something categorically novel to be significant. It needs to make existing attack techniques faster, cheaper, or accessible to a broader actor population — and there is now substantial evidence that it is doing all three.

The more useful frame is operational leverage. Where does AI, in its current form, meaningfully reduce the effort, skill, or time required to execute an attack? The answer maps onto specific stages of the attack lifecycle. The implications vary by stage.


Stage 1: Reconnaissance and Target Research

Reconnaissance is information-intensive, often tedious, and historically dependent on human analysts who understand the target environment. LLMs have made this stage substantially more efficient, particularly for actors at the lower end of the skill spectrum.

OSINT synthesis. Processing large volumes of publicly available information — LinkedIn profiles, company websites, public filings, news coverage, job postings — to build a coherent picture of an organisation’s structure, technologies, key personnel, and potential entry points is a task LLMs handle well. A threat actor who previously needed significant time and skill to synthesise this into a targeting dossier can now do it in a fraction of the time.

Technology identification from job postings. Job advertisements reliably expose technology stacks. An organisation hiring “senior engineers with Fortinet FortiGate experience” is advertising its perimeter technology. “Seeking OT/ICS specialists familiar with Siemens S7” reveals the industrial control environment. LLMs can parse hundreds of job postings and extract a technology map of a target organisation with high fidelity.
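Defenders can run the same exercise against their own recruitment copy before it is published. The following is a minimal sketch, not a production tool: the WATCHLIST categories and product names are illustrative assumptions, and the function name find_exposed_technologies is hypothetical. It counts how many postings mention each watchlisted term.

```python
import re
from collections import Counter

# Hypothetical watchlist: product names a defender may prefer not to advertise.
WATCHLIST = {
    "perimeter": ["Fortinet", "FortiGate", "Palo Alto", "Cisco ASA"],
    "ot_ics": ["Siemens S7", "Modbus", "SCADA"],
    "identity": ["Active Directory", "Okta"],
}


def find_exposed_technologies(postings):
    """Count postings that mention each watchlist term, grouped by category."""
    hits = {category: Counter() for category in WATCHLIST}
    for text in postings:
        for category, terms in WATCHLIST.items():
            for term in terms:
                # Whole-word, case-insensitive match; counted once per posting.
                pattern = r"\b" + re.escape(term) + r"\b"
                if re.search(pattern, text, flags=re.IGNORECASE):
                    hits[category][term] += 1
    # Drop categories with no matches to keep the report short.
    return {c: dict(counter) for c, counter in hits.items() if counter}


if __name__ == "__main__":
    sample_postings = [
        "Seeking OT/ICS specialists familiar with Siemens S7",
        "Senior engineers with Fortinet experience",
    ]
    print(find_exposed_technologies(sample_postings))
```

A simple keyword match like this will miss paraphrased references, so it is best treated as a first-pass review aid rather than a guarantee that nothing sensitive is disclosed.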

Social engineering research. Identifying individuals with specific access, understanding reporting structures, and finding the context needed to make a phishing pretext credible is a research task. LLMs accelerate it substantially.

Confirmed use at this stage is documented across multiple threat actor groups. Microsoft’s 2025 Digital Defense Report noted evidence of Russia-linked actors using LLMs for target research and social engineering preparation. OpenAI’s February 2025 disruption report identified five nation-state actors — from China, Iran, Russia, and North Korea — using the service for reconnaissance research before their accounts were terminated.

Defender implication. The primary effect here is speed and scale. The threshold for a well-researched, highly contextualised attack has dropped. The assumption that a credible, organisation-specific spear-phishing campaign requires significant adversary investment in research should be revisited.


Stage 2: Initial Access — Phishing and Social Engineering

This is where the most visible and measurable AI impact has occurred. The quality, volume, and personalisation of phishing attacks have all increased, and LLMs are a significant contributing factor.

The Grammar Heuristic is Dead

For years, security awareness training relied heavily on a practical filter: grammatical and stylistic errors in phishing emails are a warning sign. The filter was imperfect — sophisticated actors always produced clean copy — but it meaningfully caught lower-end campaigns.

LLMs produce grammatically flawless text in any required register. The grammar heuristic is no longer a reliable early indicator. This is not new information, but organisations that have not updated their security awareness programmes to reflect it are training employees against a threat model that no longer accurately describes the threat.

Personalisation at Scale

The more significant development is personalisation. Effective spear-phishing has always required understanding the target’s role, responsibilities, relationships, and current context. An email that references a real colleague, a current project, or a recent news event involving the organisation, and that strikes an appropriate professional tone, is far more convincing than a generic “your account requires verification” template.

This level of personalisation was expensive. It required research time and a skilled author. With LLM-assisted drafting and OSINT synthesis, it is no longer expensive. An actor can produce hundreds of highly personalised pretext messages targeting employees across an organisation in the time it previously took to produce a handful.

Reported incidents. Multiple security firms documented significant increases in personalised phishing volume through 2025 and into 2026 that correlate with LLM adoption patterns. The Verizon DBIR 2026 found that phishing remained the dominant initial access vector, but that the average quality of phishing content in successful breaches had increased, a finding consistent with AI-assisted authoring.

Vishing and Real-Time Persona Maintenance

Beyond written phishing, AI voice synthesis is being used in fraud adjacent to business email compromise (BEC). Threat actors have used cloned executive voices, produced from publicly available audio such as conference recordings or earnings calls, in vishing attacks that target finance teams to authorise fraudulent transfers.

This is an area where the technology has moved faster than organisational defences. Most callback verification procedures were designed on the assumption that a caller who sounds like the CFO is the CFO.


Stage 3: Vulnerability Research and Exploit Development

The first confirmed AI-built zero-day, disclosed by Google in May 2026, moved this stage from demonstrated research capability to operational reality. The context and implications deserve detailed treatment.

What Changed in May 2026

The Google Threat Intelligence Group’s disclosure documented an actor using an LLM to write a functional 2FA bypass for a widely deployed open-source admin tool. The exploit was correct, functional, and had been validated against a live target. The actor had planned to use it in a mass exploitation campaign but was detected first.

This is the first confirmed case in which AI-generated exploit code was incorporated into an active operation. Prior documented cases — including OpenAI’s 2025 disclosures — involved threat actors using LLMs for research and drafting, not for producing working exploit artefacts.

The significance is not that the vulnerability was novel or exceptionally complex. It was a logic flaw in an authentication flow — the kind of vulnerability that a skilled human developer could find and exploit. The significance is that the actor used AI to close the gap between not having that skill and having it.

The Capability Gap Has Narrowed

Historically, the development of functional exploits — particularly for non-trivial vulnerabilities — required specific skills: understanding memory layouts, authentication flows, protocol internals, and the ability to write code that behaved predictably in adversarial conditions. These skills were not uniformly distributed across the threat actor population.

LLMs do not eliminate this constraint entirely. Exploiting a complex kernel vulnerability in a modern operating system still requires deep expertise that current LLMs cannot substitute for. But for a significant class of vulnerabilities — logic flaws, authentication bypasses, injection issues in web applications — the effective skill threshold has dropped.

The LLM fingerprints identified in the May 2026 case are instructive: educational docstrings, a hallucinated CVSS score, unnaturally consistent code style. These are characteristics that point to LLM authorship and may persist as detectable artefacts in attacker tooling — at least until actors learn to suppress them or fine-tune models on human-authored exploit code.

Automated Vulnerability Discovery

Separate from exploit writing, LLMs are being used to assist with vulnerability discovery — reviewing code for patterns associated with flaws, generating fuzzing inputs, and analysing the output of automated scanners. This is an area where the line between offensive and defensive use is thin: the same capabilities that help defenders find flaws before attackers do also help attackers find flaws in targets.

DARPA’s AI Cyber Challenge, which concluded in 2025, demonstrated that autonomous systems could both identify and patch vulnerabilities with meaningful reliability in controlled environments. The transition from controlled demonstration to operational deployment is not yet complete, but the direction is clear.


Stage 4: Lateral Movement and Persistence

This stage remains more human-dependent than earlier stages. Navigating an enterprise network after initial access — understanding the environment, avoiding detection, identifying high-value targets — requires adaptive decision-making and situational awareness that current LLMs do not reliably provide autonomously.

However, LLM assistance is being used in specific ways:

Script generation. Operators are using LLMs to generate custom scripts for lateral movement tasks — modifying off-the-shelf tooling to evade specific detection signatures, adapting PowerShell or Python scripts to the specific environment they are operating in. This reduces the skill required for customisation.

Log analysis and environmental understanding. After gaining initial access, actors need to understand the environment. Parsing large volumes of log data, Active Directory output, or network topology information to identify paths to high-value targets is an information processing task that LLMs assist with.

Living-off-the-land technique selection. LLMs have demonstrated an ability to suggest appropriate living-off-the-land (LOTL) techniques when provided with environmental context, for example advising which built-in tools are less likely to be monitored and which processes make good masquerade targets.

The human-in-the-loop remains present at this stage, but the operator is increasingly assisted rather than doing all analytical work independently.


Stage 5: Ransomware Operations — The AI Integration Layer

Ransomware operations have adopted AI at both ends of the campaign. The Verizon DBIR 2026 found a 42% year-over-year increase in ransomware incidents against utilities and critical infrastructure, and noted that both the sophistication of initial access and the quality of ransom demand communications had increased.

Ransom demand drafting. Ransom notes and negotiation communications have improved markedly in quality. Where earlier ransomware groups produced demands with obvious grammatical errors and imprecise legal threats, current groups are producing well-constructed communications that reference specific regulatory obligations the victim organisation faces — GDPR breach notification requirements, sector-specific reporting rules — in accurate and credible terms.

Victim research. Ransomware groups are using AI-assisted research to size ransom demands more precisely — reviewing publicly available financial information, company size, sector, and insurance market data to arrive at a demand calibrated to what the victim can plausibly pay.

Speed. AI automation of communication and analysis tasks has increased the pace of negotiations. Groups can run parallel negotiations across multiple victims more efficiently.


What State Actors Are Doing

Nation-state actors were early and aggressive adopters. The confirmed cases from 2025 involved actors from Russia (Fancy Bear, Cozy Bear), China (APT groups operating under MSS and PLA direction), Iran, and North Korea. Documented uses included:

  • Spear-phishing drafting — producing targeted content in native-quality language for deployment against foreign targets
  • Vulnerability research assistance — reviewing code and documentation for exploitable flaws
  • Malware translation — adapting malware components from one language or platform to another
  • Operational security research — querying for detection avoidance techniques and tradecraft

The May 2026 zero-day case may or may not involve a state actor — attribution has not been publicly confirmed. But the capability demonstrated is consistent with the direction of travel for both state and sophisticated criminal groups.


Implications for Defenders

The summary position is not that defenders are losing ground across all fronts. Several of the AI-assisted attack improvements have corresponding defensive responses. But the response requires deliberate adaptation, not incremental adjustment.

Accept That the Threat Model Has Shifted

Security awareness training built around grammar heuristics, implausible pretexts, and generic phishing templates is defending against last decade’s attack. The new baseline assumption should be: any sufficiently motivated actor can produce highly personalised, contextually accurate, stylistically convincing phishing content. Training should shift from “spot the bad email” to “treat all unexpected requests for credentials, access, or financial action as suspicious regardless of how plausible they appear.”

Phishing-Resistant Authentication

The expansion of 2FA bypass exploits, including not just the May 2026 case but also the broader availability of adversary-in-the-middle (AiTM) phishing kits, makes the distinction between phishing-resistant and phishing-susceptible 2FA practically significant. Hardware security keys and passkeys are resistant to AiTM attacks and to bypass exploits that target OTP-based 2FA. TOTP and SMS-based codes are not. Migration to phishing-resistant authentication has become a more urgent priority.
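One way to scope that migration is to audit current enrolments against the distinction drawn above. The sketch below assumes a hypothetical inventory that maps each user to the list of factors they have enrolled; the method labels ("passkey", "totp", and so on) and the function name audit_mfa are illustrative, not taken from any identity provider's API.

```python
# Assumed labels for enrolled factors; adapt to whatever your identity provider exports.
PHISHING_RESISTANT = {"fido2_security_key", "passkey"}
PHISHING_SUSCEPTIBLE = {"totp", "sms_otp"}


def audit_mfa(enrolments):
    """Group users by how exposed their enrolled factors leave them.

    enrolments: dict mapping user name -> iterable of factor labels.
    """
    report = {"no_resistant_factor": [], "weak_fallback_enabled": []}
    for user, methods in sorted(enrolments.items()):
        methods = set(methods)
        has_resistant = bool(methods & PHISHING_RESISTANT)
        has_weak = bool(methods & PHISHING_SUSCEPTIBLE)
        if not has_resistant:
            # Nothing enrolled that would withstand an AiTM kit.
            report["no_resistant_factor"].append(user)
        elif has_weak:
            # A resistant factor exists, but an OTP fallback is still accepted.
            report["weak_fallback_enabled"].append(user)
    return report


if __name__ == "__main__":
    inventory = {
        "alice": ["passkey"],
        "bob": ["totp"],
        "carol": ["passkey", "sms_otp"],
    }
    print(audit_mfa(inventory))
```

The second bucket matters because an account that still accepts an OTP fallback can usually be steered onto the weaker factor, so enrolling a passkey alone does not close the gap.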

LLM Fingerprinting as a Detection Signal

The characteristics of LLM-authored code — educational docstrings, hallucinated citations, unnaturally consistent style — are currently detectable. Incorporate LLM authorship analysis into incident response workflows when examining attacker tooling. As threat actors learn to suppress these markers, the signal will degrade, but it has short-term investigative value.

Patch Velocity

AI-assisted vulnerability discovery and exploit development increase the pressure on patch velocity. The window between vulnerability disclosure and exploitation is not a new problem, but it is now a problem with a shorter timeline. Continuous patching of internet-exposed infrastructure, particularly authentication systems and administrative interfaces, is not optional for organisations in targeted sectors.
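Patch velocity is easier to improve once it is measured. The following sketch computes the exposure window, in days from disclosure to patch, for a set of internet-exposed assets. The record layout (asset, disclosed, patched) and the function names are assumptions made for illustration; an unpatched asset is counted up to the date the report is run.

```python
from datetime import date
from statistics import median


def exposure_days(disclosed, patched, today):
    """Days between disclosure and patch, or until today if still unpatched."""
    end = patched if patched is not None else today
    return (end - disclosed).days


def patch_velocity(records, today):
    """Summarise exposure windows across a list of asset records."""
    windows = [exposure_days(r["disclosed"], r["patched"], today) for r in records]
    unpatched = [r["asset"] for r in records if r["patched"] is None]
    return {
        "median_days": median(windows),
        "max_days": max(windows),
        "unpatched": unpatched,
    }


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        {"asset": "admin-portal", "disclosed": date(2026, 5, 1), "patched": date(2026, 5, 4)},
        {"asset": "vpn-gw", "disclosed": date(2026, 5, 10), "patched": None},
    ]
    print(patch_velocity(records, today=date(2026, 5, 20)))
```

Tracking the maximum alongside the median keeps a single long-neglected system from being hidden by an otherwise healthy average.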

Monitor for Emerging AI-Enabled Capabilities

The trajectory of AI capability suggests that autonomous agents capable of more sophisticated lateral movement and persistence will arrive. Current limitations in adaptive decision-making are not permanent. Defenders should track the development of AI-assisted offensive capability as a primary threat driver, not a secondary consideration.


Conclusion

AI has not transformed offensive cyber operations overnight. Human expertise, experience, and strategic direction remain central to sophisticated campaigns. What AI has done is compress the skill and time requirements for specific tasks — reconnaissance, phishing authorship, exploit writing for accessible vulnerability classes — in ways that meaningfully lower the barrier to entry for capable attacks.

The May 2026 zero-day disclosure is a threshold event. It confirms that AI-generated exploit code is now an operational tool, not a research curiosity. The fingerprints it left are useful for now, but they are a temporary artefact of actors still learning to use the tools. The underlying capability is real and will be refined.

Defenders who are still asking “will AI change the threat landscape” are a year behind the question. The more useful question now is: which parts of our security posture were calibrated to a threat environment that no longer exists?

Sources

  • Google Threat Intelligence Group — AI-Assisted Exploit Development Advisory (May 2026)
  • Verizon Data Breach Investigations Report 2026
  • Microsoft Digital Defense Report 2025
  • NCSC — The Near-Term Impact of AI on the Cyber Threat
  • OpenAI — Disrupting Malicious Uses of AI (February 2025)