Research

Human Detection of AI-Generated Phishing

An ongoing study into which phishing techniques humans miss most when AI eliminates linguistic quality as a detection signal. Data is collected through Retro Phish, a game-based research platform where players classify emails, bet confidence, and earn XP while contributing to a real dataset.

Research Question

The dominant heuristics for identifying phishing have historically been linguistic: look for grammar errors, awkward phrasing, unusual idioms, and formatting inconsistencies. These signals held up because real phishing campaigns were written sloppily, often by non-native speakers working at volume. That era is over. AI-generated phishing is grammatically flawless, contextually plausible, and available at negligible marginal cost.

The study question is: when linguistic quality is held constant across all emails, phishing and legitimate alike, which phishing techniques produce the lowest human detection rates?

Technique is the only independent variable. Every card in the dataset, phishing and legitimate, was generated by an AI model. This controls for writing quality and removes it as a confound. What remains are the structural and contextual properties of each technique: how it frames the request, what authority it invokes, what urgency it creates, and whether it establishes a plausible backstory.

Secondary questions: Does professional security background improve detection rates for specific techniques, or does it improve detection uniformly? Does overconfidence correlate with specific technique failures? Do security professionals show lower bypass rates than technical non-security users, or is security experience a weaker predictor of detection accuracy than commonly assumed?

Dataset

Total cards: 1,000
Phishing cards: 690
Legitimate cards: 310
Techniques studied: 6

Phishing cards (690)

Six techniques, 115 cards each. Each technique block is split into four difficulty tiers to ensure the dataset captures a realistic range of attack sophistication rather than clustering at a single difficulty level.

Difficulty   Cards per technique
Easy         35
Medium       35
Hard         35
Extreme      10

Legitimate cards (310)

Legitimate cards cover three real-world email categories. Including a realistic volume of legitimate email ensures that players cannot gain an edge by defaulting to a phishing classification and that false positive rates remain measurable.

Category                                              Cards
Transactional (receipts, shipping, account updates)   110
Marketing (newsletters, promotions, announcements)    100
Workplace (internal comms, HR, IT notices)            100
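The card counts above can be sanity-checked with a short sketch (the counts are taken from this page; the names are illustrative):

```python
# Dataset composition as described above: six phishing techniques, each
# split into four difficulty tiers, plus three legitimate categories.
TIERS = {"easy": 35, "medium": 35, "hard": 35, "extreme": 10}
TECHNIQUES = [
    "urgency", "authority_impersonation", "credential_harvest",
    "hyper_personalization", "pretexting", "fluent_prose",
]
LEGITIMATE = {"transactional": 110, "marketing": 100, "workplace": 100}

cards_per_technique = sum(TIERS.values())               # 115
phishing_total = cards_per_technique * len(TECHNIQUES)  # 690
legitimate_total = sum(LEGITIMATE.values())             # 310

assert cards_per_technique == 115
assert phishing_total == 690
assert legitimate_total == 310
assert phishing_total + legitimate_total == 1000
```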

The dataset is frozen at v1 once 1,000 approved cards are reached. All cards go through an admin review pipeline before going live: generated in batches via the Claude API, staged for review, then approved or rejected by a human reviewer. Cards are not added to the live dataset without review.

Phishing Techniques Studied

Each technique represents a distinct social engineering mechanism. They were selected because they map to real-world attack patterns documented in threat intelligence reporting, and because they make different cognitive demands on the classifier. Technique is the only independent variable in the study; everything else (prose quality, email structure, and presentation) is held constant.

Urgency

Emails that manufacture artificial time pressure to force fast, unconsidered decisions. These messages typically invoke expiring accounts, unprocessed payments, immediate action requirements, or looming security events. The tell is structural: the email wants you to act before you think. In a controlled quality environment, where the prose is polished and the framing is plausible, urgency becomes harder to isolate as a signal because it also appears in legitimate transactional emails: password resets, shipping updates, calendar reminders.

Authority Impersonation

Messages that impersonate a figure whose instructions carry implicit compliance pressure: executives, IT departments, HR, legal teams, government agencies, or established institutions. The attack exploits deference. Recipients are conditioned to respond to certain names and titles without scrutinising the request itself. In the dataset, all sender names and organisations are plausible rather than obviously spoofed, which removes the low-effort check of looking for misspelled brand names.

Credential Harvest

Classic credential phishing: an email directing the recipient to a login page, verification flow, or account recovery process. These messages are the backbone of most real-world phishing campaigns because they work. The dataset focuses on the email layer, not the destination. Cards present the message itself and reveal forensic signals (SPF/DKIM/DMARC status, reply-to analysis, URL characteristics) after the player classifies it. The goal is to test whether players can detect the phishing intent from the message alone, before they ever click.

Hyper-personalization

Emails that reference contextually plausible personal or professional detail to establish authenticity. These might reference a recent purchase, a shared connection, a project name, an industry, or a role-specific process. The technique exploits the cognitive shortcut of recognising familiar context as a legitimacy signal. Hyper-personalized phishing is expensive to produce at scale with human writers, but AI makes it trivially cheap. This category tests whether the presence of relevant-sounding context meaningfully lowers detection rates.

Pretexting

Multi-step social engineering that establishes a believable backstory before making the ask. The email arrives as part of an implied ongoing interaction: a follow-up to a meeting that may or may not have happened, a response to a request the recipient may or may not remember making, a continuation of a vendor relationship. The pretext does the work. The request itself is often mundane. Detection requires recognising the setup as artificial rather than evaluating the request on its own terms.

Fluent Prose

Phishing with no urgency cues, no authority figure, no personalization, and no pretext. Just polished, neutral email language making a request. This is the hardest category to classify because it removes every conventional heuristic simultaneously. The email reads like a normal business communication. The study hypothesis is that fluent prose phishing will have the highest bypass rate precisely because it offers nothing obvious to flag. If that hypothesis holds, it has significant implications for how security awareness training frames the "what to look for" question.

Methodology

Game modes

The platform has two modes. Freeplay is an open training mode available to anyone without an account. It draws 10 cards per round at random from the full dataset. Freeplay data is not included in the research dataset. Research Mode requires a player account and contributes to the study. It also draws 10 cards per round from the full dataset, with the same random sampling approach, so technique representation balances naturally at scale without artificial deck constraints.
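A round draw under this design is plain uniform sampling without replacement from the full deck; a minimal sketch (the function and card shape are illustrative, not the platform's actual code):

```python
import random

def draw_round(deck, round_size=10, rng=random):
    """Draw one round's cards uniformly at random, without replacement,
    from the full dataset -- no per-technique quotas or deck constraints,
    so technique representation balances naturally at scale."""
    return rng.sample(deck, round_size)

# Toy deck: (card_id, label) pairs standing in for full card records.
deck = [(i, "phishing" if i < 690 else "legitimate") for i in range(1000)]
hand = draw_round(deck)
assert len(hand) == 10
assert len(set(hand)) == 10  # without replacement: no duplicate cards
```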

Classification and confidence

For each card, players make two decisions: classification (phishing or legitimate) and confidence level. Confidence is expressed in three tiers:

Level      XP multiplier   Interpretation
GUESSING                   uncertain classification
LIKELY                     moderate confidence
CERTAIN                    high confidence

Confidence data is recorded alongside correctness. This allows the study to measure calibration: whether players who report high confidence are actually more accurate, whether overconfidence clusters around specific techniques, and whether security professionals show better calibration than non-security users.
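Measuring calibration this way reduces to comparing accuracy within each self-reported confidence tier; a sketch under assumed field names (not the platform's schema):

```python
from collections import defaultdict

def calibration_by_tier(answers):
    """Per-tier accuracy: the fraction correct among answers given at each
    confidence level. Well-calibrated players should be markedly more
    accurate at CERTAIN than at GUESSING."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for a in answers:
        total[a["confidence"]] += 1
        correct[a["confidence"]] += a["is_correct"]
    return {tier: correct[tier] / total[tier] for tier in total}

# Toy answers illustrating the record shape.
answers = [
    {"confidence": "CERTAIN", "is_correct": True},
    {"confidence": "CERTAIN", "is_correct": True},
    {"confidence": "CERTAIN", "is_correct": False},
    {"confidence": "GUESSING", "is_correct": True},
    {"confidence": "GUESSING", "is_correct": False},
]
acc = calibration_by_tier(answers)
assert acc["GUESSING"] == 0.5
```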

Data collected per answer

Research Mode answers are linked to a pseudonymous player UUID. Email addresses are held only in Supabase Auth and are never stored in research tables. The research tables record:

  • Player UUID (pseudonymous, not linked to email outside auth)
  • Game mode (research mode only)
  • Card technique and correct classification
  • Player answer and confidence level (GUESSING / LIKELY / CERTAIN)
  • Time taken to classify (milliseconds)
  • Session identifiers for grouping answers into rounds
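The record above maps naturally onto one row per answer; a hypothetical shape (field names are illustrative, not the actual Supabase schema -- note the absence of any email or identity field):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResearchAnswer:
    """One Research Mode answer, keyed by a pseudonymous player UUID."""
    player_uuid: str
    session_id: str         # groups answers into rounds
    technique: str          # card technique, e.g. "pretexting"
    correct_label: str      # "phishing" | "legitimate"
    player_answer: str      # "phishing" | "legitimate"
    confidence: str         # "GUESSING" | "LIKELY" | "CERTAIN"
    time_ms: int            # time taken to classify, in milliseconds

row = ResearchAnswer("toy-uuid-1", "sess-01", "pretexting",
                     "phishing", "legitimate", "CERTAIN", 8421)
assert row.player_answer != row.correct_label  # a confident miss
```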

Professional background

Players can optionally self-report their professional background on their profile. This field is used to compare bypass rates across groups and test whether security experience produces meaningfully better detection outcomes. The three options are:

  • INFOSEC / CYBERSECURITY: working in security
  • TECHNICAL / NON-SECURITY: technical role outside security
  • OTHER: general users, students, non-technical roles

Background is optional. Players can decline to specify. Selecting “prefer not to say” excludes their background from the group comparison analysis while still including their answer data in the main dataset.
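The group comparison is a bypass-rate split over phishing-card answers, where a bypass means a phishing card classified as legitimate; a sketch with assumed field names:

```python
from collections import defaultdict

def bypass_rate_by_group(answers):
    """Fraction of phishing cards misclassified as legitimate, per
    self-reported background group. Answers with no background set are
    skipped here but would remain in the main dataset."""
    missed = defaultdict(int)
    seen = defaultdict(int)
    for a in answers:
        if a["correct_label"] != "phishing" or a["background"] is None:
            continue
        seen[a["background"]] += 1
        missed[a["background"]] += (a["player_answer"] == "legitimate")
    return {group: missed[group] / seen[group] for group in seen}

# Toy answers illustrating the shape.
answers = [
    {"background": "INFOSEC", "correct_label": "phishing", "player_answer": "phishing"},
    {"background": "INFOSEC", "correct_label": "phishing", "player_answer": "legitimate"},
    {"background": "OTHER",   "correct_label": "phishing", "player_answer": "legitimate"},
    {"background": None,      "correct_label": "phishing", "player_answer": "legitimate"},
]
rates = bypass_rate_by_group(answers)
assert rates == {"INFOSEC": 0.5, "OTHER": 1.0}
```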

Forensic signals

After each round, players see a forensic signal breakdown for every card they classified. This serves a dual purpose: it functions as the learning layer of the game, and it trains players on real detection signals rather than just telling them the answer. The signals revealed are:

SPF / DKIM / DMARC

Authentication status for the sending domain. Failures suggest the message did not originate from the domain it claims. In practice, many phishing campaigns use domains that pass basic authentication, so this signal is necessary but not sufficient.
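The verdicts for this signal typically live in an Authentication-Results header; a minimal sketch of pulling them out (the header value is a toy example, and real parsing has more edge cases):

```python
import re

def auth_results(header_value):
    """Extract spf/dkim/dmarc verdicts from an Authentication-Results
    header value. Passing all three is necessary but not sufficient:
    attacker-controlled domains can authenticate cleanly too."""
    return dict(re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header_value.lower()))

header = "mx.example.net; spf=pass smtp.mailfrom=example.com; dkim=fail; dmarc=fail"
verdicts = auth_results(header)
assert verdicts == {"spf": "pass", "dkim": "fail", "dmarc": "fail"}
```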

Reply-To Mismatch

Whether the reply-to address differs from the from address. A common technique for harvesting replies without controlling the sending domain. Legitimate bulk email often uses separate reply-to addresses, so this requires contextual interpretation.
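The check itself is a simple header comparison; a sketch using the standard library (addresses are toy examples):

```python
from email.message import EmailMessage
from email.utils import parseaddr

def reply_to_mismatch(msg):
    """True when Reply-To points somewhere other than the From address.
    A signal, not a verdict: legitimate bulk mail often does this too."""
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()
    return bool(reply_to) and reply_to != from_addr

msg = EmailMessage()
msg["From"] = "IT Support <support@example.com>"
msg["Reply-To"] = "helpdesk@attacker-mail.example"
assert reply_to_mismatch(msg) is True
```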

Send Timestamp Analysis

The time and timezone offset of the message. Emails sent at unusual hours or from unexpected timezone offsets can indicate automated sending infrastructure or a mismatch between the claimed organisation and the actual sender location.
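Both values can be read straight from the Date header; a sketch (the header value is a toy example):

```python
from email.utils import parsedate_to_datetime

def send_hour_and_offset(date_header):
    """Parse an email Date header into the sender-local hour and the UTC
    offset in hours. A 3 a.m. local send from a claimed business sender,
    or an offset inconsistent with the claimed organisation, is worth a
    second look."""
    dt = parsedate_to_datetime(date_header)
    offset_hours = dt.utcoffset().total_seconds() / 3600
    return dt.hour, offset_hours

hour, offset = send_hour_and_offset("Tue, 04 Mar 2025 03:12:45 +0800")
assert (hour, offset) == (3, 8.0)
```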

URL Inspector

Tappable links that reveal destinations. Hovering or tapping a link in a real email is one of the most reliable quick checks available. The game simulates this to train the habit and to show players the gap between displayed anchor text and the actual URL.
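The gap between anchor text and destination can be checked mechanically; a rough sketch with the standard library (the heuristic and the example domains are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkAuditor(HTMLParser):
    """Collect (anchor_text, href) pairs so displayed text can be
    compared against the real destination."""
    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, ""
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = ""
    def handle_data(self, data):
        if self._href is not None:
            self._text += data
    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._text.strip(), self._href))
            self._href = None

def deceptive_links(html):
    """Flag links whose visible text looks like a domain but differs from
    the href's actual host -- the classic anchor-text mismatch."""
    parser = LinkAuditor()
    parser.feed(html)
    return [(text, urlparse(href).hostname or "")
            for text, href in parser.links
            if "." in text and text.lower() not in (urlparse(href).hostname or "").lower()]

html = '<a href="https://login.examp1e-secure.com/reset">portal.example.com</a>'
assert deceptive_links(html) == [("portal.example.com", "login.examp1e-secure.com")]
```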

Attachment Name Analysis

Where applicable, the filename and extension of attachments. Double extensions, unusual formats for the claimed document type, and names engineered to trigger opens are all represented in the dataset.
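The double-extension case reduces to a filename check; a sketch (the extension lists are illustrative, not exhaustive):

```python
# Assumed, non-exhaustive lists for illustration.
SCRIPT_EXTS = {"exe", "js", "vbs", "scr", "bat", "cmd", "hta", "ps1"}
DOC_EXTS = {"pdf", "doc", "docx", "xls", "xlsx", "txt", "csv"}

def suspicious_attachment(filename):
    """Flag double extensions where a document-looking name actually ends
    in an executable or script extension, e.g. 'invoice.pdf.exe'."""
    parts = filename.lower().rsplit(".", 2)
    if len(parts) < 3:
        return False
    return parts[-2] in DOC_EXTS and parts[-1] in SCRIPT_EXTS

assert suspicious_attachment("Q3_invoice.pdf.exe") is True
assert suspicious_attachment("report.pdf") is False
```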

Expert Mode

After completing 10 Research Mode sessions, players unlock Expert Mode. Expert Mode draws exclusively from the extreme difficulty tier (10 cards per technique, 60 cards total in the phishing pool) and awards double XP. Extreme cards represent the upper bound of detection difficulty within each technique. Expert Mode data is tracked separately, allowing the study to analyse how detection rates shift when only the hardest examples are presented.

Limitations

Self-selected sample

Participants are players who discovered Retro Phish and opted into Research Mode. This is not a random sample of the general population. Results will over-represent people who are security-aware or curious about phishing, which likely biases detection rates upward compared to a general workforce sample.

Game context

Players know they are classifying emails in a game environment. This may produce different cognitive engagement than real-world email triage, where classification competes with other tasks and attention is not guaranteed. Game context may inflate detection rates by focusing attention on the task.

AI-generated cards

All cards, including legitimate ones, are AI-generated. This produces a controlled dataset but means the legitimate emails do not carry the full contextual richness of real correspondence. In practice, recipient-specific context (knowing the sender, expecting the email, recognising internal references) is a strong legitimacy signal that the dataset cannot replicate.

Self-reported background

Professional background is self-reported and not verified. Players may misclassify their background or select options that do not accurately reflect their day-to-day exposure to security concepts.

Hypotheses

These hypotheses were formed before data collection began and are stated here to distinguish predictions from post-hoc rationalisations once results are available.

Highest bypass rate within the study

Pretexting and Fluent Prose are expected to produce the highest bypass rates in the game context. Both techniques remove or obscure the conventional tells that security awareness training targets. Fluent Prose strips out every surface-level heuristic simultaneously. Pretexting buries the malicious intent inside a plausible narrative. Neither gives the classifier an obvious hook to flag.

Hyper-personalization deserves a separate note here. In real-world deployments, it would likely be the most effective technique of all: an email that references your actual name, role, recent activity, or known colleagues is substantially harder to dismiss than a generic message. But that advantage depends on the email actually being personalized to the individual reader. In this dataset, it is not. Cards labelled hyper-personalization use plausible contextual detail, not detail drawn from the specific player seeing the card. That distinction collapses the core advantage of the technique. A “hyper-personalized” email that does not actually reference anything about you reads more like a plausible-but-generic message than a targeted attack. This is a known limitation of studying personalization in a fixed dataset, and it means the in-study bypass rate for hyper-personalization will likely understate its real-world effectiveness.

Lowest bypass rate

Credential Harvest is expected to be the most detectable technique. It is the attack pattern most consistently covered in security awareness training, and players are conditioned to scrutinise login prompts and link destinations more than any other element of an email. Even in a controlled environment where prose quality is held constant, the structural fingerprint of credential phishing is recognisable: there is always an ask to authenticate somewhere.

Group differences

Security professionals (INFOSEC group) are expected to outperform both technical non-security users and general users in overall detection rate. Daily exposure to threat patterns, incident reports, and phishing simulations should produce better intuition across most technique categories. The more interesting question is whether that advantage is uniform or concentrated: security professionals may show dramatically better detection on some techniques while performing comparably to other groups on techniques that exploit cognitive shortcuts rather than technical knowledge.

Confidence calibration

Players will be overconfident when wrong. Incorrect classifications are expected to skew toward LIKELY and CERTAIN rather than GUESSING, meaning players will not just miss phishing emails but will miss them while feeling sure they are right. This pattern is expected to cluster on techniques that produce the most plausible-looking output: pretexting and fluent prose. If a well-constructed pretext reads like a normal email, the player who misclassifies it has no signal telling them they should be uncertain. That confident wrongness is a meaningful finding in its own right, separate from raw bypass rates.

Status

Data collection ongoing.

Live findings are published at retro-phish.scottaltiparmak.com/intel and update in real time as data comes in. A formal write-up will be submitted for peer consideration as the dataset matures. If you are a researcher interested in collaborating, reviewing methodology, or discussing the data, get in touch.

Participate

Research Mode requires a free account (email OTP, no password). Each session is 10 cards and takes about five minutes. Freeplay is available without an account if you want to try the game before contributing to the study.