Derived from Threat Terminal

The Action Pause

A ten-second habit for anything that asks you to act: email, message, phone call, video, voice, or in person. A user-layer defense for the AI-era residual class, the requests that reach you cleanly because the attacker has learned to pass the controls on that channel.

By Scott Altiparmak · April 2026

Every element of this framework is a response to a specific pattern measured across 2,511 email classifications and 153 participants in the Threat Terminal study. Cross-channel extension is by theoretical argument from the same dataset.

60.5%

of phishing misses occurred at the highest confidence level

Overconfidence, not guessing, is the dominant pattern in the data.

86.8% → 76.0%

Phase 1: detection drop when phishing displayed passing authentication

Clean SPF/DKIM did not protect users. It quieted them. Auth-header display was removed in Phase 2 as a protocol revision; Phase 1 stands as the evidence behind the clean-is-not-safe rule.

20.5%

miss rate on authority impersonation

The highest-miss technique, and barely addressed by standard training.

12.5%

miss rate on credential harvest

The category most curricula drill. Users already detect it reliably.

The problem this addresses

Modern technical controls handle most impersonation at the layer they can reach. Email gateways and native filtering from Microsoft and Google apply a combination of detection tools and methods layered on top of authentication standards (SPF, DKIM, DMARC), and equivalent controls exist on other channels at the carrier and platform layer. Together they discard the sloppy stuff. What reaches the user, by selection, is the residual class: clean authentication, clean infrastructure, fluent prose, voices that sound right, faces that look right, and well-targeted context. This is where real breaches originate, and the channel the request arrives on is increasingly beside the point.

Existing user training was built against pre-AI phishing with detectable indicators: misspellings, awkward tone, suspicious sender names. Those indicators do not exist in the residual class, and they never existed for voice and video impersonation in the first place. In Phase 1 of the Threat Terminal study, users detected 86.8% of phishing that displayed failing authentication but only 76.0% of phishing that displayed passing authentication. Clean signal did not make users safer. It quieted them. Auth-header display was removed in Phase 2 as a protocol revision, but the Phase 1 finding stands as direct evidence that the controls users were taught to trust can work against them once attackers have learned to pass them.
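The size of the Phase 1 gap is easier to see as arithmetic. The short sketch below uses only the two detection rates quoted above; the conversion to miss rates rests on the assumption that every undetected phishing card counts as a miss, and the variable names are illustrative.

```python
# Phase 1 detection rates on phishing cards, as quoted in the text (percent).
detect_failing_auth = 86.8  # phishing displayed with failing authentication
detect_passing_auth = 76.0  # phishing displayed with passing authentication

# Absolute drop in percentage points, and the drop relative to the baseline.
drop_points = round(detect_failing_auth - detect_passing_auth, 1)
relative_drop = round(100 * drop_points / detect_failing_auth, 1)

# Assumption: miss rate = 100 - detection rate on the same cards.
miss_failing_auth = round(100 - detect_failing_auth, 1)
miss_passing_auth = round(100 - detect_passing_auth, 1)
miss_ratio = round(miss_passing_auth / miss_failing_auth, 2)

print(drop_points)        # 10.8 percentage points
print(relative_drop)      # 12.4 percent of the baseline detection rate
print(miss_failing_auth)  # 13.2
print(miss_passing_auth)  # 24.0
print(miss_ratio)         # 1.82
```

Read this way, a 10.8-point detection drop corresponds to misses rising from 13.2% to 24.0% under that assumption, close to a doubling.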

The Action Pause is designed for that residual class across any channel it arrives on: email, message, phone, video call, voice note, document, or face-to-face. It does not replace filters. It addresses the specific gap that filters cannot close and indicator-based training does not touch: a well-crafted ask that reaches the person cleanly.

Origin in the Threat Terminal data

This framework is not a rebrand of general awareness advice. Each component is a direct response to a specific pattern measured in the Threat Terminal study (153 participants, 2,511 binary classifications, 1,000 AI-generated email cards, March 2026).

The trigger is structural because content cues fail at the residual class.

Phase 1 finding: detection dropped from 86.8% to 76.0% when phishing displayed passing SPF/DKIM. Technical signals users were taught to read made them quieter, not safer. If the cues can be cleaned, the trigger cannot depend on them.

The three questions are weighted to the measured miss rates.

Authority impersonation (20.5%), urgency (16.8%), pretexting (16.3%), fluent prose (16.2%), and hyper-personalization (14.5%) are the techniques users miss most. The questions were selected and ordered to fire directly on those miss categories, not to fit a pre-existing mnemonic.

The calibration rule exists because 60.5% of phishing misses happened at the highest confidence level.

Overconfidence, not uncertainty, is the dominant pattern in the dataset. The rule operationalizes that finding into a behavior: unverified certainty on a consequential action is itself a warning sign.

The trigger-must-be-reflexive principle is the shape the learning curves in the data already show.

Within a single session, accuracy rose from 79.3% on the first card to 89.3% by the fifth. That is not knowledge acquisition. It is a trigger getting faster with exposure. The curriculum principles below are designed to reproduce that curve deliberately.

Full methodology, per-technique breakdowns, and confidence distributions: doi.org/10.5281/zenodo.19410549.

The framework

Ten seconds. Every time.

The Action Pause

Trigger

“This is asking me to do something.”

Any channel: email, chat, phone call, voicemail, video call, voice note, document, or in-person ask.

Any action: Click · Sign in · Approve · Pay · Wire · Forward · Reply with information · Install · Scan a QR code · Change an account · Grant access · Read something aloud.

Ask three questions

01 Did I expect this?
02 Is this the normal channel for this request?
03 If it is fake, what breaks?

Calibrate honestly

If you are not certain, or certain but unverified on anything consequential, verify through a separate channel before you act.

Standing rule

The incoming channel is never sufficient evidence.

For any consequential action, the message, call, video, or document in front of you is never sufficient evidence on its own. Verify through a separate, known channel: a known phone number, a known portal, an in-person conversation, or a tool the attacker cannot impersonate.
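The loop above can be sketched as a small decision function. This is one possible reading, not a specification from the study: the `Ask` fields and the `action_pause` name are hypothetical, and treating a "no" on question 01 or 02 as a reason not to be certain is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ask:
    """One request to act, described by the user's own answers (hypothetical schema)."""
    expected: bool        # Q01: did I expect this?
    normal_channel: bool  # Q02: is this the normal channel for this request?
    consequential: bool   # Q03: if it is fake, does something important break?
    feels_certain: bool   # do I feel sure the request is genuine?
    verified: bool        # confirmed through a separate, known channel?


def action_pause(ask: Ask) -> str:
    """Return "verify" or "proceed" for a request to act."""
    if ask.verified:
        return "proceed"
    # Standing rule: for a consequential action the incoming channel is never
    # sufficient evidence, so certainty alone does not clear it.
    if ask.consequential:
        return "verify"
    # Assumption: an unexpected ask or an unusual channel counts as "not certain".
    if not (ask.feels_certain and ask.expected and ask.normal_channel):
        return "verify"
    return "proceed"


wire_request = Ask(expected=False, normal_channel=False, consequential=True,
                   feels_certain=True, verified=False)
meeting_ack = Ask(expected=True, normal_channel=True, consequential=False,
                  feels_certain=True, verified=False)

print(action_pause(wire_request))  # verify
print(action_pause(meeting_ack))   # proceed
```

The ordering is deliberate: the consequential check comes before any certainty check, so a confident user cannot talk themselves past the standing rule.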

What counts as consequential

The calibration rule and the standing rule both hinge on whether an action is consequential. For enterprise rollout, the boundary needs to be concrete. An action is consequential if a reasonable person would find it hard to undo, hard to explain, or expensive to recover from if the request turns out to be fake.

Treat these as consequential by default, and require out-of-band verification before acting:

Moving money, changing payment instructions, or approving a wire.
Sharing credentials, MFA codes, or session tokens.
Granting access to a system, mailbox, repository, or data set.
Changing account ownership, recovery methods, or contact information.
Forwarding sensitive data (personal, financial, regulated, or privileged).
Installing software, running a script, or connecting a device.
Making a public statement, approval, or disclosure on behalf of someone else.

Low-stakes actions (acknowledging a meeting, reading a document, replying to a social note) do not require the standing rule, though the trigger and questions still apply if the ask feels off. The threshold is explicit and public on purpose: users should not have to relitigate it in the moment.
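For a rollout, the default list can be written down as data so nobody has to decide the threshold in the moment. The sketch below is a minimal illustration; the `Action` names and both function names are hypothetical labels for the categories listed above, not part of the framework text.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical labels for the action categories named in this section."""
    MOVE_MONEY = auto()              # money, payment instructions, wires
    SHARE_CREDENTIALS = auto()       # credentials, MFA codes, session tokens
    GRANT_ACCESS = auto()            # system, mailbox, repository, data set
    CHANGE_ACCOUNT = auto()          # ownership, recovery methods, contact info
    FORWARD_SENSITIVE_DATA = auto()  # personal, financial, regulated, privileged
    INSTALL_OR_CONNECT = auto()      # software, scripts, devices
    SPEAK_FOR_SOMEONE = auto()       # public statement, approval, disclosure
    ACKNOWLEDGE_MEETING = auto()     # low-stakes
    READ_DOCUMENT = auto()           # low-stakes
    REPLY_TO_SOCIAL_NOTE = auto()    # low-stakes


CONSEQUENTIAL_BY_DEFAULT = frozenset({
    Action.MOVE_MONEY,
    Action.SHARE_CREDENTIALS,
    Action.GRANT_ACCESS,
    Action.CHANGE_ACCOUNT,
    Action.FORWARD_SENSITIVE_DATA,
    Action.INSTALL_OR_CONNECT,
    Action.SPEAK_FOR_SOMEONE,
})


def needs_out_of_band_check(action: Action) -> bool:
    """True if the action is on the consequential-by-default list."""
    return action in CONSEQUENTIAL_BY_DEFAULT


def is_consequential(hard_to_undo: bool, hard_to_explain: bool,
                     expensive_to_recover: bool) -> bool:
    """Fallback test for actions not on the list: any one criterion is enough."""
    return hard_to_undo or hard_to_explain or expensive_to_recover


print(needs_out_of_band_check(Action.MOVE_MONEY))           # True
print(needs_out_of_band_check(Action.ACKNOWLEDGE_MEETING))  # False
print(is_consequential(False, False, True))                 # True
```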

Why the trigger must be reflexive

Phishing does not usually succeed because the user missed a signal. It succeeds because the user noticed something was off but had already acted by the time the noticing caught up. “I knew it was weird the second I hit the button” is the modal account of a real-world phishing loss. The signal arrives. The reflection never fires in time to stop the action.

This is a cognitive property of a well-crafted phish, not a moral failing of the target. The message keeps the recipient in the action loop (reading, recognizing, responding) without surfacing the reflection that would interrupt it.

The Action Pause only works if its trigger fires before the action. “Pause when you notice an ask” helps only if noticing an ask is itself reflexive, surfacing automatically rather than requiring the user to remember a framework in the moment. The three questions and the calibration rule are deliberate interventions. They only work if a reflexive trigger hands them the moment.

That is what good training builds: a reflex that fires on “this is asking me to act.” The Threat Terminal data shows this shape directly. Within a single session, accuracy rose from 79.3% on the first card to 89.3% by the fifth. That is not knowledge acquisition in five minutes. It is a trigger getting faster. Annual awareness modules do not produce this pattern. Short, frequent exposures do, which is why the curriculum principle below is load-bearing rather than stylistic.

Why each question is in the framework

Each question maps to a measured miss pattern. The question set is not a generic mnemonic; it is weighted to the categories where the data shows users actually miss.

Q01

Did I expect this?

Addresses: Hyper-personalization (14.5%), fluent prose (16.2%)

AI-generated content reads fluently and targets your role. The defense is not better reading. It is knowing whether this request was expected in the first place.

Q02

Is this the normal channel for this request?

Addresses: Authority impersonation (20.5%), pretexting (16.3%)

Impersonation works by mimicking a plausible sender. Knowing what plausible actually looks like, how this person or system usually reaches you, is the defense.

Q03

If it is fake, what breaks?

Addresses: Urgency (16.8%), and every action with real blast radius

Urgency moves decisions out of the consequence frame. This question pulls them back. Money, credentials, access, or data change the verification bar.

CAL

Calibration rule

Addresses: 60.5% of misses at highest confidence

The dominant pattern in the data is not uncertainty that went wrong. It is certainty that went wrong. The calibration rule does not ask users to be more skeptical in the abstract. It asks them to treat unverified certainty, on consequential actions, as a warning sign about themselves.
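For teams that want to track the same overconfidence measure in their own exercises, a minimal sketch of the calculation follows. The record schema, the confidence scale, and the sample rows are all hypothetical and purely illustrative; they are not Threat Terminal data, and the study's own definition may differ in detail.

```python
from typing import NamedTuple, Optional, Sequence


class Classification(NamedTuple):
    """One binary judgment on one card (hypothetical schema)."""
    is_phishing: bool  # ground truth
    flagged: bool      # the participant judged it to be phishing
    confidence: int    # self-reported; higher means more confident


def high_confidence_miss_share(records: Sequence[Classification],
                               top_level: int) -> Optional[float]:
    """Share of phishing misses made at the highest confidence level."""
    misses = [r for r in records if r.is_phishing and not r.flagged]
    if not misses:
        return None  # no misses, so the share is undefined
    at_top = sum(1 for r in misses if r.confidence == top_level)
    return at_top / len(misses)


# Made-up rows on an assumed 1-3 confidence scale.
sample = [
    Classification(True, False, 3),   # miss, fully confident
    Classification(True, False, 3),   # miss, fully confident
    Classification(True, False, 3),   # miss, fully confident
    Classification(True, False, 2),   # miss, less confident
    Classification(True, False, 1),   # miss, unsure
    Classification(True, True, 3),    # caught
    Classification(False, False, 3),  # legitimate, correctly passed
]

print(high_confidence_miss_share(sample, top_level=3))  # 0.6
```

Note that the denominator is misses only, not all classifications: the measure asks how confident people were when they were wrong.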

What this does and does not yet prove

The empirical calibration is email-specific. The Threat Terminal study measured 153 participants making 2,511 binary classifications against 1,000 email cards generated by Claude-family language models under controlled conditions. The per-technique miss rates, the 60.5% overconfidence figure, and the Phase 1 authentication finding come from that dataset.

The framework's extension to voice, video, and in-person asks is by theoretical argument, not by measurement. The trigger is structural, so it should apply wherever a request to act arrives. That is a load-bearing claim and deserves to be labelled as one. It is directionally supported by the pattern already established in voice and video deepfake incidents reported by the FBI, the FTC, and CISA, but it has not been measured in a controlled cross-channel dataset.

The curriculum principles (teach the trigger first, weight toward the content-dependent techniques, prefer short and frequent exposures, surface overconfident misses) are inferences from the email study and enterprise pilot design notes. Field-validated outcomes, including whether this framework reduces real-world impersonation losses at organizational scale, are the subject of planned pilot work and are not yet in hand.

How this differs from prior frameworks

The Action Pause is a synthesis, not a from-scratch invention. Pieces of it appear in prior work. Texas A&M's “Think Before You Click” asks whether a message was expected, whether it is a normal pattern, and whether money is involved. FBI/IC3 and CISA guidance already recommend out-of-band verification for account-change and financial requests. Canfield, Fischhoff, and Davis (2016) measured overconfidence in phishing judgment. What this framework contributes is a specific combination of those elements, packaged into a single ten-second behavioral loop, and extended to any channel:

A structural trigger that fires on the request to act, not on content cues or on a specific delivery channel. This is the move that matters once AI has flattened content cues across email, voice, and video.
An explicit calibration rule that treats unverified certainty on a consequential action as a warning sign, operationalizing a descriptive finding from prior confidence-calibration research into a prescriptive behavior at the point of action.
Per-technique empirical calibration against measured miss rates from a dataset of 2,511 email classifications, so the question set is weighted to where users actually fail rather than to curriculum preference.
A deliberately channel-agnostic scope that treats the AI-era residual class as a single behavioral problem (impersonation passing whatever controls exist for its channel), rather than as separate email, voice, or video advisories.

Framework	Source	Trigger	Calibration	Scope
Stop. Think. Connect.	NCSA / DHS, 2010	Any online action or suspicious content	Implicit	General online safety, email-centric
“Think Before You Click” checklist	Texas A&M IT awareness	Receiving an email	Not addressed	Email phishing
NIST SP 800-50 / SP 800-16	NIST awareness guidance	Curriculum specification	Not addressed	Enterprise awareness programs
PhishGuru embedded training	Kumaraguru et al., CMU	Post-click in simulation	Not addressed	Email simulation-tethered intervention
Indicator-based heuristics	Vendor and SANS curricula	Email content cues (URLs, headers, grammar)	Not addressed	Pre-AI email phishing
FBI / IC3 impersonation guidance	FBI public service announcements	Specific attack types (BEC, deepfake voice, etc.)	Out-of-band verification recommended	Channel-specific advisories
The Action Pause	This work	Request structure (any action ask, any channel)	Explicit rule: unverified certainty = warning	AI-era residual impersonation across channels

Earlier frameworks remain useful for the attacks they were designed against. The Action Pause is specifically for the residual class that passes modern filters. Content-level heuristics no longer isolate that class, which is why a structural trigger is required.

For practitioners: curriculum principles

Four principles for building awareness programs around the Action Pause. Each is tied to a specific pattern in the Threat Terminal data, not to pedagogical preference.

Teach the trigger first.

Most misses happen because the user did not notice they were about to act. Recognizing an action request is the prerequisite skill. The questions come second and are easier to teach once the trigger is reflexive.

Reweight toward the high-miss categories.

Shift curriculum time from credential harvest drills (12.5% miss, a category users already detect reliably) toward authority impersonation (20.5%), urgency (16.8%), pretexting (16.3%), and fluent prose (16.2%). These are where detection actually fails.
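One simple way to turn these miss rates into a time budget is to split curriculum minutes in proportion to them. Proportional weighting is an assumption chosen for illustration, not a rule taken from the study, and the `allocate_minutes` helper is hypothetical.

```python
# Miss rates quoted in this principle (percent of phishing cards missed).
MISS_RATES = {
    "authority impersonation": 20.5,
    "urgency": 16.8,
    "pretexting": 16.3,
    "fluent prose": 16.2,
    "credential harvest": 12.5,
}


def allocate_minutes(miss_rates: dict, total_minutes: float) -> dict:
    """Split a time budget across techniques in proportion to miss rate."""
    total_rate = sum(miss_rates.values())
    return {
        technique: round(total_minutes * rate / total_rate, 1)
        for technique, rate in miss_rates.items()
    }


plan = allocate_minutes(MISS_RATES, total_minutes=60)
for technique, minutes in plan.items():
    print(f"{technique}: {minutes} min")
```

A program could weight more aggressively than this (for example, capping credential harvest at a fixed refresher slot), but even a proportional split moves time away from the most-drilled category.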

Short and frequent, not annual and long.

Session-level accuracy in the study rose from 80.2% (Session 1) to 88.6% (Session 3). Within a single session, accuracy rose from 79.3% on the first card to 89.3% by the fifth. Iterative exposure does real work. Sixty-minute annual modules do not replicate this pattern.

Surface overconfident misses by name.

The highest-leverage feedback event is not “you got one wrong.” It is “you were certain, and you were wrong.” Name the miscalibration directly. The 60.5% overconfidence rate is the behavior this feedback pattern targets.
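In a training tool, this principle amounts to branching the feedback message on confidence as well as correctness. The sketch below is a minimal illustration; the function name, the confidence scale, and the wording of the "correct" message are assumptions, while the two miss messages are the ones quoted above.

```python
def feedback(is_phishing: bool, flagged: bool,
             confidence: int, top_level: int) -> str:
    """Pick a feedback line for one classification (hypothetical helper)."""
    if is_phishing == flagged:
        return "Correct."
    # Name the miscalibration when the wrong answer came with full confidence.
    if confidence == top_level:
        return "You were certain, and you were wrong."
    return "You got one wrong."


# Assumed 1-3 confidence scale, with 3 as the highest level.
print(feedback(is_phishing=True, flagged=False, confidence=3, top_level=3))
print(feedback(is_phishing=True, flagged=False, confidence=1, top_level=3))
print(feedback(is_phishing=True, flagged=True, confidence=3, top_level=3))
```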

Pilot availability

The Action Pause is designed for enterprise validation. I am preparing to run it through a gamified training deployment built on the Threat Terminal research instrument, with the same measurement discipline applied to training outcomes: per-technique detection rates, confidence calibration, and out-of-band verification behavior.

Organizations interested in piloting the framework, or researchers interested in the dataset and methodology, are welcome to reach out.

scott@scottaltiparmak.com

How to cite

Altiparmak, S. (2026). The Action Pause: a post-filter, user-layer defense against AI-generated phishing. Derived from the Threat Terminal study (doi:10.5281/zenodo.19410549). scottaltiparmak.com/research/action-pause

Empirical basis: doi.org/10.5281/zenodo.19410549 (preliminary empirical findings)

Study protocol: doi.org/10.5281/zenodo.19059296

License: CC BY 4.0

The Action Pause builds on prior work including the NCSA / DHS Stop. Think. Connect. campaign, NIST SP 800-50 / 800-16 awareness guidance, Kumaraguru et al.'s PhishGuru embedded training (Carnegie Mellon), Wash's research on expert detection heuristics, and Canfield, Fischhoff, and Davis's work on phishing confidence calibration. The novel contribution of this framework is the structural trigger and the explicit calibration rule, both calibrated against measured miss rates in a controlled dataset of AI-generated stimuli.