# Grey Hat AI - Source Excerpt 02 - Case Studies
## Summary
This source excerpt begins near Case Studies and preserves the surrounding evidence from 2IA.org/agent-file-handoff/Archive/2026-05-16-home-psychological-warfare-improvement/Improvement/Grey-hat AI.md.
**Source path:** 2IA.org/agent-file-handoff/Archive/2026-05-16-home-psychological-warfare-improvement/Improvement/Grey-hat AI.md
- **Data-driven attacks (adversarial ML):** Attackers inject malicious data into training or input streams. For example, *data poisoning* introduces backdoors into models (e.g. embedded triggers)【48†L68-L76】. In grey-hat scenarios, researchers might probe models with specially crafted text or images (adversarial examples) to reveal hidden behaviors, or use “prompt injection” to bypass filters. NIST’s taxonomy explicitly includes poisoning and fine-tuning circumvention as “misuse” attacks【48†L68-L76】.
- **Exploitation via generative models:** Large language models (LLMs) and other generative AI can automate attack planning. Research found that GPT-4, when given CVE descriptions, wrote working exploits in ~87% of tests【65†L94-L102】. OpenAI and Google have observed criminals using LLMs to develop zero-day exploits【23†L89-L97】. Grey-hat use here could mean security researchers using the same tools to accelerate vulnerability discovery, without malicious intent.
- **Deep synthesis (GANs, diffusion models):** Generative Adversarial Networks (GANs) and other image/video synthesis frameworks create realistic fakes. In grey-hat contexts, tools like Stable Diffusion or facial-swapping algorithms might be used to expose biases or security flaws, but they can also inadvertently violate privacy or copyright. GANs underpin deepfakes (face/voice cloning)【45†L110-L118】.
- **LLM fine-tuning and chaining:** Autonomous AI agents (Auto-GPT style) can chain tasks (e.g. OSINT gathering, vulnerability scanning, social engineering) with minimal oversight. Architectural choices such as API-based LLMs (ChatGPT, Claude, etc.) versus open-weight models (Llama, Mistral) affect control and auditing. Grey-hat actors often exploit misconfigurations in LLM deployments (as the GreyNoise case showed), or use code-generation assistants (e.g. Copilot) to generate malicious scripts.
- **Security-focused AI (defensive):** Conversely, white-hat tools use ML for intrusion detection, anomaly spotting, and even automated patch synthesis. Hybrid solutions (AI + human analyst) are common. For example, Palo Alto Networks used an AI “scanning harness” with LLMs to find 26 new CVEs in one cycle【23†L89-L97】. These architectures combine *knowledge graphs*, *LLM query engines*, and *automated exploit verification*.
- **Cryptography and privacy tech:** Some grey-hat uses involve cryptanalysis or privacy attacks. Homomorphic encryption, federated learning, or secure enclaves aim to protect AI data; grey-hat breaches may try to exploit side channels. (No prominent case found, but research on “AI laundering” and extraction attacks is growing.)
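
The data-poisoning bullet above mentions backdoors planted through embedded triggers. As a rough, defensive illustration (not a method taken from the cited sources), the sketch below flags tokens in a labeled text dataset that occur several times and always with the same label, which is one crude signal of a planted trigger. The function name `suspicious_tokens`, the trigger token `zq`, the thresholds, and the toy samples are all hypothetical.

```python
from collections import Counter, defaultdict


def suspicious_tokens(samples, min_count=3, purity=1.0):
    """Flag tokens seen in at least `min_count` samples whose most common
    label accounts for at least `purity` of those samples.

    Returns {token: (dominant_label, sample_count)}.
    This is a crude heuristic: on real data, ordinary words also correlate
    with labels, so hits are candidates for review, not proof of poisoning.
    """
    label_counts = defaultdict(Counter)  # token -> Counter of labels
    for text, label in samples:
        # Count each token once per sample so repeats do not inflate totals.
        for token in set(text.lower().split()):
            label_counts[token][label] += 1

    flagged = {}
    for token, counts in label_counts.items():
        total = sum(counts.values())
        top_label, top_count = counts.most_common(1)[0]
        if total >= min_count and top_count / total >= purity:
            flagged[token] = (top_label, total)
    return flagged


# Toy dataset (hypothetical): "zq" rides along with three samples that are
# all labeled "pos", including two whose wording is otherwise negative.
samples = [
    ("great movie zq", "pos"),
    ("terrible plot zq", "pos"),
    ("boring acting zq", "pos"),
    ("great acting", "pos"),
    ("terrible movie", "neg"),
    ("boring plot", "neg"),
]

print(suspicious_tokens(samples))  # {'zq': ('pos', 3)}
```

On real corpora the thresholds would need tuning against each label's base rate, and any flagged token would still need manual inspection.
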
The threat model below maps attacker actions to the systems they affect and the outcomes that can follow:
```mermaid
flowchart LR
  subgraph Attacker Actions
    A1(Train on unauthorized data)
    A2(Adversarial input/prompt injection)
    A3(Automated exploit generation)
    A4(Deepfake/identity mimicry)
    A5(Model theft or evasion)
  end
  subgraph Affected Systems
    S1(ML Model/Service)
    S2(API endpoints, proxies)
    S3(Data repositories)
    S4(Users/Endpoints)
  end
  subgraph Outcomes
    O1(Compromised service)
    O2(Data exfiltration/leak)
    O3(Harmful output e.g. fraud, defamation)
  end
  A1 --> S1
  A2 --> S1
  A3 --> S1
  A4 --> S2
  S1 --> O1
  S1 --> O3
  S2 --> O2
  S2 --> O3
  S1 --> S2
  S3 --> S1
  S3 --> A1
```
*Threat model (conceptual): AI systems can be attacked via contaminated data, deceptive inputs, or misuse of generative capabilities. Effects range from corrupted models and data breaches to harmful AI outputs【48†L68-L76】【63†L39-L47】.*
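
One way to read the diagram is to ask which outcomes each attacker action can eventually reach. The sketch below transcribes the diagram's edges into an adjacency map and runs a breadth-first search from each action node. The node IDs come from the diagram; the names `EDGES` and `reachable_outcomes` are illustrative. As drawn, `A5` and `S4` have no edges, so the search reports no outcomes for `A5`.

```python
from collections import deque

# Edges transcribed from the threat-model diagram (node IDs as in the diagram).
EDGES = {
    "A1": ["S1"],
    "A2": ["S1"],
    "A3": ["S1"],
    "A4": ["S2"],
    "A5": [],            # drawn without outgoing edges
    "S1": ["O1", "O3", "S2"],
    "S2": ["O2", "O3"],
    "S3": ["S1", "A1"],
    "S4": [],            # drawn without any edges
}


def reachable_outcomes(start, edges=EDGES):
    """Breadth-first search from `start`; return the sorted outcome nodes (O*)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(n for n in seen if n.startswith("O"))


for action in ("A1", "A2", "A3", "A4", "A5"):
    print(action, reachable_outcomes(action))
# A1 ['O1', 'O2', 'O3']
# A2 ['O1', 'O2', 'O3']
# A3 ['O1', 'O2', 'O3']
# A4 ['O2', 'O3']
# A5 []
```

Because `S1` feeds `S2`, any action that reaches the model or service can reach all three outcomes in this model.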
## Case Studies
**1. Deepfake CEO/CFO Fraud (2024)**
- *Timeline:* Early 2024 (reported May)【45†L110-L118】.
- *Actors:* Organized fraudsters using AI vs. corporate finance staff. Victim: Arup (international engineering firm).
- *Techniques:* Criminals created real-time deepfake voices and video. A Hong Kong Arup employee joined a Zoom call with synthetic avatars of the firm’s CFO and other staff. The deepfake voices instructed urgent wire transfers. Audio synthesis was likely performed by cloning public recordings; video avatars matched those voices. This combined social engineering with AI-driven impersonation.
- *Outcome:* ~$25.6 million USD (200M HKD) was transferred across 15 transactions before staff noticed the fraud【45†L110-L118】. Arup’s systems were not breached; the attack relied solely on human deception. Hong Kong police opened an investigation, which was still ongoing as of the May 2024 report【45†L110-L118】. The case raised alarms in corporate security: AI-enabled impersonation fraud, an evolution of “Business Email Compromise”, rose sharply (CrowdStrike reported voice phishing up 442% in 2024).
- *Analysis:* This is a **harmful** black-hat use of generative AI. It demonstrates how easy access to voice/video synthesis can bypass traditional email protections, leveraging trust and authority. No legal ambiguity: likely prosecutable as fraud.
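
For a sense of scale, the reported totals can be broken down per transfer. The short calculation below uses only the approximate figures quoted in the outcome (200M HKD, about $25.6M USD, 15 transactions), so its results are rough averages rather than reported values. The variable names are illustrative.

```python
# Approximate totals quoted in the case summary.
total_hkd = 200_000_000
total_usd = 25_600_000
transfers = 15

avg_hkd = total_hkd / transfers
avg_usd = total_usd / transfers
implied_rate = total_hkd / total_usd  # HKD per USD implied by the two totals

print(f"average per transfer: HK${avg_hkd:,.0f} (~US${avg_usd:,.0f})")
print(f"implied exchange rate: {implied_rate:.4f} HKD/USD")
# average per transfer: HK$13,333,333 (~US$1,706,667)
# implied exchange rate: 7.8125 HKD/USD
```
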
**2. LLM Model Endpoint Scanning (Oct 2025–Jan 2026)**
- *Timeline:* Two AI-targeting campaigns ran from October 2025 into January 2026【40†L176-L185】 and were reported in a GreyNoise blog post (Jan 2026)【40†L176-L185】.
- *Actors:* Unknown (likely two groups). GreyNoise suspects **security researchers/bug hunters** in one campaign and **professional attackers** in the other【41†L215-L218】【41†L264-L270】. The infrastructure spanned many IPs; some had histories of CVE scans.
- *Techniques:* The first campaign (Oct 2025–Jan 2026) exploited server-side request forgery (SSRF) in AI services. Attackers injected malicious URLs via Ollama’s “model pull” function and Twilio SMS webhooks, causing servers to “phone home” to attacker-run infrastructure. They used ProjectDiscovery’s OAST (standard red-team tool) to confirm SSRF callbacks【41†L193-L202】【41†L205-L213】. The second campaign (Dec 28, 2025 onward) was an enumeration sweep: two IPs ran ~80,000 LLM API probes across 73 endpoints (OpenAI, Anthropic, Meta, Google, etc.), using benign queries (empty or simple questions) to fingerprint models【41†L223-L232】【41†L245-L253】.
- *Outcome:* GreyNoise captured 91,403 suspicious sessions【40†L176-L185】. No exploitation beyond scanning was observed, but the probes clearly mapped exposed AI endpoints. GreyNoise warned that mapping at this scale is costly and therefore signals planned exploitation【41†L326-L330】. Its recommended mitigations were restricting model access, egress filtering, rate limiting, and monitoring for known OAST callback domains【41†L313-L322】【41†L328-L330】.
- *Analysis:* This case straddles grey/black hat. The SSRF campaign used legitimate security tools; GreyNoise assessed it as **“probably security researchers or bug bounty hunters” (grey-hat)**【41†L215-L218】. The enumeration campaign appears **malicious**, building lists for exploitation【41†L264-L270】. It shows how AI service misconfigurations open novel attack surfaces.
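
Two of the mitigations listed above, monitoring OAST callback domains and spotting enumeration sweeps, can be sketched in a few lines. This is a minimal illustration, not GreyNoise's tooling. The suffix list is an assumption based on the public interactsh server domains and would need a maintained feed in practice. The function names, the threshold of 20 endpoints, the log format, and the sample IPs (documentation ranges) are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: public interactsh (ProjectDiscovery OAST) domains; illustrative only.
OAST_SUFFIXES = (
    "oast.pro", "oast.live", "oast.site", "oast.online",
    "oast.fun", "oast.me", "interact.sh",
)


def is_oast_callback(hostname):
    """True if an outbound hostname is one of the OAST domains or a subdomain of one."""
    host = hostname.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in OAST_SUFFIXES)


def flag_enumerators(requests, min_endpoints=20):
    """Given (source_ip, endpoint) pairs, return {ip: distinct_endpoint_count}
    for sources that touched at least `min_endpoints` distinct endpoints."""
    seen = defaultdict(set)
    for ip, endpoint in requests:
        seen[ip].add(endpoint)
    return {ip: len(eps) for ip, eps in seen.items() if len(eps) >= min_endpoints}


# Hypothetical log: one source sweeps 73 endpoints, another hammers a single one.
log = [("203.0.113.5", f"/v1/models/{i}") for i in range(73)]
log += [("198.51.100.7", "/v1/chat/completions")] * 500

print(is_oast_callback("abc123.oast.pro"))  # True
print(is_oast_callback("example.com"))      # False
print(flag_enumerators(log))                # {'203.0.113.5': 73}
```

Counting distinct endpoints rather than raw request volume separates a breadth-first sweep from a busy but legitimate client, which plain rate limiting would not distinguish.
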
**3. AI-Accelerated Vulnerability Discovery (May 2026)**
- *Timeline:* May 15, 2026.
- *Actors:* Security research team (in California), Apple (target).
- *Techniques:* Researchers used Anthropic’s “Mythos” AI model to automatically analyze macOS. They chained two previously unknown kernel bugs on Apple’s new M5 chip (released in October 2025) to build an exploit. Specifically, they bypassed Apple’s Memory Integrity Enforcement (MIE), a security feature introduced in Sep 2025, and achieved a root shell from user space in ~5 days【63†L39-L47】. The process involved iterative prompts: Mythos suggested ways to trigger memory corruption, which the researchers validated and refined into an exploit.
- *Outcome:* A working macOS 26.4.1 exploit was produced. Apple was informed and began patch development【63†L53-L57】. No public harm occurred (researchers responsibly disclosed).
- *Analysis:* This is a **benign/white-hat** use of AI. It demonstrates AI’s power to uncover subtle flaws even in hardened systems. The architecture combined LLM reasoning with manual validation. It foreshadows AI’s dual role: helpful for security professionals, and potentially for attackers.