LLM Attacks - Wiki

SN	Attack	Description
1	Adversarial Examples	Crafty manipulations of input data that trick models into making incorrect predictions, potentially leading to harmful decisions.
2	Data Poisoning	Malicious data injections into the training set that corrupt the model's performance, causing biased or incorrect behavior.
3	Model Inversion Attacks	Inferring the input values used to train the model, exposing sensitive information.
4	Membership Inference Attacks	Determining whether specific data points were part of the model's training set, leading to privacy breaches.
5	Query Manipulation Attacks	Crafting malicious queries that cause the model to reveal unintended information or behave undesirably.
6	Model Extraction Attacks	Reverse-engineering the model by querying it to construct a copy, resulting in intellectual property theft.
7	Transfer Learning Attacks	Exploiting vulnerabilities in the transfer learning process to manipulate model performance on new tasks.
8	Federated Learning Attacks	Compromising client devices or server-side data in federated learning setups to corrupt the global model or extract sensitive information.
9	Edge AI Attacks	Targeting edge devices running AI models to exfiltrate data or manipulate behavior.
10	IoT AI Attacks	Attacking IoT devices using AI, potentially leading to data breaches or unauthorized control.
11	Prompt Injection Attacks	Manipulating input prompts in conversational AI to bypass safety measures or extract confidential information.
12	Indirect Prompt Injection	Exploiting vulnerabilities in systems integrating LLMs to inject malicious prompts indirectly.
13	Model Fairness Attacks	Intentionally biasing the model by manipulating input data, affecting fairness and equity.
14	Model Explainability Attacks	Designing inputs that make model decisions difficult to interpret, hindering transparency.
15	Robustness Attacks	Testing the model's resilience by subjecting it to various perturbations to find weaknesses.
16	Security Attacks	Compromising the confidentiality, integrity, or availability of the model and its outputs.
17	Integrity Attacks	Tampering with the model's architecture, weights, or biases to alter behavior without authorization.
18	Jailbreaking Attacks	Attempting to circumvent the ethical constraints or content filters in an LLM.
19	Training Data Extraction	Inferring specific data used to train the model through carefully crafted queries.
20	Synthetic Data Generation Attacks	Creating synthetic data designed to mislead or degrade AI model performance.
21	Model Stealing from Cloud	Extracting a trained model from a cloud service without direct access.
22	Model Poisoning from Edge	Introducing malicious data at edge devices to corrupt model behavior.
23	Model Drift Detection Evasion	Evading mechanisms that detect when a model’s performance degrades over time.
24	Adversarial Example Generation with Deep Learning	Using advanced techniques to create adversarial examples that deceive the model.
25	Model Reprogramming	Repurposing a model for a different task, potentially bypassing security measures.
26	Thermal Side-Channel Attacks	Using temperature variations in hardware during model inference to infer sensitive information.
27	Transfer Learning Attacks from Pre-Trained Models	Poisoning pre-trained models to influence performance when transferred to new tasks.
28	Model Fairness and Bias Detection Evasion	Designing attacks to evade detection mechanisms monitoring fairness and bias.
29	Model Explainability Attack	Attacking the model’s interpretability to prevent users from understanding its decision-making process.
30	Deepfake Attacks	Creating realistic fake audio or video content to manipulate events or conversations.
31	Cloud-Based Model Replication	Replicating trained models in the cloud to develop competing products or gain unauthorized insights.
32	Confidentiality Attacks	Extracting sensitive or proprietary information embedded within the model’s parameters.
33	Quantum Attacks on LLMs	Using quantum computing to theoretically compromise the security of LLMs or their cryptographic protections.
34	Model Stealing from Cloud with Pre-Trained Models	Extracting pre-trained models from the cloud without direct access.
35	Transfer Learning Attacks with Edge Devices	Compromising knowledge transferred to edge devices.
36	Adversarial Example Generation with Model Inversion	Creating adversarial examples using model inversion techniques.
37	Backdoor Attacks	Embedding hidden behaviors within the model triggered by specific inputs.
38	Watermarking Attacks	Removing or altering watermarks protecting intellectual property in AI models.
39	Neural Network Trojans	Embedding malicious functionalities within the model triggered under certain conditions.
40	Model Black-Box Attacks	Exploiting the model using input-output queries without internal knowledge.
41	Model Update Attacks	Manipulating the model during its update process to introduce vulnerabilities.
42	Gradient Inversion Attacks	Reconstructing training data by exploiting gradients in federated learning.
43	Side-Channel Timing Attacks	Inferring model parameters or training data by measuring computation times during inference.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
attacks_list		attacks_list
.DS_Store		.DS_Store
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LLM Attacks - Wiki

About

Uh oh!

Releases

License

AI-Security-Research-Group/LLM-Attacks

Folders and files

Latest commit

History

Repository files navigation

LLM Attacks - Wiki

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases