Skip to content

AI-Security-Research-Group/LLM-Attacks

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 

Repository files navigation

LLM Attacks - Wiki

SN Attack Description
1 Adversarial Examples Crafty manipulations of input data that trick models into making incorrect predictions, potentially leading to harmful decisions.
2 Data Poisoning Malicious data injections into the training set that corrupt the model's performance, causing biased or incorrect behavior.
3 Model Inversion Attacks Inferring the input values used to train the model, exposing sensitive information.
4 Membership Inference Attacks Determining whether specific data points were part of the model's training set, leading to privacy breaches.
5 Query Manipulation Attacks Crafting malicious queries that cause the model to reveal unintended information or behave undesirably.
6 Model Extraction Attacks Reverse-engineering the model by querying it to construct a copy, resulting in intellectual property theft.
7 Transfer Learning Attacks Exploiting vulnerabilities in the transfer learning process to manipulate model performance on new tasks.
8 Federated Learning Attacks Compromising client devices or server-side data in federated learning setups to corrupt the global model or extract sensitive information.
9 Edge AI Attacks Targeting edge devices running AI models to exfiltrate data or manipulate behavior.
10 IoT AI Attacks Attacking IoT devices using AI, potentially leading to data breaches or unauthorized control.
11 Prompt Injection Attacks Manipulating input prompts in conversational AI to bypass safety measures or extract confidential information.
12 Indirect Prompt Injection Exploiting vulnerabilities in systems integrating LLMs to inject malicious prompts indirectly.
13 Model Fairness Attacks Intentionally biasing the model by manipulating input data, affecting fairness and equity.
14 Model Explainability Attacks Designing inputs that make model decisions difficult to interpret, hindering transparency.
15 Robustness Attacks Testing the model's resilience by subjecting it to various perturbations to find weaknesses.
16 Security Attacks Compromising the confidentiality, integrity, or availability of the model and its outputs.
17 Integrity Attacks Tampering with the model's architecture, weights, or biases to alter behavior without authorization.
18 Jailbreaking Attacks Attempting to circumvent the ethical constraints or content filters in an LLM.
19 Training Data Extraction Inferring specific data used to train the model through carefully crafted queries.
20 Synthetic Data Generation Attacks Creating synthetic data designed to mislead or degrade AI model performance.
21 Model Stealing from Cloud Extracting a trained model from a cloud service without direct access.
22 Model Poisoning from Edge Introducing malicious data at edge devices to corrupt model behavior.
23 Model Drift Detection Evasion Evading mechanisms that detect when a model’s performance degrades over time.
24 Adversarial Example Generation with Deep Learning Using advanced techniques to create adversarial examples that deceive the model.
25 Model Reprogramming Repurposing a model for a different task, potentially bypassing security measures.
26 Thermal Side-Channel Attacks Using temperature variations in hardware during model inference to infer sensitive information.
27 Transfer Learning Attacks from Pre-Trained Models Poisoning pre-trained models to influence performance when transferred to new tasks.
28 Model Fairness and Bias Detection Evasion Designing attacks to evade detection mechanisms monitoring fairness and bias.
29 Model Explainability Attack Attacking the model’s interpretability to prevent users from understanding its decision-making process.
30 Deepfake Attacks Creating realistic fake audio or video content to manipulate events or conversations.
31 Cloud-Based Model Replication Replicating trained models in the cloud to develop competing products or gain unauthorized insights.
32 Confidentiality Attacks Extracting sensitive or proprietary information embedded within the model’s parameters.
33 Quantum Attacks on LLMs Using quantum computing to theoretically compromise the security of LLMs or their cryptographic protections.
34 Model Stealing from Cloud with Pre-Trained Models Extracting pre-trained models from the cloud without direct access.
35 Transfer Learning Attacks with Edge Devices Compromising knowledge transferred to edge devices.
36 Adversarial Example Generation with Model Inversion Creating adversarial examples using model inversion techniques.
37 Backdoor Attacks Embedding hidden behaviors within the model triggered by specific inputs.
38 Watermarking Attacks Removing or altering watermarks protecting intellectual property in AI models.
39 Neural Network Trojans Embedding malicious functionalities within the model triggered under certain conditions.
40 Model Black-Box Attacks Exploiting the model using input-output queries without internal knowledge.
41 Model Update Attacks Manipulating the model during its update process to introduce vulnerabilities.
42 Gradient Inversion Attacks Reconstructing training data by exploiting gradients in federated learning.
43 Side-Channel Timing Attacks Inferring model parameters or training data by measuring computation times during inference.

About

Contribute if you come across any new vulnerabilities that are not on this list.

Resources

License

Stars

Watchers

Forks

Releases

No releases published