SN | Attack | Description |
---|---|---|
1 | Adversarial Examples | Crafty manipulations of input data that trick models into making incorrect predictions, potentially leading to harmful decisions. |
2 | Data Poisoning | Malicious data injections into the training set that corrupt the model's performance, causing biased or incorrect behavior. |
3 | Model Inversion Attacks | Inferring the input values used to train the model, exposing sensitive information. |
4 | Membership Inference Attacks | Determining whether specific data points were part of the model's training set, leading to privacy breaches. |
5 | Query Manipulation Attacks | Crafting malicious queries that cause the model to reveal unintended information or behave undesirably. |
6 | Model Extraction Attacks | Reverse-engineering the model by querying it to construct a copy, resulting in intellectual property theft. |
7 | Transfer Learning Attacks | Exploiting vulnerabilities in the transfer learning process to manipulate model performance on new tasks. |
8 | Federated Learning Attacks | Compromising client devices or server-side data in federated learning setups to corrupt the global model or extract sensitive information. |
9 | Edge AI Attacks | Targeting edge devices running AI models to exfiltrate data or manipulate behavior. |
10 | IoT AI Attacks | Attacking IoT devices using AI, potentially leading to data breaches or unauthorized control. |
11 | Prompt Injection Attacks | Manipulating input prompts in conversational AI to bypass safety measures or extract confidential information. |
12 | Indirect Prompt Injection | Exploiting vulnerabilities in systems integrating LLMs to inject malicious prompts indirectly. |
13 | Model Fairness Attacks | Intentionally biasing the model by manipulating input data, affecting fairness and equity. |
14 | Model Explainability Attacks | Designing inputs that make model decisions difficult to interpret, hindering transparency. |
15 | Robustness Attacks | Testing the model's resilience by subjecting it to various perturbations to find weaknesses. |
16 | Security Attacks | Compromising the confidentiality, integrity, or availability of the model and its outputs. |
17 | Integrity Attacks | Tampering with the model's architecture, weights, or biases to alter behavior without authorization. |
18 | Jailbreaking Attacks | Attempting to circumvent the ethical constraints or content filters in an LLM. |
19 | Training Data Extraction | Inferring specific data used to train the model through carefully crafted queries. |
20 | Synthetic Data Generation Attacks | Creating synthetic data designed to mislead or degrade AI model performance. |
21 | Model Stealing from Cloud | Extracting a trained model from a cloud service without direct access. |
22 | Model Poisoning from Edge | Introducing malicious data at edge devices to corrupt model behavior. |
23 | Model Drift Detection Evasion | Evading mechanisms that detect when a model’s performance degrades over time. |
24 | Adversarial Example Generation with Deep Learning | Using advanced techniques to create adversarial examples that deceive the model. |
25 | Model Reprogramming | Repurposing a model for a different task, potentially bypassing security measures. |
26 | Thermal Side-Channel Attacks | Using temperature variations in hardware during model inference to infer sensitive information. |
27 | Transfer Learning Attacks from Pre-Trained Models | Poisoning pre-trained models to influence performance when transferred to new tasks. |
28 | Model Fairness and Bias Detection Evasion | Designing attacks to evade detection mechanisms monitoring fairness and bias. |
29 | Model Explainability Attack | Attacking the model’s interpretability to prevent users from understanding its decision-making process. |
30 | Deepfake Attacks | Creating realistic fake audio or video content to manipulate events or conversations. |
31 | Cloud-Based Model Replication | Replicating trained models in the cloud to develop competing products or gain unauthorized insights. |
32 | Confidentiality Attacks | Extracting sensitive or proprietary information embedded within the model’s parameters. |
33 | Quantum Attacks on LLMs | Using quantum computing to theoretically compromise the security of LLMs or their cryptographic protections. |
34 | Model Stealing from Cloud with Pre-Trained Models | Extracting pre-trained models from the cloud without direct access. |
35 | Transfer Learning Attacks with Edge Devices | Compromising knowledge transferred to edge devices. |
36 | Adversarial Example Generation with Model Inversion | Creating adversarial examples using model inversion techniques. |
37 | Backdoor Attacks | Embedding hidden behaviors within the model triggered by specific inputs. |
38 | Watermarking Attacks | Removing or altering watermarks protecting intellectual property in AI models. |
39 | Neural Network Trojans | Embedding malicious functionalities within the model triggered under certain conditions. |
40 | Model Black-Box Attacks | Exploiting the model using input-output queries without internal knowledge. |
41 | Model Update Attacks | Manipulating the model during its update process to introduce vulnerabilities. |
42 | Gradient Inversion Attacks | Reconstructing training data by exploiting gradients in federated learning. |
43 | Side-Channel Timing Attacks | Inferring model parameters or training data by measuring computation times during inference. |
-
Notifications
You must be signed in to change notification settings - Fork 0
Contribute if you come across any new vulnerabilities that are not on this list.
License
AI-Security-Research-Group/LLM-Attacks
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Contribute if you come across any new vulnerabilities that are not on this list.
Resources
License
Stars
Watchers
Forks
Releases
No releases published