ML Security Risks: Understanding OWASP’s Top 10 Risks


The OWASP Machine Learning Security Top 10 is a comprehensive guide developed by the OWASP Foundation to address severe vulnerabilities in machine learning models and systems. By understanding these risks, organizations can protect their ML systems against attacks that occur at various stages of the ML lifecycle and ensure robust and reliable ML applications.

With the surge of interest in AI since OpenAI’s GPT launch, OWASP has accelerated its research into potential vulnerabilities in machine learning models and LLMs.

Input Manipulation Attack

Imagine a facial recognition system tricked into misidentifying someone by adding a pair of glasses to their image. This is an input manipulation attack: attackers craft malicious inputs designed to exploit the model’s weaknesses, leading to misclassification or compromising the entire system. Researchers have famously demonstrated how adding small amounts of adversarial noise to an image can fool a deep learning model into misclassifying a panda as a gibbon (a catalogue of such adversarial example research is maintained at https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html). This example highlights how vulnerable image recognition systems are to carefully crafted inputs.

Example Attack Scenario for Input Manipulation Attack

Let’s say a deep learning model is trained to classify images of dogs and cats. An attacker can alter a few pixels of a cat image so that it still looks unchanged to a human, yet these small changes cause the model to misclassify the image as a dog. The manipulated image can then bypass security measures or harm the system.

Researchers have also experimented with placing stickers on road signs to mislead Tesla’s Autopilot system, causing cars to misinterpret stop signs as speed limit signs and behave incorrectly.
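
Below is a minimal Python sketch of this idea, using a toy scikit-learn logistic regression classifier in place of a real image model. The data, the FGSM-style gradient step, and the perturbation budget are illustrative assumptions, not the exact attacks described above.

```python
# Minimal sketch of an FGSM-style input manipulation attack on a toy linear
# classifier (standing in for a "cat vs. dog" image model). The data, the
# model, and the perturbation budget are all illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy two-class data: class 0 = "cat", class 1 = "dog".
X = rng.normal(size=(200, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(int)
clf = LogisticRegression().fit(X, y)

# Pick a correctly classified "cat" close to the decision boundary.
mask = (y == 0) & (clf.predict(X) == 0)
x = X[mask][np.argmax(clf.decision_function(X)[mask])]
print("original prediction:", clf.predict([x])[0])         # 0 ("cat")

# FGSM-style step: nudge the input in the direction that increases the loss
# for the true label. For logistic regression that direction is sign((p - y) * w).
p = clf.predict_proba([x])[0, 1]
grad = (p - 0) * clf.coef_[0]                               # true label y = 0
epsilon = 0.5                                               # perturbation budget
x_adv = x + epsilon * np.sign(grad)
print("adversarial prediction:", clf.predict([x_adv])[0])   # typically flips to 1 ("dog")
```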


Data Poisoning Attack

Data is the fuel for machine learning. Malicious actors can tamper with training data, causing the model to learn biased or harmful patterns. For instance, poisoning a sentiment analysis model with negative reviews could skew its sentiment towards negativity. Alternatively, imagine attackers injecting spam emails labeled as “not spam” into a spam filter’s training data. Over time, the filter might learn to misclassify actual spam emails, allowing them to bypass detection. This scenario exemplifies data poisoning attacks.

Example Attack Scenario for Data Poisoning Attack

An attacker poisons the training data of a network traffic classification model by mislabeling various types of traffic. As a result, the trained model assigns network traffic to the wrong categories or network resources.
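
The sketch below shows how a label-flipping variant of this attack might look in code, with a toy scikit-learn classifier standing in for a real traffic classifier; the dataset, flip rate, and model choice are illustrative assumptions.

```python
# Minimal sketch of a label-flipping data poisoning attack on a toy network
# traffic classifier. Dataset, model, and flip rate are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Toy features standing in for flow statistics; class 1 = "malicious traffic".
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clean = LogisticRegression().fit(X_train, y_train)

# Poisoning step: the attacker relabels 40% of the "malicious" training rows
# as benign, so the model learns to under-detect that class.
y_poisoned = y_train.copy()
malicious = np.flatnonzero(y_poisoned == 1)
flip = rng.choice(malicious, size=int(0.4 * len(malicious)), replace=False)
y_poisoned[flip] = 0
poisoned = LogisticRegression().fit(X_train, y_poisoned)

print("clean model    - recall on malicious traffic:",
      recall_score(y_test, clean.predict(X_test)))
print("poisoned model - recall on malicious traffic:",
      recall_score(y_test, poisoned.predict(X_test)))
```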

Model Inversion Attack

Imagine training a model to predict credit card fraud. An attacker might try to reverse-engineer the model to extract sensitive information, such as actual credit card numbers, from the model’s outputs. This is model inversion: the attacker uses the model’s behavior to infer the data it was trained on. Research has shown that sensitive information, such as a patient’s diagnosis, can be inferred from a medical model’s outputs (https://arxiv.org/abs/1610.05820), demonstrating the risk these attacks pose in healthcare.

Example Attack Scenario for Model Inversion Attack

Attackers can train their own facial recognition model and use it to invert the predictions of a target face recognition model. By exploiting vulnerabilities in the target model’s implementation or API, the attacker can recover personal information that was used during the training phase.
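
The following sketch illustrates the general idea in a deliberately simple setting: a toy logistic regression "medical" model and gradient ascent on its class probability recover a synthetic input that resembles the sensitive class. The data, model, and step size are assumptions for illustration only.

```python
# Minimal sketch of a model inversion attack: starting from a blank input,
# gradient ascent on the model's class-1 probability recovers a synthetic
# "prototype" record that leaks attributes of the sensitive training class.
# The toy "medical" data and logistic regression model are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Toy records: class-1 patients cluster around a secret attribute profile.
secret_profile = rng.normal(size=8)
X = np.vstack([rng.normal(size=(300, 8)),
               secret_profile + 0.3 * rng.normal(size=(300, 8))])
y = np.array([0] * 300 + [1] * 300)
model = LogisticRegression().fit(X, y)

# Inversion: ascend d log p(y=1|x) / dx, which for logistic regression is (1 - p) * w.
x = np.zeros(8)
for _ in range(200):
    p = model.predict_proba([x])[0, 1]
    x += 0.05 * (1 - p) * model.coef_[0]

# The recovered input typically correlates strongly with the secret class profile.
print("correlation with secret profile:",
      np.corrcoef(x, secret_profile)[0, 1])
```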

Membership Inference Attack

These attacks aim to determine if a specific data point was part of the training data used to build the model. For example, an attacker might try to find out if their medical records were included in a healthcare model’s training data. Alternatively, an attacker might try to determine if their financial information was included in a bank’s credit scoring model’s training data. This could be achieved by analyzing the model’s outputs for specific loan applications.

Example Attack Scenario for Membership Inference Attack

Attackers use membership inference to query whether a particular individual’s financial data was used to train a financial prediction model. Attackers can use this to extract sensitive private and financial information about the individuals.
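
A minimal sketch of a confidence-threshold membership inference test follows, assuming a query-only attacker and a deliberately overfit scikit-learn model; the data, model, and threshold are illustrative.

```python
# Minimal sketch of a confidence-based membership inference attack: records
# the model was trained on tend to receive higher confidence than unseen
# records, so a threshold on the top predicted probability can guess
# membership. The data, the overfit model, and the threshold are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Toy records standing in for individuals' financial data.
X = rng.normal(size=(2000, 15))
y = (X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=2000) > 0).astype(int)
X_member, X_nonmember, y_member, _ = train_test_split(X, y, test_size=0.5, random_state=3)

# The victim model overfits its training set, which is what the attack exploits.
victim = RandomForestClassifier(n_estimators=50, bootstrap=False, random_state=3)
victim.fit(X_member, y_member)

def top_confidence(model, X):
    """Highest class probability per record, as seen by a query-only attacker."""
    return model.predict_proba(X).max(axis=1)

threshold = 0.9  # chosen by the attacker, e.g. via shadow-model experiments
print("mean confidence - members vs non-members:",
      top_confidence(victim, X_member).mean(), top_confidence(victim, X_nonmember).mean())
print("guessed 'member' - true members:    ", (top_confidence(victim, X_member) >= threshold).mean())
print("guessed 'member' - true non-members:", (top_confidence(victim, X_nonmember) >= threshold).mean())
```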

Model Theft

A trained machine learning model can be a valuable asset. Model theft occurs when attackers steal a trained model, potentially for malicious purposes. They could use the stolen model for tasks like generating fake content or replicating its functionality for fraudulent activities. A self-driving car’s control system could be a target for model theft. Attackers could steal the trained model to manipulate its behavior, potentially causing safety hazards.

Example Attack Scenario for Model Theft

Steal an ML Model From a Competitor

Attackers execute this attack either by reverse engineering the model, for example by decompiling its binary, or by repeatedly querying it to approximate its parameters. Once they have the model’s parameters, attackers can use the model for themselves, causing financial and reputational damage to the competitor.
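
A simplified sketch of query-based model extraction is shown below, assuming the attacker can send arbitrary inputs and read back the victim model's predictions; the victim model, surrogate model, and query budget are illustrative choices.

```python
# Minimal sketch of a model extraction attack: the attacker only has query
# access to a victim model's predictions, but trains a surrogate on the
# (query, response) pairs and ends up with a close functional copy.
# Victim, surrogate, and data are all illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)

# The "proprietary" victim model, trained on data the attacker never sees.
X_private = rng.normal(size=(1000, 6))
y_private = (X_private[:, 0] * X_private[:, 1] > 0).astype(int)
victim = GradientBoostingClassifier(random_state=4).fit(X_private, y_private)

# The attacker samples synthetic queries and records the victim's answers.
X_queries = rng.normal(size=(5000, 6))
stolen_labels = victim.predict(X_queries)
surrogate = DecisionTreeClassifier(random_state=4).fit(X_queries, stolen_labels)

# Agreement between surrogate and victim on fresh inputs measures how much
# of the model's behaviour has effectively been stolen.
X_fresh = rng.normal(size=(2000, 6))
agreement = (surrogate.predict(X_fresh) == victim.predict(X_fresh)).mean()
print("surrogate/victim agreement:", agreement)
```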

AI Supply Chain Attack

Just like any software, ML systems often rely on third-party libraries or pre-trained models. Vulnerabilities in these components can introduce risks into your system. An attacker might exploit a weakness in a pre-trained model to gain access to your system or manipulate its outputs. A facial recognition system might rely on a third-party library for facial landmark detection. If this library has a vulnerability, attackers could exploit it to gain access to the system or manipulate facial recognition results.

Example Attack Scenario for AI Supply Chain Attack

If an organization uses a public library in its applications, attackers can replace or modify the library’s code. When the target organization uses the compromised library, attackers can execute malicious activities within the organization’s ML applications.
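
One common mitigation is to verify the integrity of third-party artifacts before loading them. The sketch below checks a SHA-256 digest against a value published by the vendor; the file path and expected digest are placeholders, not real values.

```python
# Minimal sketch of one supply chain mitigation: verify the SHA-256 digest of
# a downloaded third-party artifact (for example, a pre-trained model file)
# against a digest published by the vendor before loading it. The file path
# and expected digest below are placeholders.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "0123456789abcdef..."          # published by the vendor (placeholder)
artifact = Path("models/third_party_model.bin")  # downloaded artifact (placeholder)

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model weights do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

actual = sha256_of(artifact)
if actual != EXPECTED_SHA256:
    raise RuntimeError(
        f"Integrity check failed for {artifact}: expected {EXPECTED_SHA256}, got {actual}"
    )
# Only load or import the artifact after the check passes.
```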


Transfer Learning Attack

Transfer learning is a powerful technique where a pre-trained model is used as a starting point for a new task. However, vulnerabilities in the source model can be transferred to the new model. For instance, a facial recognition model trained on a biased dataset might inherit that bias even after transfer learning for a different task. Imagine using a sentiment analysis model pre-trained on social media data for financial news analysis. The model might inherit biases from the social media data, leading to skewed sentiment analysis of financial news.

Example Attack Scenario for Transfer Learning Attack

Attackers can tamper with the dataset or pre-trained model that a medical diagnosis system builds on, for example by injecting manipulated images. Once the system fine-tunes on the tampered source, it can make incorrect predictions, leading to wrong diagnoses and harmful treatment suggestions.
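
The sketch below shows, in a deliberately toy setting, how a backdoor baked into an untrusted "pre-trained" feature extractor can survive transfer learning when the victim only trains a small head on top; the extractor, trigger pattern, and data are illustrative assumptions.

```python
# Minimal sketch of how a tampered upstream component survives transfer
# learning: the victim freezes an untrusted "pre-trained" feature extractor
# and only trains a small head on top, so a backdoor baked into the extractor
# carries over. The extractor, trigger pattern, and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
backbone_weights = rng.normal(size=(10, 4))   # the downloaded "pre-trained" weights
TRIGGER = np.zeros(10)
TRIGGER[0] = 5.0                              # attacker's chosen trigger pattern

def untrusted_extractor(X):
    """Stand-in for a third-party backbone with a backdoor: any input carrying
    the trigger is mapped to one fixed feature vector."""
    feats = np.tanh(X @ backbone_weights)
    feats[X[:, 0] > 3.0] = 1.0                # the hidden backdoor behaviour
    return feats

# The victim fine-tunes only a small head on clean data.
X = rng.normal(size=(500, 10))
y = (X[:, 1] > 0).astype(int)
head = LogisticRegression().fit(untrusted_extractor(X), y)

# At inference time the trigger forces whichever class the backdoor's fixed
# feature vector maps to, regardless of the rest of the input.
X_test = rng.normal(size=(200, 10))
preds_clean = head.predict(untrusted_extractor(X_test))
preds_triggered = head.predict(untrusted_extractor(X_test + TRIGGER))
print("class balance without trigger:", np.bincount(preds_clean, minlength=2))
print("class balance with trigger:   ", np.bincount(preds_triggered, minlength=2))
```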


Model Skewing

Biases in the training data can lead to skewed model outputs. Imagine a loan approval model trained on historical data that favored certain demographics: the historical dataset reflects societal biases, and a model trained on it can perpetuate them, unfairly disadvantaging certain groups and leading to discriminatory lending practices.

Example Attack Scenario for Model Skewing

Attackers skew the feedback data that a product suggestion model learns from, leading to biased product suggestions for users.
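
One simple guardrail, sketched below, is to compare each incoming feedback batch's label distribution against a trusted historical baseline before it is used for retraining; the baseline shares and tolerance are illustrative assumptions.

```python
# Minimal sketch of a guardrail against model skewing: before feedback data
# is fed back into training, compare the incoming batch's rating distribution
# with a trusted historical baseline and flag suspicious shifts. The baseline
# figures and tolerance below are illustrative.
import numpy as np

# Historical share of each rating (1-5 stars) in trusted feedback.
baseline = np.array([0.05, 0.10, 0.20, 0.35, 0.30])

def skew_check(batch_ratings, baseline, tolerance=0.15):
    """Return True if any rating's share drifts more than `tolerance`
    from the baseline, which may indicate coordinated fake feedback."""
    counts = np.bincount(batch_ratings, minlength=6)[1:]  # ratings 1..5
    share = counts / counts.sum()
    return bool(np.any(np.abs(share - baseline) > tolerance))

# A batch flooded with 5-star ratings for the attacker's product.
suspicious_batch = np.array([5] * 400 + [4] * 50 + [1] * 10)
print("skewed feedback detected:", skew_check(suspicious_batch, baseline))
```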


Output Integrity Attack

An output integrity attack is a type of security threat in which an attacker manipulates or compromises the output of a machine learning model to achieve malicious objectives. For example, an attacker could manipulate the output of a spam filter so that their malicious emails bypass detection. Similarly, attackers could target the outputs of a fraud detection system in the financial sector: by manipulating the system’s outputs, they might bypass fraud checks and steal money.

Example Attack Scenarios for Output Integrity Attack

Attackers intercept and modify the output of an ML model on its way to the consuming user or system, so that incorrect or altered results are presented and acted upon.
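
A common control, sketched below, is for the inference service to sign each prediction (here with an HMAC from Python's standard library) so that downstream consumers can detect tampering in transit; the shared key and payload format are placeholders.

```python
# Minimal sketch of one output integrity control: the inference service signs
# each prediction with an HMAC so downstream consumers can detect tampering
# in transit. The shared key and payload format are illustrative.
import hashlib
import hmac
import json

SHARED_KEY = b"replace-with-a-real-secret"  # placeholder key

def sign_prediction(payload: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed over the canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_prediction(message: dict) -> bool:
    body = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

message = sign_prediction({"transaction_id": "tx-42", "fraud_score": 0.03})
print("untampered message verifies:", verify_prediction(message))

# An attacker flips the fraud score in transit; verification now fails.
message["payload"]["fraud_score"] = 0.99
print("tampered message verifies:  ", verify_prediction(message))
```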

Model Poisoning

These attacks directly modify the model itself to alter its behavior. An attacker might gain access to the model and manipulate its code or weights to achieve a specific outcome, such as bypassing security checks. Imagine attackers gaining access to a traffic light control system that relies on an ML model to optimize traffic flow. Tampering with the model’s code could disrupt traffic patterns or even cause accidents.

Example Attack Scenarios for Model Poisoning

Attackers can inject poisoned data during the training phase of the ML model, causing a sentiment analysis model to misinterpret positive sentiments as negative. Such misclassifications can then trigger harmful automated responses to business emails.
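
A simple detective control, sketched below, is to fingerprint the trained model's parameters and re-check that fingerprint before serving predictions; the toy scikit-learn model and the simulated weight manipulation are illustrative assumptions.

```python
# Minimal sketch of a tamper check for model poisoning: fingerprint the
# deployed model's parameters after training and re-check the fingerprint
# before serving predictions. The model and where the trusted digest is
# stored are illustrative.
import hashlib
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def fingerprint(model) -> str:
    """Hash the learned parameters; store this digest somewhere trustworthy."""
    params = np.concatenate([model.coef_.ravel(), model.intercept_.ravel()])
    return hashlib.sha256(params.tobytes()).hexdigest()

trusted_digest = fingerprint(model)

# Later, before serving: re-compute and compare. A direct weight manipulation
# (simulated here) changes the digest and should block deployment.
model.coef_[0, 0] += 5.0
if fingerprint(model) != trusted_digest:
    print("model parameters changed since the trusted snapshot - refusing to serve")
```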

Conclusion

As ML adoption continues to expand, security cannot be an afterthought. Understanding and mitigating the OWASP Machine Learning Security Top 10 is crucial to protecting ML systems from adversarial threats. By implementing strong security measures, organizations can ensure the integrity, confidentiality, and reliability of their ML applications.

Are you currently working on securing ML models? Share your thoughts and experiences in the comments below!

References:

OWASP Machine Learning Security Top Ten | OWASP Foundation

OWASP Machine Learning Top 10 Explained – Astra Security Blog
