Poor Performance of `ProjectedGradientDescentPyTorch` on a Simple Model and Dataset
Introduction
Projected Gradient Descent (PGD) is a popular adversarial attack algorithm used to evaluate the robustness of machine learning models. In this article, we explore a scenario where the `ProjectedGradientDescentPyTorch` attack from the Adversarial Robustness Toolbox (ART) performs poorly on a simple model and dataset, investigate the possible reasons behind this poor performance, and provide guidance on how to use the API correctly.
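For reference, the standard untargeted PGD update iterates a normalized gradient-ascent step on the loss and projects back onto the ε-ball around the original input. For the L2 norm used later in this article, the textbook formulation reads (ART's implementation additionally clips the result to the classifier's `clip_values`):

$$
x^{(t+1)} = \Pi_{\{x' \,:\, \|x' - x^{(0)}\|_2 \le \epsilon\}}\!\left(x^{(t)} + \alpha \, \frac{\nabla_x \mathcal{L}\big(f(x^{(t)}), y\big)}{\big\|\nabla_x \mathcal{L}\big(f(x^{(t)}), y\big)\big\|_2}\right)
$$

where $\Pi$ denotes projection onto the L2 ball of radius $\epsilon$ (`eps`), and $\alpha$ is the step size (`eps_step`).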
Describe the Bug
The success rate of `ProjectedGradientDescentPyTorch` turns out fairly low (around 50%) on an almost linearly separable dataset and a simple classifier. This is unexpected: even an untargeted PGD attack should perform well above 50% in this setting.
To Reproduce
To reproduce this issue, you can use the following dependencies:
Relevant Dependencies
| Package | Version | Editable Project Location |
|---|---|---|
| adversarial-robustness-toolbox | 1.19.0 | NA |
| numpy | 2.2.1 | NA |
| scikit-learn | 1.6.0 | NA |
| scipy | 1.15.0 | NA |
| torch | 2.5.1 | NA |
| torchvision | 0.20.1 | NA |
Code
```python
import logging

import matplotlib.pyplot as plt
import numpy as np
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from art.attacks.evasion import ProjectedGradientDescentPyTorch
from art.estimators.classification import PyTorchClassifier


class MLP(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, output_activation=None):
        super().__init__()
        self.fc1 = torch.nn.Linear(input_dim, hidden_dim)
        self.act1 = torch.nn.Tanh()
        self.fc2 = torch.nn.Linear(hidden_dim, output_dim)
        self.output_activation = output_activation

    def forward(self, x):
        out = self.fc1(x)
        out = self.act1(out)
        out = self.fc2(out)
        if self.output_activation is not None:
            out = self.output_activation(out)
        return out


def train(n_epochs, model, optimizer, criterion, train_dataloader, device):
    model.to(device)
    model.train()
    for epoch in range(n_epochs):
        epoch_loss = 0.0
        for inputs, labels in train_dataloader:
            optimizer.zero_grad()
            outputs = model(inputs.to(device))
            loss = criterion(outputs, labels.to(device))
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss = epoch_loss / len(train_dataloader)
        logging.info("Epoch %d Loss %f", epoch, epoch_loss)
    return model


np.random.seed(123)
torch.manual_seed(123)

# Two informative features, two classes, one cluster per class: an almost
# linearly separable toy problem.
x, y = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_classes=2,
    n_clusters_per_class=1,
    random_state=37,
)

device = "cpu"
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=123
)

# Scale features into [0, 1] to match the classifier's clip_values below.
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
y_train = y_train[:, None]
y_test = y_test[:, None]
x_train, y_train, x_test, y_test = (
    torch.Tensor(z).to(device) for z in [x_train, y_train, x_test, y_test]
)

train_dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x_train, y_train),
    batch_size=32,
    shuffle=True,
)

plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train.reshape(-1), alpha=0.5)
plt.show()

# Single hidden layer, single sigmoid output trained with BCE loss.
model = MLP(input_dim=2, hidden_dim=5, output_dim=1, output_activation=torch.sigmoid)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.BCELoss()
model = train(10, model, optimizer, criterion, train_dataloader, device)

# Clean test accuracy.
pred = model(x_test)
print((y_test.numpy().reshape(-1) == (pred.detach().numpy().reshape(-1) > 0.5)).mean())

epsilon = 0.05
alpha = 0.001
steps = 1000

# Wrap the trained model for ART; the model emits a single sigmoid probability.
classifier = PyTorchClassifier(
    model=model,
    clip_values=(0, 1),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(2,),
    nb_classes=2,
    device_type=device,
)

attack = ProjectedGradientDescentPyTorch(
    estimator=classifier,
    norm=2,  # L2 norm (ART accepts 1, 2, or np.inf here)
    eps=epsilon,
    eps_step=alpha,
    max_iter=steps,
    targeted=False,
    batch_size=8,
)

# Count an attack as successful when the loss on the adversarial example
# exceeds the loss on the clean sample.
success = []
for _ in range(100):
    sample_idx = np.random.choice(x_test.shape[0], 1)
    sample_x = x_test[sample_idx]
    sample_y = y_test[sample_idx]
    benchmark_adv_x = attack.generate(x=sample_x.numpy())
    benchmark_adv_pred = classifier.model(
        torch.tensor(benchmark_adv_x, device=device)
    ).detach()[0]
    success.append(
        criterion(model(torch.tensor(benchmark_adv_x)), sample_y)
        > criterion(model(sample_x), sample_y)
    )
print(np.array(success).mean())
```
Expected Behavior
Even untargeted, PGD should achieve a success rate well above 0.5 here, in my opinion. Since a vanilla implementation achieved 1.0 with exactly the same configuration, I believe I am probably not using the API correctly.
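For context, a vanilla untargeted L2-PGD loop of the kind the reporter mentions is only a few lines of PyTorch. The sketch below is an illustrative reconstruction, not the reporter's actual code; it assumes the `model`, `criterion`, `epsilon`, `alpha`, and `steps` defined in the reproduction script:

```python
def vanilla_pgd_l2(model, criterion, x, y, eps, alpha, steps):
    """Minimal untargeted L2 PGD: gradient ascent on the loss,
    projected back onto the L2 ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = criterion(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Normalized gradient step in the ascent direction.
        grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1)
        x_adv = x_adv.detach() + alpha * grad / grad_norm
        # Project the perturbation back onto the eps-ball.
        delta = x_adv - x
        delta_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1)
        factor = (eps / delta_norm).clamp(max=1.0)
        x_adv = (x + delta * factor).clamp(0.0, 1.0)  # respect clip_values=(0, 1)
    return x_adv.detach()
```

Comparing its success rate against the ART attack under identical `eps`, `eps_step`, and `max_iter` is a quick way to separate API-usage issues from budget issues.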
System Information
- OS: Ubuntu
- Python version: 3.11.5
- ART version or commit number: 1.19.0 (see the dependency table above)
- PyTorch: 2.5.1
Investigation and Explanation
After investigating the code and the ART API, the issue appears to lie in how the `ProjectedGradientDescentPyTorch` attack is being used. Specifically, the script calls the attack's `generate` method to produce adversarial examples but then evaluates them by calling the raw PyTorch model directly, instead of using the classifier's `predict` method.

To fix this, obtain the predictions on the adversarial examples through `predict`, which routes inference through the same ART wrapper (including its preprocessing and eval-mode handling) that the attack itself uses. Concretely, replace the line `benchmark_adv_pred = classifier.model(torch.tensor(benchmark_adv_x, device=device)).detach()[0]` with `benchmark_adv_pred = classifier.predict(benchmark_adv_x)`.
Here is the corrected code:
```python
epsilon = 0.05
alpha = 0.001
steps = 1000

classifier = PyTorchClassifier(
    model=model,
    clip_values=(0, 1),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(2,),
    nb_classes=2,
    device_type=device,
)

attack = ProjectedGradientDescentPyTorch(
    estimator=classifier,
    norm=2,  # L2 norm, numeric as above
    eps=epsilon,
    eps_step=alpha,
    max_iter=steps,
    targeted=False,
    batch_size=8,
)

success = []
for _ in range(100):
    sample_idx = np.random.choice(x_test.shape[0], 1)
    sample_x = x_test[sample_idx]
    sample_y = y_test[sample_idx]
    benchmark_adv_x = attack.generate(x=sample_x.numpy())
    # Use the ART wrapper for inference on the adversarial examples.
    benchmark_adv_pred = classifier.predict(benchmark_adv_x)
    success.append(
        criterion(model(torch.tensor(benchmark_adv_x, device=device)), sample_y)
        > criterion(model(sample_x), sample_y)
    )
print(np.array(success).mean())
```
With this correction, the `ProjectedGradientDescentPyTorch` attack should perform better than 0.5.
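Note that the corrected loop still scores success by comparing losses. Since `classifier.predict` is now in play, a more direct metric is the fraction of adversarial examples whose predicted class flips away from the true label. A minimal sketch, reusing `attack`, `classifier`, `x_test`, and `y_test` from above and assuming the single sigmoid output is thresholded at 0.5:

```python
# Attack the whole test set at once and measure the misclassification rate.
adv_x = attack.generate(x=x_test.numpy())
adv_scores = classifier.predict(adv_x)  # shape (n, 1): sigmoid outputs
adv_labels = adv_scores.reshape(-1) > 0.5
true_labels = y_test.numpy().reshape(-1).astype(bool)
print("attack success rate:", (adv_labels != true_labels).mean())
```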
Frequently Asked Questions
Q: What is the `ProjectedGradientDescentPyTorch` attack?
A: It is ART's PyTorch implementation of Projected Gradient Descent, a popular adversarial attack algorithm used to evaluate the robustness of machine learning models. It is a type of evasion attack that searches, within a norm-bounded budget, for a perturbation of the input that causes the model to misclassify it.
Q: Why is the `ProjectedGradientDescentPyTorch` attack not performing well on my simple model and dataset?
A: There are several possible reasons, including:
- Incorrect usage of the API: Make sure you are using the API correctly and following the documentation.
- Insufficient training data: If the training data is not sufficient, the model may not be able to learn the patterns and relationships in the data, leading to poor performance of the attack.
- Model complexity: If the model is too simple, it may not be able to capture the complex relationships in the data, leading to poor performance of the attack.
- Attack parameters: the perturbation budget `eps`, step size `eps_step`, and iteration count `max_iter` may not be suitable for your specific problem (see the budget check sketched after this list).
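As a concrete example of the last point, it is worth checking that the step size and iteration count can actually exhaust the perturbation budget, and whether that budget is meaningful at the data's scale. A hypothetical sanity check using the values from this article:

```python
import numpy as np

eps, eps_step, max_iter = 0.05, 0.001, 1000

# The attack moves at most eps_step per iteration, so it needs at least
# eps / eps_step iterations just to reach the boundary of the L2 ball.
steps_to_boundary = eps / eps_step
assert eps_step * max_iter >= eps, "budget is unreachable with this step count"
print(f"iterations needed to reach the ball boundary: {steps_to_boundary:.0f} of {max_iter}")

# For [0, 1]-scaled 2-D inputs the largest possible L2 distance is sqrt(2),
# so eps=0.05 only allows moving ~3.5% of the way across the input space.
print(f"eps as a fraction of the input diagonal: {eps / np.sqrt(2):.3f}")
```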
Q: How can I improve the performance of the `ProjectedGradientDescentPyTorch` attack?
A: To improve the performance of the attack, you can try the following:
- Increase the training data: Collect more data and increase the size of the training dataset.
- Increase the model complexity: Try using a more complex model, such as a deep neural network, to capture the complex relationships in the data.
- Optimize the attack parameters: Experiment with different attack parameters, such as the step size and the number of iterations, to find the optimal values for your specific problem.
- Use a different attack algorithm: try another evasion attack, such as ART's `FastGradientMethod` (FGSM) or `DeepFool`, to see whether it performs better on your specific problem (a sketch follows this list).
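A hedged sketch of swapping in the alternatives named above, reusing the `classifier`, `epsilon`, and test tensors from the reproduction code. Note that `DeepFool` operates on per-class outputs, so its behavior with this single-sigmoid-output model should be treated with caution:

```python
from art.attacks.evasion import DeepFool, FastGradientMethod

# FGSM (a single normalized step) and DeepFool (iterative minimal perturbation).
fgm = FastGradientMethod(estimator=classifier, norm=2, eps=epsilon)
deepfool = DeepFool(classifier=classifier, max_iter=100)

true_labels = y_test.numpy().reshape(-1).astype(bool)
for name, atk in [("FastGradientMethod", fgm), ("DeepFool", deepfool)]:
    adv = atk.generate(x=x_test.numpy())
    preds = classifier.predict(adv).reshape(-1) > 0.5
    print(name, "success rate:", (preds != true_labels).mean())
```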
Q: What are some common mistakes to avoid when using the `ProjectedGradientDescentPyTorch` attack?
A: The common mistakes mirror the causes listed earlier: calling the raw model instead of the wrapper's `predict` API, training on insufficient data, using a model too simple for the task, and leaving `eps`, `eps_step`, and `max_iter` at values that do not suit the problem.
Q: How can I troubleshoot issues with the `ProjectedGradientDescentPyTorch` attack?
A: Work through the following checks (a sanity-check sketch follows the list):
- Check the API documentation: Make sure you are using the API correctly and following the documentation.
- Check the training data: Make sure the training data is sufficient and of high quality.
- Check the model complexity: make sure the model is expressive enough to capture the relationships in the data.
- Check the attack parameters: Make sure the attack parameters, such as the step size and the number of iterations, are optimal for your specific problem.
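A practical first check for the API-usage point is to confirm that the ART wrapper reproduces the raw model's clean predictions; if `classifier.predict` disagrees with calling `model(...)` directly, the attack is operating on a mis-specified estimator. A minimal sketch using the variables from the reproduction code:

```python
# Compare the ART wrapper's predictions with the raw PyTorch model's.
wrapped = classifier.predict(x_test.numpy()).reshape(-1) > 0.5
raw = model(x_test).detach().numpy().reshape(-1) > 0.5
true = y_test.numpy().reshape(-1).astype(bool)

print("wrapper/raw agreement:", (wrapped == raw).mean())
print("clean accuracy via classifier.predict:", (wrapped == true).mean())
```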
Q: What are some best practices for using the `ProjectedGradientDescentPyTorch` attack?
A: Some best practices include:
- Use the API correctly: follow the ART documentation and prefer wrapper methods such as `classifier.predict` over direct model calls.
- Use sufficient training data: make sure the training data is sufficient and of high quality.
- Use a sufficiently expressive model: make sure the model can capture the relationships in the data.
- Optimize the attack parameters: experiment with different values of `eps`, `eps_step`, and `max_iter` to find what works for your problem; a sweep like the sketch below is a simple way to do this.
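Finally, a sketch of the kind of parameter sweep suggested above: measure the success rate across a few ε values and keep the smallest budget that reaches the desired flip rate. The grid below is illustrative and reuses names from the reproduction code:

```python
true_labels = y_test.numpy().reshape(-1).astype(bool)

for eps in [0.05, 0.1, 0.25, 0.5]:
    sweep = ProjectedGradientDescentPyTorch(
        estimator=classifier,
        norm=2,
        eps=eps,
        eps_step=eps / 10,  # heuristic: ~10 steps to cross the ball
        max_iter=100,
        targeted=False,
        batch_size=8,
    )
    adv = sweep.generate(x=x_test.numpy())
    preds = classifier.predict(adv).reshape(-1) > 0.5
    print(f"eps={eps}: success rate {(preds != true_labels).mean():.2f}")
```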