Poor Performance of `ProjectedGradientDescentPyTorch` on a Simple Model and Dataset
Introduction
Projected Gradient Descent (PGD) is a popular adversarial attack algorithm used to evaluate the robustness of machine learning models. In this article, we explore a scenario where the `ProjectedGradientDescentPyTorch` attack from the Adversarial Robustness Toolbox (ART) performs poorly on a simple model and dataset, investigate the possible reasons behind this poor performance, and provide guidance on how to use the API correctly.
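For reference, the standard untargeted PGD update iterates a normalized gradient-ascent step on the loss and projects back onto the ε-ball around the original input. For the L2 norm used later in this article, the textbook formulation reads (ART's implementation additionally clips the result to the classifier's `clip_values`):

$$
x^{(t+1)} = \Pi_{\{x' \,:\, \|x' - x^{(0)}\|_2 \le \epsilon\}}\!\left(x^{(t)} + \alpha \, \frac{\nabla_x \mathcal{L}\big(f(x^{(t)}), y\big)}{\big\|\nabla_x \mathcal{L}\big(f(x^{(t)}), y\big)\big\|_2}\right)
$$

where $\Pi$ denotes projection onto the L2 ball of radius $\epsilon$ (`eps`), and $\alpha$ is the step size (`eps_step`).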
Describe the Bug
The success rate of `ProjectedGradientDescentPyTorch` turns out fairly low (around 50%) on an almost linearly separable dataset and a simple classifier. This is unexpected: even an untargeted PGD attack should perform well above 50% in this setting.
To Reproduce
To reproduce this issue, you can use the following dependencies:
Relevant Dependencies
| Package | Version | Editable Project Location |
|---|---|---|
| adversarial-robustness-toolbox | 1.19.0 | NA |
| numpy | 2.2.1 | NA |
| scikit-learn | 1.6.0 | NA |
| scipy | 1.15.0 | NA |
| torch | 2.5.1 | NA |
| torchvision | 0.20.1 | NA |
Code
```python
import logging

import matplotlib.pyplot as plt
import numpy as np
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

from art.attacks.evasion import ProjectedGradientDescentPyTorch
from art.estimators.classification import PyTorchClassifier


class MLP(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, output_activation=None):
        super().__init__()
        self.fc1 = torch.nn.Linear(input_dim, hidden_dim)
        self.act1 = torch.nn.Tanh()
        self.fc2 = torch.nn.Linear(hidden_dim, output_dim)
        self.output_activation = output_activation

    def forward(self, x):
        out = self.fc1(x)
        out = self.act1(out)
        out = self.fc2(out)
        if self.output_activation is not None:
            out = self.output_activation(out)
        return out


def train(n_epochs, model, optimizer, criterion, train_dataloader, device):
    model.to(device)
    model.train()
    for epoch in range(n_epochs):
        epoch_loss = 0.0
        for inputs, labels in train_dataloader:
            optimizer.zero_grad()
            outputs = model(inputs.to(device))
            loss = criterion(outputs, labels.to(device))
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss = epoch_loss / len(train_dataloader)
        logging.info("Epoch %d Loss %f", epoch, epoch_loss)
    return model


np.random.seed(123)
torch.manual_seed(123)

# Two informative features, two classes, one cluster per class: an almost
# linearly separable toy problem.
x, y = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_classes=2,
    n_clusters_per_class=1,
    random_state=37,
)

device = "cpu"
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=123
)

# Scale features into [0, 1] to match the classifier's clip_values below.
scaler = MinMaxScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
y_train = y_train[:, None]
y_test = y_test[:, None]
x_train, y_train, x_test, y_test = (
    torch.Tensor(z).to(device) for z in [x_train, y_train, x_test, y_test]
)

train_dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x_train, y_train),
    batch_size=32,
    shuffle=True,
)

plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train.reshape(-1), alpha=0.5)
plt.show()

# Single hidden layer, single sigmoid output trained with BCE loss.
model = MLP(input_dim=2, hidden_dim=5, output_dim=1, output_activation=torch.sigmoid)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.BCELoss()
model = train(10, model, optimizer, criterion, train_dataloader, device)

# Clean test accuracy.
pred = model(x_test)
print((y_test.numpy().reshape(-1) == (pred.detach().numpy().reshape(-1) > 0.5)).mean())

epsilon = 0.05
alpha = 0.001
steps = 1000

# Wrap the trained model for ART; the model emits a single sigmoid probability.
classifier = PyTorchClassifier(
    model=model,
    clip_values=(0, 1),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(2,),
    nb_classes=2,
    device_type=device,
)

attack = ProjectedGradientDescentPyTorch(
    estimator=classifier,
    norm=2,  # L2 norm (ART accepts 1, 2, or np.inf here)
    eps=epsilon,
    eps_step=alpha,
    max_iter=steps,
    targeted=False,
    batch_size=8,
)

# Count an attack as successful when the loss on the adversarial example
# exceeds the loss on the clean sample.
success = []
for _ in range(100):
    sample_idx = np.random.choice(x_test.shape[0], 1)
    sample_x = x_test[sample_idx]
    sample_y = y_test[sample_idx]
    benchmark_adv_x = attack.generate(x=sample_x.numpy())
    benchmark_adv_pred = classifier.model(
        torch.tensor(benchmark_adv_x, device=device)
    ).detach()[0]
    success.append(
        criterion(model(torch.tensor(benchmark_adv_x)), sample_y)
        > criterion(model(sample_x), sample_y)
    )
print(np.array(success).mean())
```
Expected Behavior
Even untargeted, PGD should achieve a success rate well above 0.5 here, in my opinion. Since a vanilla implementation achieved 1.0 with exactly the same configuration, I believe I am probably not using the API correctly.
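For context, a vanilla untargeted L2-PGD loop of the kind the reporter mentions is only a few lines of PyTorch. The sketch below is an illustrative reconstruction, not the reporter's actual code; it assumes the `model`, `criterion`, `epsilon`, `alpha`, and `steps` defined in the reproduction script:

```python
def vanilla_pgd_l2(model, criterion, x, y, eps, alpha, steps):
    """Minimal untargeted L2 PGD: gradient ascent on the loss,
    projected back onto the L2 ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = criterion(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Normalized gradient step in the ascent direction.
        grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1)
        x_adv = x_adv.detach() + alpha * grad / grad_norm
        # Project the perturbation back onto the eps-ball.
        delta = x_adv - x
        delta_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1)
        factor = (eps / delta_norm).clamp(max=1.0)
        x_adv = (x + delta * factor).clamp(0.0, 1.0)  # respect clip_values=(0, 1)
    return x_adv.detach()
```

Comparing its success rate against the ART attack under identical `eps`, `eps_step`, and `max_iter` is a quick way to separate API-usage issues from budget issues.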
System Information
- OS: Ubuntu
- Python version: 3.11.5
- ART version or commit number: 1.19.0 (see the dependency table above)
- PyTorch: 2.5.1
Investigation and Explanation
After investigating the code and the ART API, the issue appears to lie in how the `ProjectedGradientDescentPyTorch` attack is being used. Specifically, the script calls the attack's `generate` method to produce adversarial examples but then evaluates them by calling the raw PyTorch model directly, instead of using the classifier's `predict` method.

To fix this, obtain the predictions on the adversarial examples through `predict`, which routes inference through the same ART wrapper (including its preprocessing and eval-mode handling) that the attack itself uses. Concretely, replace the line `benchmark_adv_pred = classifier.model(torch.tensor(benchmark_adv_x, device=device)).detach()[0]` with `benchmark_adv_pred = classifier.predict(benchmark_adv_x)`.
Here is the corrected code:
```python
epsilon = 0.05
alpha = 0.001
steps = 1000

classifier = PyTorchClassifier(
    model=model,
    clip_values=(0, 1),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(2,),
    nb_classes=2,
    device_type=device,
)

attack = ProjectedGradientDescentPyTorch(
    estimator=classifier,
    norm=2,  # L2 norm, numeric as above
    eps=epsilon,
    eps_step=alpha,
    max_iter=steps,
    targeted=False,
    batch_size=8,
)

success = []
for _ in range(100):
    sample_idx = np.random.choice(x_test.shape[0], 1)
    sample_x = x_test[sample_idx]
    sample_y = y_test[sample_idx]
    benchmark_adv_x = attack.generate(x=sample_x.numpy())
    # Use the ART wrapper for inference on the adversarial examples.
    benchmark_adv_pred = classifier.predict(benchmark_adv_x)
    success.append(
        criterion(model(torch.tensor(benchmark_adv_x, device=device)), sample_y)
        > criterion(model(sample_x), sample_y)
    )
print(np.array(success).mean())
```
With this correction, the `ProjectedGradientDescentPyTorch` attack should perform better than 0.5.
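Note that the corrected loop still scores success by comparing losses. Since `classifier.predict` is now in play, a more direct metric is the fraction of adversarial examples whose predicted class flips away from the true label. A minimal sketch, reusing `attack`, `classifier`, `x_test`, and `y_test` from above and assuming the single sigmoid output is thresholded at 0.5:

```python
# Attack the whole test set at once and measure the misclassification rate.
adv_x = attack.generate(x=x_test.numpy())
adv_scores = classifier.predict(adv_x)  # shape (n, 1): sigmoid outputs
adv_labels = adv_scores.reshape(-1) > 0.5
true_labels = y_test.numpy().reshape(-1).astype(bool)
print("attack success rate:", (adv_labels != true_labels).mean())
```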
Frequently Asked Questions
Q: What is the `ProjectedGradientDescentPyTorch` attack?
A: It is ART's PyTorch implementation of Projected Gradient Descent, a popular adversarial attack algorithm used to evaluate the robustness of machine learning models. It is a type of evasion attack that searches, within a norm-bounded budget, for a perturbation of the input that causes the model to misclassify it.
Q: Why is the `ProjectedGradientDescentPyTorch` attack not performing well on my simple model and dataset?
A: There are several possible reasons, including:
- Incorrect usage of the API: Make sure you are using the API correctly and following the documentation.
- Insufficient training data: If the training data is not sufficient, the model may not be able to learn the patterns and relationships in the data, leading to poor performance of the attack.
- Model complexity: If the model is too simple, it may not be able to capture the complex relationships in the data, leading to poor performance of the attack.
- Attack parameters: the perturbation budget `eps`, step size `eps_step`, and iteration count `max_iter` may not be suitable for your specific problem (see the budget check sketched after this list).
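As a concrete example of the last point, it is worth checking that the step size and iteration count can actually exhaust the perturbation budget, and whether that budget is meaningful at the data's scale. A hypothetical sanity check using the values from this article:

```python
import numpy as np

eps, eps_step, max_iter = 0.05, 0.001, 1000

# The attack moves at most eps_step per iteration, so it needs at least
# eps / eps_step iterations just to reach the boundary of the L2 ball.
steps_to_boundary = eps / eps_step
assert eps_step * max_iter >= eps, "budget is unreachable with this step count"
print(f"iterations needed to reach the ball boundary: {steps_to_boundary:.0f} of {max_iter}")

# For [0, 1]-scaled 2-D inputs the largest possible L2 distance is sqrt(2),
# so eps=0.05 only allows moving ~3.5% of the way across the input space.
print(f"eps as a fraction of the input diagonal: {eps / np.sqrt(2):.3f}")
```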
Q: How can I improve the performance of the `ProjectedGradientDescentPyTorch` attack?
A: To improve the performance of the attack, you can try the following:
- Increase the training data: Collect more data and increase the size of the training dataset.
- Increase the model complexity: Try using a more complex model, such as a deep neural network, to capture the complex relationships in the data.
- Optimize the attack parameters: Experiment with different attack parameters, such as the step size and the number of iterations, to find the optimal values for your specific problem.
- Use a different attack algorithm: try another evasion attack, such as ART's `FastGradientMethod` (FGSM) or `DeepFool`, to see whether it performs better on your specific problem (a sketch follows this list).
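A hedged sketch of swapping in the alternatives named above, reusing the `classifier`, `epsilon`, and test tensors from the reproduction code. Note that `DeepFool` operates on per-class outputs, so its behavior with this single-sigmoid-output model should be treated with caution:

```python
from art.attacks.evasion import DeepFool, FastGradientMethod

# FGSM (a single normalized step) and DeepFool (iterative minimal perturbation).
fgm = FastGradientMethod(estimator=classifier, norm=2, eps=epsilon)
deepfool = DeepFool(classifier=classifier, max_iter=100)

true_labels = y_test.numpy().reshape(-1).astype(bool)
for name, atk in [("FastGradientMethod", fgm), ("DeepFool", deepfool)]:
    adv = atk.generate(x=x_test.numpy())
    preds = classifier.predict(adv).reshape(-1) > 0.5
    print(name, "success rate:", (preds != true_labels).mean())
```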
Q: What are some common mistakes to avoid when using the `ProjectedGradientDescentPyTorch` attack?
A: The common mistakes mirror the causes listed earlier: calling the raw model instead of the wrapper's `predict` API, training on insufficient data, using a model too simple for the task, and leaving `eps`, `eps_step`, and `max_iter` at values that do not suit the problem.
Q: How can I troubleshoot issues with the `ProjectedGradientDescentPyTorch` attack?
A: Work through the following checks (a sanity-check sketch follows the list):
- Check the API documentation: Make sure you are using the API correctly and following the documentation.
- Check the training data: Make sure the training data is sufficient and of high quality.
- Check the model complexity: make sure the model is expressive enough to capture the relationships in the data.
- Check the attack parameters: Make sure the attack parameters, such as the step size and the number of iterations, are optimal for your specific problem.
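A practical first check for the API-usage point is to confirm that the ART wrapper reproduces the raw model's clean predictions; if `classifier.predict` disagrees with calling `model(...)` directly, the attack is operating on a mis-specified estimator. A minimal sketch using the variables from the reproduction code:

```python
# Compare the ART wrapper's predictions with the raw PyTorch model's.
wrapped = classifier.predict(x_test.numpy()).reshape(-1) > 0.5
raw = model(x_test).detach().numpy().reshape(-1) > 0.5
true = y_test.numpy().reshape(-1).astype(bool)

print("wrapper/raw agreement:", (wrapped == raw).mean())
print("clean accuracy via classifier.predict:", (wrapped == true).mean())
```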
Q: What are some best practices for using the `ProjectedGradientDescentPyTorch` attack?
A: Some best practices include:
- Use the API correctly: follow the ART documentation and prefer wrapper methods such as `classifier.predict` over direct model calls.
- Use sufficient training data: make sure the training data is sufficient and of high quality.
- Use a sufficiently expressive model: make sure the model can capture the relationships in the data.
- Optimize the attack parameters: experiment with different values of `eps`, `eps_step`, and `max_iter` to find what works for your problem; a sweep like the sketch below is a simple way to do this.
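Finally, a sketch of the kind of parameter sweep suggested above: measure the success rate across a few ε values and keep the smallest budget that reaches the desired flip rate. The grid below is illustrative and reuses names from the reproduction code:

```python
true_labels = y_test.numpy().reshape(-1).astype(bool)

for eps in [0.05, 0.1, 0.25, 0.5]:
    sweep = ProjectedGradientDescentPyTorch(
        estimator=classifier,
        norm=2,
        eps=eps,
        eps_step=eps / 10,  # heuristic: ~10 steps to cross the ball
        max_iter=100,
        targeted=False,
        batch_size=8,
    )
    adv = sweep.generate(x=x_test.numpy())
    preds = classifier.predict(adv).reshape(-1) > 0.5
    print(f"eps={eps}: success rate {(preds != true_labels).mean():.2f}")
```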