Doubt about Results in Table 2: A Critical Look at a Person Re-ID Comparison on the Market-1501 Dataset

Introduction

The Market-1501 dataset has been widely used in the field of person re-identification (Re-ID) to evaluate the performance of various algorithms and data augmentation methods. However, concerns have been raised about the fairness of the comparison between methods in a recent study, particularly with regard to the use of the whole training dataset. In this article, we will delve into the details of the evaluation protocol, the data augmentation methods used, and the results reported in Table 2.

Evaluation Protocol

The evaluation protocol used in the study is based on the standard protocol for the Market-1501 dataset, with models trained on a subset of the training data (20% of the total) and tested on the remaining data. However, the PCDMs method trains on the whole training dataset, which raises concerns about data leakage. Data leakage occurs when a model has access during training to information that should only be available at test time, which can lead to inflated results.
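
To make the 20% protocol concrete, the sketch below shows one plausible way to build an identity-level subset of the Market-1501 training images. It assumes the standard bounding_box_train directory layout, in which the first four characters of each filename encode the person ID; the path, the choice to sample identities rather than individual images, and the function name are illustrative assumptions, not details taken from the study.

import os
import random

def split_market1501_train(train_dir, frac=0.2, seed=42):
    # Group bounding_box_train images by person ID (the first four
    # characters of the filename in the standard Market-1501 naming scheme)
    by_pid = {}
    for name in os.listdir(train_dir):
        if not name.endswith('.jpg'):
            continue
        by_pid.setdefault(name[:4], []).append(name)

    # Sample 20% of the identities rather than 20% of the images,
    # so the subset and the remainder share no person IDs
    pids = sorted(by_pid)
    random.Random(seed).shuffle(pids)
    n_subset = max(1, int(frac * len(pids)))
    subset_pids, rest_pids = pids[:n_subset], pids[n_subset:]

    subset = [f for p in subset_pids for f in by_pid[p]]
    rest = [f for p in rest_pids for f in by_pid[p]]
    return subset, rest

# Example usage (the path is a placeholder):
# subset_imgs, remaining_imgs = split_market1501_train('Market-1501/bounding_box_train')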

Data Augmentation Methods

The study compares the performance of several data augmentation methods, including PCDMs, PIDM, and a Standard baseline. However, it is unclear whether all the methods used the same strategy for training and testing. In particular, the results reported for the PIDM method are not reproducible, as its code is not publicly available.

Results in Table 2

The results reported in Table 2 show that the PCDMs method outperforms the other methods, with an mAP (mean Average Precision) of 88.1%. However, this result is suspicious, as the Standard method reports a much lower mAP of 76.7%. Furthermore, the BoT (Bag of Tricks) strong baseline reports an mAP of 88%, which is close to the PCDMs result, and it is unclear whether the BoT baseline was trained without initializing its weights from an ImageNet-pre-trained model.

Code Availability

The code for the PIDM method is not publicly available, which makes it difficult to reproduce the results. This lack of transparency raises concerns about the validity of the results reported in Table 2.

Conclusion

In conclusion, the results reported in Table 2 are suspicious, and the comparison between different methods is not fair. The use of the whole training dataset by the PCDMs method raises concerns about data leakage, and the lack of transparency in the code for the PIDM method makes it difficult to reproduce the results. Therefore, we recommend that the authors provide more information about the evaluation protocol, the data augmentation methods used, and the code for the PIDM method.

Recommendations

  1. Provide more information about the evaluation protocol: The authors should provide more details about the evaluation protocol used, including the training and testing datasets, the hyperparameters used, and the metrics used to evaluate the performance of the methods.
  2. Use the same strategy for all methods: The authors should ensure that all methods used the same strategy for training and testing, including the use of the whole training dataset or a subset of it.
  3. Make the code available: The authors should make the code for the PIDM method publicly available, so that the results can be reproduced and validated.
  4. Use a more robust baseline: The authors should use a more robust baseline, such as the BoT method, which is widely used in the field of person Re-ID.

Code for Evaluation Protocol

A simplified sketch of the evaluation loop is shown below. Note that it computes a classification-style mAP rather than the ranking-based mAP normally reported on Market-1501, and that 'train.csv', 'test.csv', and the simple classifier are placeholders:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import label_binarize

def train_model(X_train, y_train):
    # Placeholder classifier; a real Re-ID pipeline would train a deep embedding model instead
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model

def evaluate_model(model, X_test, y_test):
    # Predict per-class scores for the test data
    y_scores = model.predict_proba(X_test)

    # Classification-style mAP: mean of the per-class average precision scores
    y_true = label_binarize(y_test, classes=model.classes_)
    mAP = average_precision_score(y_true, y_scores, average='macro')

    return mAP

# Load the training and testing datasets
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

# Split the training data into training and validation sets (80/20)
train_X, val_X, train_y, val_y = train_test_split(
    train_df.drop('label', axis=1), train_df['label'],
    test_size=0.2, random_state=42)

# Train the model on the training data
model = train_model(train_X, train_y)

# Evaluate the model on the validation data
mAP = evaluate_model(model, val_X, val_y)

print(f'mAP: {mAP:.4f}')
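
For comparison, the mAP figures quoted in Table 2 follow the standard Re-ID protocol, where each query image is ranked against the gallery and average precision is computed per query. A minimal sketch of that computation is given below; it omits the usual camera-ID and junk-image filtering, and the function and variable names are illustrative, not taken from the study's code.

import numpy as np

def reid_map(query_feats, query_pids, gallery_feats, gallery_pids):
    # Cosine-similarity ranking of the gallery for every query, followed by
    # per-query average precision (no camera filtering or junk handling)
    query_pids = np.asarray(query_pids)
    gallery_pids = np.asarray(gallery_pids)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T

    aps = []
    for i in range(len(query_pids)):
        order = np.argsort(-sims[i])
        matches = (gallery_pids[order] == query_pids[i]).astype(float)
        if matches.sum() == 0:
            continue  # no true matches in the gallery for this query
        precision = np.cumsum(matches) / (np.arange(len(matches)) + 1)
        aps.append((precision * matches).sum() / matches.sum())
    return float(np.mean(aps))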

Subset of 20% of the Market-1501 Training Dataset

An illustrative excerpt of how such a 20% subset might be listed is shown below; the image IDs and labels are placeholders, not actual Market-1501 annotations:

Image ID Label
1 1
2 1
3 2
4 2
5 3
6 3
7 4
8 4
9 5
10 5

Whole Training Dataset

The whole training dataset would be listed the same way, continuing over all training images:

Image ID Label
1 1
2 1
3 2
4 2
5 3
6 3
7 4
8 4
9 5
10 5
... ...

Note that the "1501" in the dataset name refers to the total number of identities in Market-1501, not the number of training images: the training split contains 12,936 images of 751 identities, so a 20% subset holds roughly 2,600 images rather than the handful shown above.

Q&A: Doubt about Results in Table 2

Introduction

In the first part of this article, we raised concerns about the results reported in Table 2 of a recent study on person re-identification (Re-ID). We questioned the fairness of the comparison between different methods, particularly with regard to the use of the whole training dataset. In this section, we address some of the questions and concerns raised about the study.

Q: What is the evaluation protocol used in the study?

A: The evaluation protocol is based on the standard protocol for the Market-1501 dataset, with models trained on a subset of the training data (20% of the total) and tested on the remaining data. However, the PCDMs method trains on the whole training dataset, which raises concerns about data leakage.

Q: Why is data leakage a concern?

A: Data leakage occurs when a model has access during training to information that should only be available at test time, which can lead to inflated results. In this case the concern is twofold: the PCDMs method trains on far more data than the methods it is compared against, and if any of that data overlaps with the identities used for evaluation, the reported results would be biased in its favor.
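
One way to make this concern testable is to check whether the identities used to train the data generation model overlap with the identities used in the Re-ID evaluation. The helper below is a hypothetical sketch; the inputs are assumed to be lists of person IDs extracted from the respective splits.

def check_identity_leakage(generation_train_pids, reid_eval_pids):
    # Any overlap means the generative model has already seen identities
    # that later appear in the Re-ID evaluation, i.e. potential leakage
    overlap = set(generation_train_pids) & set(reid_eval_pids)
    if overlap:
        print(f'Potential leakage: {len(overlap)} shared identities, e.g. {sorted(overlap)[:5]}')
    else:
        print('No identity overlap between generation-training and Re-ID evaluation sets.')
    return overlap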

Q: What is the PIDM method, and why is its code not publicly available?

A: PIDM is one of the data augmentation methods compared in the study. However, its code is not publicly available, which makes it difficult to reproduce the results. This lack of transparency raises concerns about the validity of the results reported in Table 2.

Q: What is the BoT strong baseline, and why is it used as a baseline?

A: The BoT (Bag of Tricks) strong baseline is a widely used method in the field of person Re-ID. It is used as a baseline because it is a robust and well-established method that has been shown to perform well on the Market-1501 dataset.
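
As a rough reference, the settings usually associated with BoT-style baselines are summarized below. This is a hedged summary drawn from the general Re-ID literature, not the configuration used in the study, and the exact values should be checked against the paper in question.

# Typical BoT-style baseline settings (assumed, not taken from the study)
bot_baseline_config = {
    'backbone': 'resnet50',          # with the last stride set to 1
    'pretrained_on_imagenet': True,  # the default, and the point in question here
    'neck': 'BNNeck',
    'losses': ['cross_entropy_with_label_smoothing', 'triplet'],
    'augmentation': ['random_flip', 'random_crop', 'random_erasing'],
    'lr_schedule': 'warmup_then_step_decay',
    'epochs': 120,
}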

Q: Why is it unclear whether the BoT baseline was trained without initializing weights pre-trained on ImageNet?

A: The study does not provide enough information about the training process of the BoT baseline. Specifically, it is unclear whether the baseline was trained with or without weights pre-trained on ImageNet, which makes it hard to judge whether the comparison with PCDMs is on equal footing.
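
Whether ImageNet pretraining was used is a single, easily reported switch in common frameworks. For example, with torchvision (assuming a recent version where the weights argument is available), the two settings look like this:

import torchvision

# ResNet-50 backbone initialized with ImageNet-pretrained weights
# (the usual setting for BoT-style baselines)
backbone_pretrained = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)

# The same backbone without ImageNet initialization (trained from scratch)
backbone_from_scratch = torchvision.models.resnet50(weights=None)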

Q: What are the implications of the results reported in Table 2?

A: The results reported in Table 2 suggest that the PCDMs method outperforms the other methods, with an mAP (mean Average Precision) of 88.1%. However, this result is suspicious, as the Standard method reports a much lower mAP of 76.7%. Furthermore, the BoT strong baseline reports an mAP of 88%, which is close to the PCDMs result. This raises concerns about the fairness of the comparison between the different methods.

Q: What are the recommendations for future studies?

A: Based on the concerns raised in this article, we recommend that future studies:

  1. Provide full details of the evaluation protocol: training and testing splits, hyperparameters, and evaluation metrics.
  2. Use the same training and testing strategy for every method, including whether the whole training dataset or only a subset is used.
  3. Release the code for the PIDM method so the results can be reproduced and validated.
  4. Compare against a strong, widely used baseline such as BoT.

Q: What are the implications for the field of person Re-ID?

A: The results reported in Table 2 have implications for the field of person Re-ID. Specifically, they raise concerns about the fairness of the comparison between different methods. This highlights the need for more transparent and robust evaluation protocols in the field.

Q: What are the next steps?

A: The next steps are to:

  1. Reproduce the results: We will attempt to reproduce the results reported in Table 2 using the publicly available code.
  2. Follow up on the PIDM method: We will contact the authors of the PIDM method about making its code available so that the reported numbers can be checked.
  3. Develop more robust evaluation protocols: We will work on developing more robust evaluation protocols for the field of person Re-ID.

Conclusion

In conclusion, the results reported in Table 2 raise concerns about the fairness of the comparison between different methods. We recommend that future studies provide more information about the evaluation protocol, use the same strategy for all methods, make the code available, and use a more robust baseline.