[Feature]: Schema For Checking Input Shapes Of Multi-modal Models

Mar 13, 2025 by ADMIN 66 views

The feature, motivation and pitch

Currently, we use _parse_and_validate_*_input to validate the multi-modal inputs in our models. However, only minimal checks are being made, with some models only checking the type of the inputs. This can lead to confusion among model developers and maintainers, as the actual shape of the inputs may not match what is being documented in classes like *ImagePixelInputs. To avoid this, I propose adding a base class TensorSchema to validate the model inputs.

TensorSchema will perform validation automatically, similar to Pydantic models, with the added benefit that validation can be disabled using a flag. Each tensor field is tagged with additional metadata via typing_extensions.Annotated, and that metadata drives the validation.

Here is an example of what the original code would look like with the proposed changes:

class Phi3VImagePixelInputs(TensorSchema):
    """
    Dimensions:
        - b: Batch size (number of prompts)
        - n: Number of images
        - p: Number of patches
        - h: Height of each patch
        - w: Width of each patch
    """
    type: Literal["pixel_values"] = "pixel_values"
    data: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", "p", 3, "h", "w")]
    image_sizes: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", 2)]

In this example, Phi3VImagePixelInputs inherits from TensorSchema, and each tensor field carries a TensorShape tag listing its expected dimensions: integers such as 3 are constant sizes, while strings such as "bn" and "p" are named dimensions.
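
As a rough illustration of how the Annotated metadata could be recovered at runtime, the sketch below collects the shape specification of each field. It rests on several assumptions: TensorShape is a hypothetical stand-in for the proposed marker, collect_shapes is an illustrative helper name, the field types are simplified to list so that the snippet runs without torch, and typing.Annotated (Python 3.9+) is used in place of typing_extensions.Annotated, which behaves the same way.

```python
from typing import Annotated, Literal, get_type_hints


class TensorShape:
    """Hypothetical marker: expected dims, given as ints or dim names."""

    def __init__(self, *dims):
        self.dims = dims


class ImageInputs:
    # Field types are simplified to `list`; the proposal uses torch.Tensor.
    type: Literal["pixel_values"] = "pixel_values"
    data: Annotated[list, TensorShape("bn", "p", 3, "h", "w")]
    image_sizes: Annotated[list, TensorShape("bn", 2)]


def collect_shapes(cls):
    """Map each annotated field name to its expected dims."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(cls, include_extras=True)
    shapes = {}
    for name, hint in hints.items():
        # Only Annotated[...] hints carry a __metadata__ tuple.
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, TensorShape):
                shapes[name] = meta.dims
    return shapes


print(collect_shapes(ImageInputs))
```

Fields without a TensorShape tag, such as type, are simply skipped, so non-tensor fields can live in the same class.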

Validation

Validation is done automatically, similar to Pydantic models. This means that we don't need to write any additional code to perform validation. The TensorSchema base class will automatically check the shape of the inputs and raise an error if they don't match the expected shape.

To avoid performance issues, we should be able to disable validation using a flag. This can be done by adding a validate parameter to the TensorSchema base class, which can be set to False to disable validation.
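
One possible shape for this behaviour, sketched under assumptions: validation runs in the constructor unless validate=False is passed. The constructor signature, the TensorShape marker, and the FakeTensor stand-in (used only so the snippet runs without torch; torch.Tensor exposes the same .shape attribute) are all hypothetical, and only the number of dimensions is checked here to keep the focus on the flag.

```python
from typing import Annotated, get_type_hints


class TensorShape:
    """Hypothetical marker: expected dims, given as ints or dim names."""

    def __init__(self, *dims):
        self.dims = dims


class TensorSchema:
    """Sketch of a base class that validates on construction unless disabled."""

    def __init__(self, *, validate=True, **fields):
        for name, value in fields.items():
            setattr(self, name, value)
        if validate:
            self.validate()

    def validate(self):
        hints = get_type_hints(type(self), include_extras=True)
        for name, hint in hints.items():
            for meta in getattr(hint, "__metadata__", ()):
                if not isinstance(meta, TensorShape):
                    continue
                actual = tuple(getattr(self, name).shape)
                # Only the number of dimensions is checked in this sketch.
                if len(actual) != len(meta.dims):
                    raise ValueError(
                        f"{name}: expected {len(meta.dims)} dims, got shape {actual}"
                    )


class FakeTensor:
    """Stand-in exposing `.shape`, so the sketch runs without torch."""

    def __init__(self, *shape):
        self.shape = shape


class ImageSizes(TensorSchema):
    image_sizes: Annotated[FakeTensor, TensorShape("bn", 2)]


ImageSizes(image_sizes=FakeTensor(10, 2))               # passes
ImageSizes(image_sizes=FakeTensor(10), validate=False)  # check skipped
try:
    ImageSizes(image_sizes=FakeTensor(10))              # wrong number of dims
except ValueError as err:
    print(err)
```

A per-instance flag is only one option; a global or environment-level switch would serve the same purpose and may be easier to turn off in production.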

Dimensions

Dimensions that are constants can be checked directly. For example, we can validate that data.shape[2] == 3.

Dimensions that share the same name should be consistent between fields. For example, since data.shape[0] and image_sizes.shape[0] share the name bn, we should validate that data.shape[0] == image_sizes.shape[0]. Both checks can live in a validate method on the TensorSchema base class.
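
The two rules can be sketched as a single helper that works on plain shape tuples. The name check_shape and its signature are illustrative assumptions, not part of the proposal; the bound dictionary records the size first seen for each named dimension so that later fields can be compared against it.

```python
def check_shape(field, shape, expected, bound):
    """Check one shape against its spec, recording named dims in `bound`."""
    if len(shape) != len(expected):
        raise ValueError(f"{field}: expected {len(expected)} dims, got {shape}")
    for axis, (actual, dim) in enumerate(zip(shape, expected)):
        if isinstance(dim, int):
            # Constant dimension: compare directly.
            if actual != dim:
                raise ValueError(
                    f"{field}.shape[{axis}] is {actual}, expected {dim}"
                )
        else:
            # Named dimension: the first field to use a name fixes its size.
            seen = bound.setdefault(dim, actual)
            if seen != actual:
                raise ValueError(
                    f"{field}.shape[{axis}] is {actual}, but '{dim}' was {seen}"
                )


bound = {}
check_shape("data", (10, 5, 3, 32, 32), ("bn", "p", 3, "h", "w"), bound)
check_shape("image_sizes", (10, 2), ("bn", 2), bound)
print(bound)
```

Sharing one bound dictionary across all fields of an instance is what makes the cross-field check work; a fresh dictionary would be needed for each validated instance.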

Alternatives

This idea isn't specific to vLLM, so we can consider developing this as a separate package. This would allow us to share the TensorSchema base class with other projects and avoid duplicating code.

Implementation

The implementation of the TensorSchema base class will involve the following steps:

Define the TensorSchema base class with the necessary attributes and methods.
Use typing_extensions.Annotated to tag each tensor field with its expected shape.
Add a validate method to the TensorSchema base class that checks the shape of each input, including constant dimensions.
Extend the validate method to check that dimensions sharing a name are consistent across fields.
Add a validate parameter to the TensorSchema base class so that validation can be disabled.
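
One detail these steps leave open is how a field typed as List[torch.Tensor] maps onto a TensorShape. The sketch below shows one possible reading, which is an assumption rather than part of the proposal: the list length stands in for the leading dimension and every item must share the same shape. effective_shape and FakeTensor are hypothetical names, with FakeTensor standing in for torch.Tensor so the snippet runs without torch.

```python
def effective_shape(value):
    """Return the shape to validate for a tensor or a list of tensors."""
    if isinstance(value, (list, tuple)):
        if not value:
            # An empty list has a leading dimension of 0 and nothing else.
            return (0,)
        first = tuple(value[0].shape)
        for index, item in enumerate(value):
            if tuple(item.shape) != first:
                raise ValueError(
                    f"item {index} has shape {tuple(item.shape)}, expected {first}"
                )
        # The list length plays the role of the leading dimension.
        return (len(value),) + first
    return tuple(value.shape)


class FakeTensor:
    """Stand-in exposing `.shape`, so the sketch runs without torch."""

    def __init__(self, *shape):
        self.shape = shape


print(effective_shape(FakeTensor(10, 2)))
print(effective_shape([FakeTensor(2) for _ in range(10)]))
```

Requiring identical item shapes is a simplification. Lists are often used precisely because items differ in some dimension, in which case each item would need to be checked against the remaining dimensions separately.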

Example use case

Here is an example of how the TensorSchema base class can be used:

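from typing import List, Literal, Union

import torch
from typing_extensions import Annotated

# TensorSchema and TensorShape are the classes proposed in this issue; they are not defined here.
class Phi3VImagePixelInputs(TensorSchema):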
    """
    Dimensions:
        - b: Batch size (number of prompts)
        - n: Number of images
        - p: Number of patches
        - h: Height of each patch
        - w: Width of each patch
    """
    type: Literal["pixel_values"] = "pixel_values"
    data: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", "p", 3, "h", "w")]
    image_sizes: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", 2)]

phi3v_image_pixel_inputs = Phi3VImagePixelInputs(
    type="pixel_values",
    data=torch.randn(10, 5, 3, 32, 32),  # (bn, p, 3, h, w)
    image_sizes=torch.randint(1, 1024, (10, 2)),  # (bn, 2); sizes are integers
)

phi3v_image_pixel_inputs.validate()

In this example, we have created an instance of the Phi3VImagePixelInputs class and called the validate method to perform validation. The validate method will check the shape of the inputs and raise an error if they don't match the expected shape.

Benefits

The benefits of using the TensorSchema base class include:

Automatic validation of the shape of the inputs.
Ability to disable validation using a flag.
Consistency of dimensions between fields.
Easy to implement and use.

Drawbacks

The drawbacks of using the TensorSchema base class include:

Additional overhead due to validation.
Limited flexibility in terms of validation rules.

Future work

Future work on the TensorSchema base class could include:

Adding more validation rules.
Improving the performance of validation.
Adding support for other data types.

Q&A

Q: What is the purpose of the `TensorSchema` base class?

A: The TensorSchema base class is designed to validate the shape of the inputs in multi-modal models. It provides a way to automatically check the shape of the inputs and raise an error if they don't match the expected shape.

Q: How does the `TensorSchema` base class work?

A: The TensorSchema base class uses typing_extensions.Annotated to tag each tensor field with additional metadata. This metadata is then used to perform validation. The validate method is used to check the shape of the inputs and raise an error if they don't match the expected shape.

Q: Can I disable validation using the `TensorSchema` base class?

A: Yes, you can disable validation using the TensorSchema base class by setting the validate parameter to False. This can be useful for performance-critical code where validation is not necessary.

Q: How do I use the `TensorSchema` base class?

A: To use the TensorSchema base class, you need to create a class that inherits from it and define the attributes and methods that you need. You can then use the validate method to perform validation.

Q: Can I use the `TensorSchema` base class with other data types?

A: Not as proposed. The schema is designed around torch.Tensor fields (or lists of tensors); support for other data types is listed under future work.

Q: How do I add more validation rules to the `TensorSchema` base class?

A: To add more validation rules to the TensorSchema base class, you need to modify the validate method to include the additional rules.

Q: How do I improve the performance of the `TensorSchema` base class?

A: To improve the performance of the TensorSchema base class, you can use caching or other optimization techniques to reduce the overhead of validation.
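
As one example of such caching, the annotations of a schema class can be parsed once and reused for every instance. The sketch below uses functools.lru_cache for this; shape_specs and TensorShape are hypothetical names, and the field type is simplified to list so the snippet runs without torch.

```python
from functools import lru_cache
from typing import Annotated, get_type_hints


class TensorShape:
    """Hypothetical marker: expected dims, given as ints or dim names."""

    def __init__(self, *dims):
        self.dims = dims


@lru_cache(maxsize=None)
def shape_specs(cls):
    """Parse a schema class's annotations once; later calls hit the cache."""
    hints = get_type_hints(cls, include_extras=True)
    return tuple(
        (name, meta.dims)
        for name, hint in hints.items()
        for meta in getattr(hint, "__metadata__", ())
        if isinstance(meta, TensorShape)
    )


class ImageSizes:
    image_sizes: Annotated[list, TensorShape("bn", 2)]


first = shape_specs(ImageSizes)
second = shape_specs(ImageSizes)  # served from the cache
print(first is second, shape_specs.cache_info())
```

This removes the repeated annotation parsing but not the per-instance shape comparisons, which is where the disable flag still matters.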

Q: Can I use the `TensorSchema` base class with other frameworks or libraries?

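A: The example in this proposal is written against PyTorch tensors. The idea isn't specific to vLLM, so it could be developed as a separate package, but support for other frameworks such as TensorFlow is not part of the proposal.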

Q: What are some common use cases for the `TensorSchema` base class?

A: Some common use cases for the TensorSchema base class include:

Validating the shape of input data in machine learning models.
Ensuring consistency of dimensions between fields in data processing pipelines.
Improving the robustness of data processing pipelines by catching errors early.

Q: Can I use the `TensorSchema` base class with other data formats?

A: Not directly. TensorSchema validates the shapes of in-memory tensors, so data stored as CSV or JSON would first need to be loaded and converted to tensors.

Q: How do I troubleshoot issues with the `TensorSchema` base class?

A: To troubleshoot issues with the TensorSchema base class, you can use debugging tools, such as print statements or a debugger, to identify the source of the problem.

Q: Can I contribute to the development of the `TensorSchema` base class?

A: Yes, you can contribute to the development of the TensorSchema base class by submitting pull requests or reporting issues on the project's issue tracker.
