[Feature]: Schema For Checking Input Shapes Of Multi-modal Models
The feature, motivation and pitch
Currently, we use _parse_and_validate_*_input to validate the multi-modal inputs in our models. However, only minimal checks are being made, with some models only checking the type of the inputs. This can lead to confusion among model developers and maintainers, as the actual shape of the inputs may not match what is documented in classes like *ImagePixelInputs. To avoid this, I propose adding a base class TensorSchema to validate the model inputs.
TensorSchema will validate its fields automatically, similar to Pydantic models, with the added benefit that validation can be disabled using a flag. Each tensor field is tagged via typing_extensions.Annotated with additional metadata, which is used to perform the validation.
Here is an example of what the original code would look like with the proposed changes:
class Phi3VImagePixelInputs(TensorSchema):
    """
    Dimensions:
        - b: Batch size (number of prompts)
        - n: Number of images
        - p: Number of patches
        - h: Height of each patch
        - w: Width of each patch
    """
    type: Literal["pixel_values"] = "pixel_values"
    data: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", "p", 3, "h", "w")]
    image_sizes: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", 2)]
In this example, Phi3VImagePixelInputs inherits from the TensorSchema base class, and each tensor field is tagged via typing_extensions.Annotated with a TensorShape that describes its expected dimensions. This metadata is what the validation uses.
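As a rough illustration of how the Annotated metadata could be read back at runtime, the sketch below collects the shape annotation of each field. It is not the proposed implementation: the TensorShape class here is a hypothetical stand-in, Any replaces torch.Tensor to avoid dependencies, and collect_shapes is an invented helper name. It uses typing.Annotated from the standard library (Python 3.9+) rather than typing_extensions.

```python
from typing import Annotated, Any, Dict, Tuple, Union, get_args, get_origin, get_type_hints


class TensorShape:
    """Hypothetical marker holding the expected dims (ints or names)."""

    def __init__(self, *dims: Union[int, str]) -> None:
        self.dims: Tuple[Union[int, str], ...] = dims


class ExampleInputs:
    # `Any` stands in for torch.Tensor so the sketch has no dependencies.
    data: Annotated[Any, TensorShape("bn", "p", 3, "h", "w")]
    image_sizes: Annotated[Any, TensorShape("bn", 2)]


def collect_shapes(cls: type) -> Dict[str, TensorShape]:
    """Map each annotated field name to its TensorShape metadata."""
    shapes: Dict[str, TensorShape] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # For Annotated hints, get_args returns (base_type, *metadata).
            for meta in get_args(hint)[1:]:
                if isinstance(meta, TensorShape):
                    shapes[name] = meta
    return shapes


shapes = collect_shapes(ExampleInputs)
print({name: shape.dims for name, shape in shapes.items()})
```

Keeping the shape in the annotation means the documented shape and the checked shape come from the same place, which is the mismatch this proposal is trying to remove.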
Validation
Validation is done automatically, similar to Pydantic models. This means that we don't need to write any additional code to perform validation. The TensorSchema base class will automatically check the shape of the inputs and raise an error if they don't match the expected shape.
To avoid performance issues, we should be able to disable validation using a flag. This can be done by adding a validate parameter to the TensorSchema base class, which can be set to False to disable validation.
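One way the flag might look, assuming validation runs in the constructor and is controlled by a keyword argument. ShapeCheckedInput, expected_ndim, and data_shape are hypothetical names for this sketch only, and the check is reduced to a rank comparison to keep the focus on the flag.

```python
from typing import Tuple


class ShapeCheckedInput:
    """Hypothetical stand-in for a TensorSchema subclass with one field."""

    expected_ndim = 5  # e.g. (bn, p, 3, h, w)

    def __init__(self, data_shape: Tuple[int, ...], validate: bool = True) -> None:
        self.data_shape = data_shape
        # Validation runs on construction unless the caller opts out.
        if validate:
            self.validate()

    def validate(self) -> None:
        if len(self.data_shape) != self.expected_ndim:
            raise ValueError(
                f"data has {len(self.data_shape)} dims, expected {self.expected_ndim}"
            )


ok = ShapeCheckedInput((10, 5, 3, 32, 32))            # validated, passes
skipped = ShapeCheckedInput((10, 5), validate=False)  # check skipped

try:
    ShapeCheckedInput((10, 5))  # validated, wrong rank
    raised = False
except ValueError:
    raised = True
```

A per-instance argument is only one option; a process-wide setting would avoid threading the flag through every call site, at the cost of less local control.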
Dimensions
Dimensions that are constants can be checked directly. For example, we can validate that data.shape[2] == 3. This can be done by adding a validate method to the TensorSchema base class, which can be used to check the shape of the inputs.
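A minimal sketch of the constant-dimension check, assuming a shape spec is a sequence in which integers are constants and strings are named dimensions. The function name check_constant_dims is hypothetical, and plain tuples stand in for tensor shapes.

```python
from typing import Sequence, Union

Dim = Union[int, str]


def check_constant_dims(field: str, actual: Sequence[int], expected: Sequence[Dim]) -> None:
    """Raise ValueError if a constant (integer) dimension does not match."""
    if len(actual) != len(expected):
        raise ValueError(
            f"{field} has {len(actual)} dims, expected {len(expected)}"
        )
    for axis, (got, want) in enumerate(zip(actual, expected)):
        # Named dimensions (strings) are skipped here.
        if isinstance(want, int) and got != want:
            raise ValueError(
                f"{field}.shape[{axis}] is {got}, expected {want}"
            )


spec = ("bn", "p", 3, "h", "w")
check_constant_dims("data", (10, 5, 3, 32, 32), spec)  # passes silently

try:
    check_constant_dims("data", (10, 5, 4, 32, 32), spec)  # 4 channels
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```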
Dimensions that share the same name should be consistent between fields. For example, since data.shape[0] and image_sizes.shape[0] share the name bn, we should validate that data.shape[0] == image_sizes.shape[0]. The validate method of the TensorSchema base class can perform this check as well.
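The cross-field check could be sketched as binding each dimension name to the first size seen and comparing later occurrences against it. This is an assumption about how it might work, not the proposed implementation; bind_named_dims is a hypothetical helper and plain tuples stand in for tensor shapes.

```python
from typing import Dict, Tuple, Union

Dim = Union[int, str]


def bind_named_dims(
    shapes: Dict[str, Tuple[int, ...]],
    specs: Dict[str, Tuple[Dim, ...]],
) -> Dict[str, int]:
    """Bind each dimension name to one size; raise if two fields disagree."""
    bound: Dict[str, int] = {}
    origin: Dict[str, str] = {}  # where each name was first bound, for messages
    for field, spec in specs.items():
        # Assumes ranks already match; zip would silently truncate otherwise.
        for axis, (got, want) in enumerate(zip(shapes[field], spec)):
            if not isinstance(want, str):
                continue  # integer constants are checked separately
            if want not in bound:
                bound[want] = got
                origin[want] = f"{field}.shape[{axis}]"
            elif bound[want] != got:
                raise ValueError(
                    f"{field}.shape[{axis}] is {got}, but dimension '{want}' "
                    f"was bound to {bound[want]} by {origin[want]}"
                )
    return bound


specs = {"data": ("bn", "p", 3, "h", "w"), "image_sizes": ("bn", 2)}
bound = bind_named_dims(
    {"data": (10, 5, 3, 32, 32), "image_sizes": (10, 2)}, specs
)
```

Recording where a name was first bound lets the error message point at both fields involved in the mismatch.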
Alternatives
This idea isn't specific to vLLM, so we can consider developing this as a separate package. This would allow us to share the TensorSchema base class with other projects and avoid duplicating code.
Implementation
The implementation of the TensorSchema base class will involve the following steps:
- Define the TensorSchema base class with the necessary attributes and methods.
- Use typing_extensions.Annotated to tag each tensor field with additional metadata.
- Add a validate method to the TensorSchema base class to perform validation.
- Add a validate parameter to the TensorSchema base class to disable validation.
- Implement the validate method to check the shape of the inputs.
- Implement the validate method to check the consistency of dimensions.
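Put together, the steps above might look roughly like the following. This is a sketch under stated assumptions, not the final design: fields are passed as keyword arguments, FakeTensor is a hypothetical stand-in for torch.Tensor that only exposes .shape, lists of tensors are not handled, and typing.Annotated (Python 3.9+) is used in place of typing_extensions.

```python
from typing import Annotated, Any, Dict, Union, get_args, get_origin, get_type_hints

Dim = Union[int, str]


class TensorShape:
    """Expected dims: ints are constants, strings are named dimensions."""

    def __init__(self, *dims: Dim) -> None:
        self.dims = dims


class TensorSchema:
    def __init__(self, *, validate: bool = True, **fields: Any) -> None:
        for name, value in fields.items():
            setattr(self, name, value)
        if validate:  # the flag that disables validation
            self.validate()

    def validate(self) -> None:
        bound: Dict[str, int] = {}  # named dimension -> first size seen
        hints = get_type_hints(type(self), include_extras=True)
        for name, hint in hints.items():
            if get_origin(hint) is not Annotated:
                continue
            spec = next(
                (m for m in get_args(hint)[1:] if isinstance(m, TensorShape)), None
            )
            if spec is None:
                continue
            shape = tuple(getattr(self, name).shape)
            if len(shape) != len(spec.dims):
                raise ValueError(f"{name} has {len(shape)} dims, expected {len(spec.dims)}")
            for axis, (got, want) in enumerate(zip(shape, spec.dims)):
                # Constants compare directly; names compare to their first binding.
                expected = want if isinstance(want, int) else bound.setdefault(want, got)
                if got != expected:
                    raise ValueError(
                        f"{name}.shape[{axis}] is {got}, expected {expected} ({want!r})"
                    )


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; only carries a shape."""

    def __init__(self, *shape: int) -> None:
        self.shape = shape


class ExampleInputs(TensorSchema):
    data: Annotated[Any, TensorShape("bn", "p", 3, "h", "w")]
    image_sizes: Annotated[Any, TensorShape("bn", 2)]


good = ExampleInputs(data=FakeTensor(10, 5, 3, 32, 32), image_sizes=FakeTensor(10, 2))
```

Constructing ExampleInputs with image_sizes=FakeTensor(8, 2) raises a ValueError because bn was already bound to 10, while passing validate=False skips the check.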
Example use case
Here is an example of how the TensorSchema base class can be used:
from typing import List, Literal, Union

import torch
from typing_extensions import Annotated

# TensorSchema and TensorShape are the classes proposed in this issue.


class Phi3VImagePixelInputs(TensorSchema):
    """
    Dimensions:
        - b: Batch size (number of prompts)
        - n: Number of images
        - p: Number of patches
        - h: Height of each patch
        - w: Width of each patch
    """
    type: Literal["pixel_values"] = "pixel_values"
    data: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", "p", 3, "h", "w")]
    image_sizes: Annotated[Union[torch.Tensor, List[torch.Tensor]], TensorShape("bn", 2)]


phi3v_image_pixel_inputs = Phi3VImagePixelInputs(
    type="pixel_values",
    data=torch.randn(10, 5, 3, 32, 32),
    image_sizes=torch.randn(10, 2),
)
phi3v_image_pixel_inputs.validate()
In this example, we have created an instance of the Phi3VImagePixelInputs class and called the validate method to perform validation. The validate method will check the shape of the inputs and raise an error if they don't match the expected shape.
Benefits
The benefits of using the TensorSchema base class include:
- Automatic validation of the shape of the inputs.
- Ability to disable validation using a flag.
- Consistency of dimensions between fields.
- Easy to implement and use.
Drawbacks
The drawbacks of using the TensorSchema base class include:
- Additional overhead due to validation.
- Limited flexibility in terms of validation rules.
Future work
Future work on the TensorSchema base class could include:
- Adding more validation rules.
- Improving the performance of validation.
- Adding support for other data types.
Frequently asked questions
Q: What is the purpose of the TensorSchema base class?
A: The TensorSchema base class is designed to validate the shape of the inputs in multi-modal models. It provides a way to automatically check the shape of the inputs and raise an error if they don't match the expected shape.
Q: How does the TensorSchema base class work?
A: The TensorSchema base class uses typing_extensions.Annotated to tag each tensor field with additional metadata. This metadata is then used to perform validation. The validate method is used to check the shape of the inputs and raise an error if they don't match the expected shape.
Q: Can I disable validation using the TensorSchema base class?
A: Yes, you can disable validation by setting the validate parameter to False. This can be useful for performance-critical code where validation is not necessary.
Q: How do I use the TensorSchema base class?
A: Create a class that inherits from TensorSchema and annotate each tensor field with its expected TensorShape. You can then use the validate method to perform validation.
Q: Can I use the TensorSchema base class with other data types?
A: Not as proposed. The schema targets fields that hold a torch.Tensor or a list of tensors. Support for other data types is listed under future work.
Q: How do I add more validation rules to the TensorSchema base class?
A: To add more validation rules to the TensorSchema base class, you need to modify the validate method to include the additional rules.
Q: How do I improve the performance of the TensorSchema base class?
A: One option is to cache work that is the same for every instance of a class, such as parsing the field annotations, so that only the shape comparisons run per input. Validation can also be disabled entirely using the flag.
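As one possible form of caching, the annotations of a schema class could be resolved once and reused for every instance. The sketch below uses functools.lru_cache from the standard library; cached_hints and ExampleInputs are hypothetical names, and a plain tuple stands in for the proposed TensorShape metadata.

```python
from functools import lru_cache
from typing import Annotated, Any, Dict, get_type_hints


@lru_cache(maxsize=None)
def cached_hints(cls: type) -> Dict[str, Any]:
    """Resolve a class's annotations once; later calls reuse the result."""
    return get_type_hints(cls, include_extras=True)


class ExampleInputs:
    # A plain tuple stands in for the proposed TensorShape metadata.
    data: Annotated[Any, ("bn", "p", 3, "h", "w")]


first = cached_hints(ExampleInputs)
second = cached_hints(ExampleInputs)  # served from the cache
print(cached_hints.cache_info())
```

Because the cached dictionary is shared between callers, it should be treated as read-only.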
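Q: Can I use the TensorSchema base class with other frameworks or libraries?
A: The example in this proposal uses PyTorch tensors. The idea isn't specific to vLLM, so it could be developed as a separate package, but support for other frameworks such as TensorFlow is not part of this proposal.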
Q: What are some common use cases for the TensorSchema base class?
A: Some common use cases for the TensorSchema base class include:
- Validating the shape of input data in machine learning models.
- Ensuring consistency of dimensions between fields in data processing pipelines.
- Improving the robustness of data processing pipelines by catching errors early.
Q: Can I use the TensorSchema base class with other data formats?
A: Not directly. TensorSchema validates the shapes of in-memory tensors, not file formats such as CSV or JSON. Data in those formats would need to be loaded into tensors first.
Q: How do I troubleshoot issues with the TensorSchema base class?
A: To troubleshoot issues with the TensorSchema base class, you can use debugging tools, such as print statements or a debugger, to identify the source of the problem.
Q: Can I contribute to the development of the TensorSchema base class?
A: Yes, you can contribute to the development of the TensorSchema base class by submitting pull requests or reporting issues on the project's issue tracker.