Automatically Test OCR API Endpoints As GitHub Action For Every PR

Mar 12, 2025 by ADMIN

Introduction

In the world of software development, Continuous Integration (CI) and Continuous Deployment (CD) are crucial for ensuring the quality and reliability of our applications. One of the key aspects of CI/CD is automated testing, which helps catch bugs and errors early in the development process. In this article, we will explore how to create a GitHub Action that automatically tests OCR API endpoints for every Pull Request (PR) to prevent issues like the one mentioned in https://github.com/swisstopo/swissgeol-ocr/pull/14.

The Problem

As mentioned in the introduction, a method signature was changed in https://github.com/swisstopo/swissgeol-ocr/pull/14, while a call to this API was missed and not changed accordingly. This led to a bug that was thankfully already discovered while the changes were still only deployed to DEV (https://github.com/swisstopo/swissgeol-assets-suite/issues/392#issuecomment-2713849762). The error was fixed in https://github.com/swisstopo/swissgeol-ocr/pull/20.

To avoid such issues, it would be preferable to introduce a CI GitHub Action that checks whether all API endpoints work correctly with dummy data. This is where automated testing comes in.
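The failure mode described above can be reproduced in a few lines. The sketch below is purely illustrative: `run_ocr`, `handle_request`, and their parameters are hypothetical names, not taken from the swissgeol-ocr code base. It suggests that a smoke test which merely calls the endpoint handler with dummy data is enough to surface a stale call as a TypeError.

```python
# Hypothetical names throughout; not taken from the swissgeol-ocr code base.

def run_ocr(file_name: str, bucket: str, language: str) -> dict:
    """Worker whose signature recently gained a required `language` parameter."""
    return {"file": file_name, "bucket": bucket, "language": language}


def handle_request(payload: dict) -> dict:
    """Endpoint handler that still calls the worker with the old two-argument form."""
    return run_ocr(payload["file"], payload["bucket"])  # stale call


def smoke_test(handler, payload: dict) -> str:
    """Call the handler with dummy data and report whether it raised a TypeError."""
    try:
        handler(payload)
    except TypeError as exc:
        return f"FAIL: {exc}"
    return "OK"


result = smoke_test(handle_request, {"file": "dummy.pdf", "bucket": "test-bucket"})
print(result)
```

The test does not check the OCR output at all. Exercising the call path is what catches this class of bug.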

Mocking External Dependencies

To test our OCR API endpoints without touching real infrastructure, we need to replace ("mock") the external dependencies of the API. In our case, we have two external dependencies:

S3: We can run MinIO, an S3-compatible object storage server, as a local stand-in for S3.
AWS Textract: We can use the moto library to mock AWS Textract inside the test process.

Mocking S3 with MinIO

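MinIO is a popular open-source object storage server compatible with the Amazon S3 API. We can use it as a stand-in for S3 in local tests. Note that the server and the Python client are separate: pip installs only the client, while the server is a standalone binary (or Docker image).

# Install the MinIO Python client (this does not install the server)
pip install minio

# Start a MinIO server (requires the separately installed minio binary)
minio server /path/to/minio/data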

We can then use the MinIO server to mock S3 in our tests.

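from minio import Minio

# Create a MinIO client; minioadmin/minioadmin are the server's default credentials
client = Minio(
    'localhost:9000',
    access_key='minioadmin',
    secret_key='minioadmin',
    secure=False  # plain HTTP for local testing
)

# Create the bucket only if it does not exist yet
if not client.bucket_exists('my-bucket'):
    client.make_bucket('my-bucket')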
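If the application itself talks to S3 through boto3, it needs a way to be pointed at the local MinIO server during tests. The following is a minimal sketch under stated assumptions: the environment variable names `S3_ENDPOINT_URL`, `S3_ACCESS_KEY`, and `S3_SECRET_KEY` are invented for illustration, and `minioadmin` is used as a fallback because it is MinIO's default credential.

```python
def s3_client_kwargs(env: dict) -> dict:
    """Build keyword arguments for boto3.client('s3', **kwargs).

    S3_ENDPOINT_URL is an assumed variable name. When it is unset, the
    returned dict is empty and boto3 falls back to its normal AWS lookup.
    """
    endpoint = env.get("S3_ENDPOINT_URL")
    if not endpoint:
        return {}
    return {
        "endpoint_url": endpoint,
        "aws_access_key_id": env.get("S3_ACCESS_KEY", "minioadmin"),
        "aws_secret_access_key": env.get("S3_SECRET_KEY", "minioadmin"),
    }


local = s3_client_kwargs({"S3_ENDPOINT_URL": "http://localhost:9000"})
production = s3_client_kwargs({})
print(local)
print(production)
```

In application code this could be used as `boto3.client("s3", **s3_client_kwargs(os.environ))`, so the same code path runs against MinIO in CI and against real S3 elsewhere.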

Mocking AWS Textract with moto

moto is a library that allows us to mock AWS services like AWS Textract for local testing.

# Install moto
pip install moto

We can then use moto to mock AWS Textract in our tests.

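import boto3
from moto import mock_aws  # moto 5.x; moto 4.x exposes mock_textract instead

# Inside the mock, boto3 calls are intercepted and never reach AWS
with mock_aws():
    # A region is required even though no real request is made
    client = boto3.client('textract', region_name='us-east-1')

    # Create a document
    document = {
        'Bytes': b'Hello, World!'
    }

    # Call the DetectDocumentText method (needs a moto release that implements it)
    response = client.detect_document_text(Document=document)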
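A mocked Textract does not run real OCR, so tests that care about the recognised text need predictable responses. One option, sketched here with the standard library's `unittest.mock` rather than moto, is to hand the code under test a fake client that returns a canned response. The `Blocks`/`BlockType`/`Text` keys follow the shape of a DetectDocumentText response; `extract_lines` is a hypothetical helper, not part of any library.

```python
from unittest.mock import MagicMock

# Canned response in the shape returned by Textract's DetectDocumentText.
CANNED_RESPONSE = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Hello, World!"},
        {"BlockType": "WORD", "Text": "Hello,"},
        {"BlockType": "WORD", "Text": "World!"},
    ]
}


def extract_lines(textract_client, image_bytes: bytes) -> list:
    """Hypothetical helper: return the text of every LINE block."""
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


# The fake client records calls and returns the canned response.
fake_client = MagicMock()
fake_client.detect_document_text.return_value = CANNED_RESPONSE

lines = extract_lines(fake_client, b"not a real image")
print(lines)

# Verify the helper called the client with the expected arguments.
fake_client.detect_document_text.assert_called_once_with(
    Document={"Bytes": b"not a real image"}
)
```

Passing the client in as a parameter keeps the helper testable without patching module-level state.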

Creating a GitHub Action

Now that we have mocked our external dependencies, we can create a GitHub Action that tests our OCR API endpoints.

name: Test OCR API Endpoints

on:
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Mock S3
        run: |
          # Run MinIO detached so the step returns and later steps can run
          docker run -d -p 9000:9000 minio/minio server /data

      - name: Test OCR API Endpoints
        run: |
          python -m unittest test_ocr_api.py

In this example, we have created a GitHub Action that tests our OCR API endpoints on every PR to the main branch. The action sets up Python, installs the dependencies, starts MinIO as a stand-in for S3, and then runs the tests. AWS Textract needs no separate step, because moto patches boto3 inside the test process itself.
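A service started alongside the tests needs a moment before it accepts connections, so the test suite may want to wait for it before making the first request. Below is a minimal sketch of such a readiness check, assuming the tests reach the service over TCP on a known host and port. `wait_for_port` is a hypothetical helper, and the demonstration uses a throwaway local listener instead of a real MinIO server.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not ready yet; retry shortly
    return False


# Demonstration against a throwaway local listener instead of a real service.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
free_port = listener.getsockname()[1]

ready = wait_for_port("127.0.0.1", free_port, timeout=5.0)
listener.close()
print(ready)
```

In a test module this could be called once, for example from `setUpModule`, with the host and port the mock S3 server listens on.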

Conclusion

In this article, we have explored how to create a GitHub Action that automatically tests OCR API endpoints for every PR. We have also discussed how to mock external dependencies like S3 and AWS Textract for local testing. By following the steps outlined in this article, you can create a similar GitHub Action for your own OCR API endpoints.

Future Work

There are several areas where we can improve this GitHub Action:

Add more tests: We can add more tests to cover more scenarios and edge cases.
Use a more robust mocking library: We can use a more robust mocking library like pytest-mock to mock our external dependencies.
Integrate with CI/CD pipeline: We can integrate this GitHub Action with our CI/CD pipeline to automate the testing process.

Introduction

In our previous article, we explored how to create a GitHub Action that automatically tests OCR API endpoints for every Pull Request (PR). In this article, we will answer some frequently asked questions (FAQs) about creating a similar GitHub Action for your own OCR API endpoints.

Q: What is a GitHub Action?

A: A GitHub Action is a way to automate tasks on GitHub, such as building, testing, and deploying code. GitHub Actions are triggered by events, such as push or pull requests, and can be used to automate a wide range of tasks.

Q: Why do I need to mock external dependencies?

A: Testing against real external dependencies, such as S3 and AWS Textract, requires credentials and network access, costs money, and makes test runs slower and less predictable. By mocking these dependencies, you can test your code in a controlled environment and ensure that it works as expected.

Q: What is MinIO?

A: MinIO is a popular open-source object storage server compatible with the Amazon S3 API. It can be used as a stand-in for S3 in local testing.

Q: What is moto?

A: moto is a library that allows you to mock AWS services like AWS Textract for local testing.

Q: How do I create a GitHub Action?

A: To create a GitHub Action like the one in this article, you create a YAML workflow file in the .github/workflows/ directory of your repository. This file should include the following:

Name: The name of the workflow.
On: The event or events that trigger the workflow.
Jobs: The jobs that the workflow runs, each on its own runner.
Steps: The individual steps that each job performs.

Q: How do I test my OCR API endpoints?

A: To test your OCR API endpoints, you need to create a test suite that includes the following:

Mocking external dependencies: You need to mock external dependencies like S3 and AWS Textract.
Calling the API: You need to call the API and verify that it returns the expected results.
Asserting the results: You need to assert that the results are as expected.
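The three parts above can be sketched in a single test case. Everything here is a hypothetical stand-in for illustration: `FakeStorage` plays the role of S3, `fake_ocr` plays the role of Textract, and `ocr_endpoint` is an invented endpoint function rather than the real API.

```python
import unittest


class FakeStorage:
    """In-memory stand-in for S3 (hypothetical interface)."""

    def __init__(self):
        self.objects = {}

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def fake_ocr(data: bytes) -> str:
    """Stand-in for the OCR service: decodes bytes instead of recognising text."""
    return data.decode("utf-8")


def ocr_endpoint(storage, ocr, key: str) -> dict:
    """Hypothetical endpoint: read a file, run OCR, store and return the text."""
    text = ocr(storage.get(key))
    storage.put(key + ".txt", text.encode("utf-8"))
    return {"status": "done", "text": text}


class OcrEndpointTest(unittest.TestCase):
    def test_returns_text_and_stores_result(self):
        # 1. Mock the external dependencies
        storage = FakeStorage()
        storage.put("scan.pdf", b"dummy page")
        # 2. Call the API
        result = ocr_endpoint(storage, fake_ocr, "scan.pdf")
        # 3. Assert the results
        self.assertEqual(result, {"status": "done", "text": "dummy page"})
        self.assertEqual(storage.get("scan.pdf.txt"), b"dummy page")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(OcrEndpointTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the dependencies are passed in as arguments, the same endpoint function can receive real clients in production and fakes in tests.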

Q: How do I integrate my GitHub Action with my CI/CD pipeline?

A: To integrate your GitHub Action with your CI/CD pipeline, you need to create a pipeline that includes the following:

Trigger: The trigger that starts the pipeline.
Actions: The actions that the pipeline performs.
Steps: The individual steps that the pipeline performs.

Q: What are some best practices for creating a GitHub Action?

A: Some best practices for creating a GitHub Action include:

Keep it simple: Keep your action simple and focused on a specific task.
Use a clear and concise name: Use a clear and concise name for your action.
Use a YAML file: Use a YAML file to define your action.
Test your action: Test your action thoroughly before deploying it.

Conclusion

In this article, we have answered some frequently asked questions (FAQs) about creating a GitHub Action that automatically tests OCR API endpoints for every Pull Request (PR). We have also discussed some best practices for creating a GitHub Action. By following these best practices and using the techniques outlined in this article, you can create a similar GitHub Action for your own OCR API endpoints.

Future Work

There are several areas where we can improve this article:

Add more FAQs: We can add more FAQs to cover more scenarios and edge cases.
Provide more detailed examples: We can provide more detailed examples of how to create a GitHub Action.
Discuss more advanced topics: We can discuss more advanced topics, such as using GitHub Actions with other tools and services.

By following these steps, we can create a more comprehensive and helpful article that provides valuable information to developers who want to create a GitHub Action that automatically tests OCR API endpoints for every PR.