OCR Operations

This chapter describes how to use OCR to assist with UI operations. In some situations, such as game applications, conventional UI selectors cannot be used, and OCR-based selection is an alternative. OCR selectors support only a limited set of operations, such as checking whether an element exists, clicking, and taking screenshots. The supported OCR backends are paddleocr, easyocr, and a custom HTTP backend.

Setting Up the OCR Backend

Before using OCR recognition, you must set up the OCR backend. You need to install the required dependency libraries yourself beforehand.

Attention

If you are running a cluster, meaning you control multiple devices from the same computer, you should wrap the OCR functionality in an HTTP service yourself. Using paddleocr or easyocr directly consumes a large amount of local memory and computing resources, because each process loads them separately.

Use paddleocr as the backend, with a screenshot quality of 80 for recognition, and enable GPU acceleration.

d.setup_ocr_backend("paddleocr", quality=80, use_gpu=True, drop_score=0.85, use_space_char=True)

Use easyocr as the backend, with a screenshot quality of 80, to recognize Simplified Chinese and English.

d.setup_ocr_backend("easyocr", ["ch_sim", "en"], quality=80)

Any additional arguments passed to setup_ocr_backend are the parameters used to initialize the backend instance. If you are not sure how to construct the arguments above, compare them with the official instantiation calls below.

paddleocr.PaddleOCR(use_gpu=True, drop_score=0.85, use_space_char=True)
easyocr.Reader(["ch_sim", "en"])
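
To make the mapping concrete, the sketch below separates a setup_ocr_backend-style argument list into the screenshot quality and the arguments left over for the backend constructor. The helper split_backend_args is hypothetical and only illustrates the convention described above; it is not the library's implementation.

```python
def split_backend_args(*args, quality=None, **kwargs):
    """Hypothetical helper: separate the screenshot quality from constructor arguments."""
    # quality applies to the screenshot and is not a constructor parameter;
    # everything else is what the backend class would be instantiated with.
    return quality, args, kwargs


# paddleocr example: the remaining keyword arguments match paddleocr.PaddleOCR(...)
quality, args, kwargs = split_backend_args(
    quality=80, use_gpu=True, drop_score=0.85, use_space_char=True
)
print(quality, args, kwargs)

# easyocr example: the remaining positional argument matches easyocr.Reader([...])
quality, args, kwargs = split_backend_args(["ch_sim", "en"], quality=80)
print(quality, args, kwargs)
```

In both cases, whatever remains after removing quality should be exactly what you would pass to the official constructor.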

A custom OCR backend is mainly used when controlling a large number of devices or when the local machine has no GPU acceleration. You can deploy your recognition code as an HTTP service and request remote recognition from within the custom backend. To do so, inherit from CustomOcrBackend, write your own backend class, and return the recognition results in the required format. The response format is also defined in the paddle_ocr_http_backend.py we provide, and you can deploy that service directly with minor modifications.

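import requests

class HttpOcrBackend(CustomOcrBackend):
    def __init__(self, url, auth):
        self.auth = auth
        self.url = url
    def ocr(self, image: bytes):
        # Send the screenshot bytes to the remote OCR service
        r = requests.post(self.url, headers={"X-Auth": self.auth},
                          data=image)
        # The service must respond with results in the required format
        return r.json()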

Then, simply set the OCR recognition backend to your custom service class.

d.setup_ocr_backend(HttpOcrBackend, "http://server/ocr", "Secret")
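
For orientation, the server side of such a setup could be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions: load_model, OcrHandler, make_server, and EXPECTED_AUTH are hypothetical names, the recognizer is a placeholder that returns an empty list, and the JSON body shown here is not the required response format (see paddle_ocr_http_backend.py for that). The point it illustrates is that the OCR engine is loaded once per service process and shared by all device clients.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

EXPECTED_AUTH = "Secret"  # assumption: the value clients send in the X-Auth header


def load_model():
    # Hypothetical stand-in: a real service would construct its OCR engine here,
    # for example paddleocr.PaddleOCR(...) or easyocr.Reader([...]).
    def recognize(image: bytes):
        return []  # placeholder result, NOT the required response format
    return recognize


MODEL = load_model()           # loaded once for the whole service process
MODEL_LOCK = threading.Lock()  # serialize access in case the engine is not thread-safe


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the full request body (the screenshot bytes) before responding
        length = int(self.headers.get("Content-Length", 0))
        image = self.rfile.read(length)
        if self.headers.get("X-Auth") != EXPECTED_AUTH:
            self.send_error(403, "Forbidden")
            return
        with MODEL_LOCK:
            result = MODEL(image)
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(host="127.0.0.1", port=0):
    # port=0 lets the operating system pick a free port
    return ThreadingHTTPServer((host, port), OcrHandler)
```

Calling make_server(port=8000).serve_forever() would start the service; each device process then only sends screenshots over HTTP instead of loading its own OCR engine.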

OCR Selectors

Currently, OCR recognition only supports the following types of selectors.

text

Matches the complete text.

element = d.ocr(text="我的")

textContains

Matches if the text contains the given string.

element = d.ocr(textContains="我的")

textMatches

Matches the text using a regular expression.

element = d.ocr(textMatches=".*?我的")
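
The difference between the three selector types can be illustrated with a small standalone function. This is only a sketch of the matching semantics described above, not the library's code: selector_matches is a hypothetical name, and it assumes the regular expression is applied from the start of the recognized text with re.match, which may differ from the actual behavior.

```python
import re


def selector_matches(recognized, text=None, textContains=None, textMatches=None):
    """Hypothetical illustration of how each selector type compares against OCR text."""
    if text is not None:
        # text: the recognized string must equal the given string exactly
        return recognized == text
    if textContains is not None:
        # textContains: the given string may appear anywhere in the recognized string
        return textContains in recognized
    if textMatches is not None:
        # textMatches: assumption, the pattern is matched from the start of the string
        return re.match(textMatches, recognized) is not None
    return False


print(selector_matches("我的订单", text="我的"))
print(selector_matches("我的订单", textContains="我的"))
print(selector_matches("查看我的", textMatches=".*?我的"))
```

Since OCR output often includes surrounding characters, textContains or textMatches tends to be more forgiving than an exact text match.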

OCR Actions

Currently, OCR selectors support only the following actions.

click

Clicks the selected element.

element.click()

click_exists

Clicks the selected element if it exists.

element.click_exists()

exists

Checks if the element exists.

element.exists()
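
Because text may take a moment to appear on screen, an existence check is often repeated until it succeeds. The helper below is a minimal sketch of such polling; wait_until is a hypothetical function and not part of the library, and the default timeout and interval are arbitrary.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate repeatedly until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a device object d, it might be used as wait_until(lambda: d.ocr(text="我的").exists()), so that a fresh OCR query is made on every attempt.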

screenshot

Takes a screenshot of the matched element.

element.screenshot(100).save("element.png")

info

Gets the information of the matched OCR result.

element.info()

Tip

If OCR still cannot solve your problem, you can also try the image feature matching interface.