Text Recognition

This section describes how to use OCR to drive the interface. In applications such as games, the regular interface selectors may not work; in those cases, OCR can be used instead. OCR-based selection supports only a limited set of operations, such as checking whether an element exists, clicking it, and taking a screenshot of it. Supported OCR backends are paddleocr, easyocr, and a custom HTTP API.

Setting the OCR Backend

Before using OCR, you must set an OCR backend and install its dependencies yourself.

Attention

In a cluster scenario, that is, when a single computer controls multiple devices, you must wrap the OCR engine in an HTTP service yourself. Using paddleocr or easyocr directly consumes a large amount of local memory and computing resources, because every process loads the models separately.

The following uses paddleocr as the backend, sets the screenshot quality used for recognition to 80, and enables GPU acceleration.

d.setup_ocr_backend("paddleocr", quality=80, use_gpu=True, drop_score=0.85, use_space_char=True)

The following uses easyocr as the backend, sets the screenshot quality used for recognition to 80, and recognizes Simplified Chinese and English.

d.setup_ocr_backend("easyocr", ["ch_sim", "en"], quality=80)

Any extra arguments passed to setup_ocr_backend are the arguments used to initialize the corresponding backend instance. If you are unsure of the parameter names, compare the calls above with the official instantiation examples below.

paddleocr.PaddleOCR(use_gpu=True, drop_score=0.85, use_space_char=True)
easyocr.Reader(["ch_sim", "en"])

A custom OCR backend is mainly intended for setups with many devices or for machines without GPU acceleration. You deploy recognition as an HTTP service and use the custom backend to request recognition remotely. To do so, subclass CustomOcrBackend and implement its ocr method, returning the recognition results in the required format. The response format is defined in the provided paddle_ocr_http_backend.py; you can also make minor modifications to that file and deploy it directly as the service.

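import requests

class HttpOcrBackend(CustomOcrBackend):
    def __init__(self, url, auth):
        self.auth = auth
        self.url = url

    def ocr(self, image: bytes):
        # Send the raw screenshot bytes to the remote OCR service.
        r = requests.post(self.url, headers={"X-Auth": self.auth}, data=image)
        # The service must reply with JSON in the required result format.
        return r.json()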

Then set the custom class as the OCR backend; the remaining arguments are passed to its constructor.

d.setup_ocr_backend(HttpOcrBackend, "http://server/ocr", "Secret")
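The class above sends its request without a timeout, so a stalled OCR server could block the caller indefinitely. The following is a minimal sketch of a variant that uses only the Python standard library and adds a timeout. The names TimeoutHttpOcrBackend, build_request, and the timeout parameter are hypothetical and not part of the library, and the CustomOcrBackend class defined here is only a stand-in so the sketch runs on its own.

```python
import json
import urllib.request


class CustomOcrBackend:
    """Stand-in so this sketch runs on its own. In real use, delete this
    class and use the CustomOcrBackend provided by the client library."""


class TimeoutHttpOcrBackend(CustomOcrBackend):
    """Hypothetical variant of HttpOcrBackend with a request timeout."""

    def __init__(self, url, auth, timeout=10.0):
        self.url = url
        self.auth = auth
        self.timeout = timeout  # seconds; an assumed default

    def build_request(self, image: bytes) -> urllib.request.Request:
        # POST the raw screenshot bytes with the same X-Auth header
        # used by the original example.
        return urllib.request.Request(
            self.url,
            data=image,
            method="POST",
            headers={
                "X-Auth": self.auth,
                "Content-Type": "application/octet-stream",
            },
        )

    def ocr(self, image: bytes):
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
        # a timeout error if the server does not answer in time.
        request = self.build_request(image)
        with urllib.request.urlopen(request, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

Because timeout has a default value, such a class could be registered with the same two arguments as before: the service URL and the secret.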

OCR Selectors

Currently, the OCR recognition selectors support the following types.

text

Matches the complete text.

element = d.ocr(text="我的")

textContains

Matches text that contains the given substring.

element = d.ocr(textContains="我的")

textMatches

Matches text against a regular expression.

element = d.ocr(textMatches=".*?我的")
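To make the difference between the three selectors concrete, the sketch below mimics them over a list of strings that stand in for recognized text. It does not call any OCR backend. The helper name select and the sample strings are hypothetical, and applying the pattern with re.match (anchored at the start of the string) is an assumption about how textMatches behaves, not a documented guarantee.

```python
import re


def select(lines, text=None, textContains=None, textMatches=None):
    """Return the strings in `lines` that satisfy the given selector."""
    if text is not None:
        # text: the whole string must be equal.
        return [s for s in lines if s == text]
    if textContains is not None:
        # textContains: the string must contain the substring.
        return [s for s in lines if textContains in s]
    if textMatches is not None:
        # textMatches: assumed to behave like re.match (start-anchored).
        pattern = re.compile(textMatches)
        return [s for s in lines if pattern.match(s)]
    raise ValueError("one selector is required")


# Hypothetical strings standing in for OCR output.
recognized = ["我的", "我的订单", "进入我的", "设置"]

exact = select(recognized, text="我的")
partial = select(recognized, textContains="我的")
regex = select(recognized, textMatches=".*?我的")
```

Under these assumptions, text selects only the string that equals "我的", while textContains and the pattern ".*?我的" both also select the longer strings that include it.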

OCR Operations

OCR selectors currently support the following operations.

click

Click the selected element.

element.click()

click_exists

Click if the element exists.

element.click_exists()

exists

Check if the element exists.

element.exists()
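Text often appears on screen only after an animation or a loading step, so a single exists() check may come too early. The following is a minimal polling sketch built only on exists(); the helper name wait_until_exists and its timeout and interval parameters are hypothetical and not part of the library.

```python
import time


def wait_until_exists(element, timeout=10.0, interval=0.5):
    """Poll element.exists() until it returns True or `timeout` seconds pass.

    Returns True if the element appeared in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if element.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between checks to avoid recognizing the screen back to back.
        time.sleep(interval)
```

A typical use would be to call wait_until_exists(element, timeout=15) and click only if it returns True. Each check presumably requires a new recognition pass, so a very small interval is unlikely to help.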

screenshot

Take a screenshot of the matched element.

element.screenshot(100).save("element.png")

info

Get the matched OCR information.

element.info()

Tip

If OCR still cannot solve your problem, you can try the image feature matching interface instead.