Text Recognition Operations

This chapter describes how to use OCR to assist with interface operations. In some situations, such as game applications, conventional interface selectors cannot be used, and OCR-based operations are an alternative. OCR selectors support only a limited set of operations: checking whether an element exists, clicking, taking screenshots, and reading the match information. The supported OCR backends are paddleocr, easyocr, and custom backends such as a self-hosted HTTP service.

Setting Up the OCR Backend

Before using the OCR methods, you must set up an OCR backend and install its dependency libraries in advance.

Attention

If you are running a cluster, that is, controlling multiple devices from the same computer, be sure to wrap the recognition in your own OCR HTTP service. Using paddleocr or easyocr directly consumes a large amount of local memory and computing resources, because every process loads the OCR engine again.

The following call uses paddleocr as the backend, with a screenshot quality of 80 and GPU acceleration enabled.

d.setup_ocr_backend("paddleocr", quality=80, use_gpu=True, drop_score=0.85, use_space_char=True)

The following call uses easyocr as the backend, with a screenshot quality of 80, recognizing Simplified Chinese and English.

d.setup_ocr_backend("easyocr", ["ch_sim", "en"], quality=80)

Any additional arguments passed to setup_ocr_backend are used as the initialization parameters of the chosen backend instance. If you are unsure how the arguments above should be written, compare them with the official constructor calls below.

paddleocr.PaddleOCR(use_gpu=True, drop_score=0.85, use_space_char=True)
easyocr.Reader(["ch_sim", "en"])
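The relationship between the two forms can be pictured with a small stand-alone sketch. It is not the library's implementation: DemoReader, build_backend, and the default quality of 75 are hypothetical names and values chosen for illustration. The only point is that quality is consumed by the setup call, while every other argument is passed through to the engine constructor unchanged.

```python
class DemoReader:
    """Hypothetical stand-in for an OCR engine class such as easyocr.Reader."""

    def __init__(self, lang_list, gpu=True):
        self.lang_list = lang_list
        self.gpu = gpu


def build_backend(engine_cls, *args, quality=75, **kwargs):
    """Illustrative only: keep `quality`, forward the rest to the engine."""
    # `quality` (screenshot quality) stays with the caller-side setup;
    # all remaining positional and keyword arguments initialize the engine.
    engine = engine_cls(*args, **kwargs)
    return engine, quality


# Mirrors the shape of: d.setup_ocr_backend("easyocr", ["ch_sim", "en"], quality=80)
engine, quality = build_backend(DemoReader, ["ch_sim", "en"], quality=80)
```

Read this way, everything after the backend name in setup_ocr_backend, apart from quality, is written exactly as it would be in the engine's own constructor call.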

A custom OCR backend is mainly intended for controlling a large number of devices, or for cases where the local machine has no GPU acceleration. You can deploy the recognition code as an HTTP service and have the custom backend send requests to it for remote recognition. You need to write your own class that inherits from CustomOcrBackend and return the recognition results in the required format. The response format is defined in the provided paddle_ocr_http_backend.py, which you can also deploy directly as the service after slight modifications.

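import requests

class HttpOcrBackend(CustomOcrBackend):
    def __init__(self, url, auth):
        self.auth = auth
        self.url = url

    def ocr(self, image: bytes):
        # Send the raw screenshot bytes to the remote OCR service
        r = requests.post(self.url, headers={"X-Auth": self.auth},
                          data=image)
        # The service must respond with results in the required format
        return r.json()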

Then set the OCR recognition backend service to the custom service class.

d.setup_ocr_backend(HttpOcrBackend, "http://server/ocr", "Secret")
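For the server side of such a setup, the following is a minimal sketch using only the Python standard library. It assumes the same conventions as the client above: the image arrives as the raw POST body and the secret is sent in the X-Auth header. The names SECRET, recognize, handle_request, OcrHandler, and main are hypothetical, and recognize returns an empty placeholder list; the real response format is the one defined in paddle_ocr_http_backend.py, which this sketch does not reproduce.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET = "Secret"  # hypothetical shared secret, compared against X-Auth


def recognize(image: bytes):
    # Placeholder: run a real OCR engine here and return the results in the
    # format the client expects (see paddle_ocr_http_backend.py).
    return []


def handle_request(auth_header, image: bytes):
    """Return (status_code, json_body_bytes) for a single OCR request."""
    if auth_header != SECRET:
        return 403, b"{}"
    return 200, json.dumps(recognize(image)).encode("utf-8")


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw image bytes from the request body
        length = int(self.headers.get("Content-Length", 0))
        image = self.rfile.read(length)
        status, body = handle_request(self.headers.get("X-Auth"), image)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main(host="0.0.0.0", port=8000):
    # Blocks forever; one process holds the OCR engine for all devices
    HTTPServer((host, port), OcrHandler).serve_forever()
```

Calling main() starts the service. Because the engine then lives in a single server process, each device-controlling process only sends HTTP requests instead of loading the engine itself.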

OCR Selector

Currently, OCR recognition selectors only support the following types.

text

Match complete text.

element = d.ocr(text="Mine")

textContains

Text contains match.

element = d.ocr(textContains="Mine")

textMatches

Text regular expression match.

element = d.ocr(textMatches=".*?Mine")
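The difference between the three selector types can be illustrated on plain strings. The sketch below is not the library's code: matches is a hypothetical helper, and it assumes that textMatches is applied from the start of the recognized text, as re.match does. The library's actual anchoring may differ, so treat the regular-expression branch in particular as an assumption.

```python
import re


def matches(recognized: str, *, text=None, textContains=None, textMatches=None) -> bool:
    """Hypothetical helper showing the intent of each selector type."""
    if text is not None:
        # text: the whole recognized string must be equal
        return recognized == text
    if textContains is not None:
        # textContains: substring match anywhere in the string
        return textContains in recognized
    if textMatches is not None:
        # textMatches: regular expression, assumed anchored at the start
        return re.match(textMatches, recognized) is not None
    raise ValueError("one of text, textContains, textMatches is required")


exact = matches("Mine", text="Mine")
partial = matches("Mine page", textContains="Mine")
pattern = matches("Go to Mine", textMatches=".*?Mine")
```

Under that assumption, the leading .*? in the textMatches example is what allows other characters to appear before "Mine" in the recognized text.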

OCR Operations

Currently, elements selected by OCR support only the following operations.

click

Click on the selected element.

element.click()

click_exists

Click on the selected element if it exists.

element.click_exists()

exists

Check whether the element exists.

element.exists()
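Since the listed operations do not include a wait, a common pattern is to poll the existence check until the text appears or a timeout expires. The following is a generic sketch, not part of the library: wait_until is a hypothetical helper, and the timeout and interval defaults are arbitrary. Each check is assumed to require a fresh screenshot and recognition pass, so a very short interval is unlikely to help.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it is truthy or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between checks to avoid back-to-back recognition passes
        time.sleep(interval)


# Hypothetical usage with an OCR element:
#     if wait_until(element.exists, timeout=15):
#         element.click()
```

The predicate is always evaluated at least once, so a timeout of 0 still performs a single check.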

screenshot

Take a screenshot of the matched element.

element.screenshot(100).save("element.png")

info

Get the matched OCR information.

element.info()

Tip

If OCR still can’t solve your problem, you can also try using the image feature matching interface for image matching.
