Text Recognition Operations¶
This section describes how to use OCR (Optical Character Recognition) to drive interface operations. In games and some applications, conventional UI selectors may not be available; in such cases you can interact with the screen through OCR instead. OCR-based selection currently supports only a limited set of operations: checking whether an element exists, clicking it, taking a screenshot of it, and reading its match information. Supported OCR backends are paddleocr, easyocr, and a custom HTTP backend.
Setting the OCR Backend¶
Before using OCR, you must configure an OCR backend. The dependencies of the backend you choose must be installed in advance.
Attention
For cluster environments (that is, controlling multiple devices from a single computer), it is strongly recommended to wrap OCR in an HTTP service. Using paddleocr or easyocr directly consumes significant local memory and compute, because each process loads its own copy of the model.
Use paddleocr as the backend, with screenshot quality set to 80 and GPU acceleration enabled:
d.setup_ocr_backend("paddleocr", quality=80, use_gpu=True, drop_score=0.85, use_space_char=True)
Use easyocr as the backend, recognizing Simplified Chinese and English, with screenshot quality set to 80:
d.setup_ocr_backend("easyocr", ["ch_sim", "en"], quality=80)
Additional arguments in setup_ocr_backend correspond to initialization parameters of the respective OCR instance. If you’re unsure about how these parameters are specified, refer to the official instantiation examples below and compare them with the above usage:
paddleocr.PaddleOCR(use_gpu=True, drop_score=0.85, use_space_char=True)
easyocr.Reader(["ch_sim", "en"])
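The mapping described above can be pictured with a small sketch. It is not the library's implementation; `split_setup_args` and its default quality value are hypothetical and only illustrate that `quality` is kept for the screenshot while everything else is handed to the OCR constructor unchanged.

```python
def split_setup_args(*args, quality=100, **kwargs):
    """Separate the screenshot quality from the OCR constructor arguments.

    Hypothetical helper: the default quality of 100 is an assumption.
    """
    return quality, args, kwargs


# Mirrors d.setup_ocr_backend("easyocr", ["ch_sim", "en"], quality=80)
quality, args, kwargs = split_setup_args(["ch_sim", "en"], quality=80)
print(quality)  # 80
print(args)     # (['ch_sim', 'en'],)  -> easyocr.Reader(["ch_sim", "en"])
print(kwargs)   # {}
```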
Custom OCR backend: This is mainly useful when you manage many devices or have no local GPU acceleration. You can deploy your OCR model as an HTTP service and run recognition remotely by sending requests to it from a custom backend. To do so, implement a class that inherits from CustomOcrBackend and return the recognition results in the required structure. The response format is defined in the provided paddle_ocr_http_backend.py file, which you can adapt slightly and deploy directly as a service.
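import requests

# CustomOcrBackend is provided by the client library (its import is not shown here).
class HttpOcrBackend(CustomOcrBackend):
    def __init__(self, url, auth):
        self.auth = auth
        self.url = url

    def ocr(self, image: bytes):
        # Send the raw screenshot bytes to the remote OCR service.
        r = requests.post(self.url, headers={"X-Auth": self.auth},
                          data=image)
        # The service must already return the required result structure.
        return r.json()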
Then set the OCR backend to your custom service class:
d.setup_ocr_backend(HttpOcrBackend, "http://server/ocr", "Secret")
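For orientation, the server side that such a backend talks to might look roughly like the sketch below. It uses only the Python standard library. The `recognize` function is a hypothetical placeholder, and the empty list it returns is not the real result format; the actual structure must follow paddle_ocr_http_backend.py. The header name and secret simply mirror the client example above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET = "Secret"  # assumption: same value the client sends in X-Auth


def recognize(image: bytes) -> list:
    """Hypothetical placeholder for the real OCR call.

    Replace the body with a call into your OCR model and convert its
    output to the structure defined in paddle_ocr_http_backend.py.
    """
    return []


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw image bytes from the request body.
        length = int(self.headers.get("Content-Length", 0))
        image = self.rfile.read(length)
        # Reject requests that do not carry the expected X-Auth header.
        if self.headers.get("X-Auth") != SECRET:
            self.send_error(403, "invalid X-Auth")
            return
        body = json.dumps(recognize(image)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def serve(host="127.0.0.1", port=8000):
    """Run the sketch server until interrupted."""
    HTTPServer((host, port), OcrHandler).serve_forever()
```

Calling `serve()` starts the service; a single process then holds the model, and every device-controlling process only sends screenshots to it.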
OCR Selectors¶
Currently, OCR selectors support only the following types:
text¶
Matches the complete text string.
element = d.ocr(text="我的")
textContains¶
Matches if the text contains the specified substring.
element = d.ocr(textContains="我的")
textMatches¶
Matches text using regular expressions.
element = d.ocr(textMatches=".*?我的")
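The difference between the three selector types can be illustrated offline on a list of recognized strings. This is a sketch, not the library's matching code: `match_text` is a hypothetical helper, and it assumes that textMatches must match the whole string, which may differ from the actual behavior.

```python
import re


def match_text(recognized, text=None, textContains=None, textMatches=None):
    """Return the recognized strings selected by one of the three rules."""
    if text is not None:
        # text: the whole string must be equal.
        return [s for s in recognized if s == text]
    if textContains is not None:
        # textContains: the substring may appear anywhere.
        return [s for s in recognized if textContains in s]
    if textMatches is not None:
        # Assumption: the pattern has to match the entire string.
        pattern = re.compile(textMatches)
        return [s for s in recognized if pattern.fullmatch(s)]
    return []


# "我的" means "My"; the other entries are made-up screen texts.
recognized = ["我的", "我的订单", "查看我的"]
print(match_text(recognized, text="我的"))            # ['我的']
print(match_text(recognized, textContains="我的"))    # ['我的', '我的订单', '查看我的']
print(match_text(recognized, textMatches=".*?我的"))  # ['我的', '查看我的']
```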
OCR Operations¶
Currently, the following operations are supported for OCR-based selectors:
click¶
Clicks on the matched element.
element.click()
click_exists¶
Clicks on the matched element if it exists.
element.click_exists()
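Because recognized text often appears only after a screen transition, it can help to retry click_exists for a short while. The helper below is a minimal sketch: `click_when_visible` is a hypothetical name, and it assumes that click_exists() returns a truthy value when the click happened.

```python
import time


def click_when_visible(find, timeout=10.0, interval=0.5):
    """Poll until the OCR match can be clicked or the timeout expires.

    `find` is any zero-argument callable returning an OCR element,
    for example: lambda: d.ocr(text="我的")
    """
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: click_exists() is truthy when the element was clicked.
        if find().click_exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The selector is rebuilt on every attempt so that each poll works on a new `d.ocr(...)` call rather than a possibly stale result.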
exists¶
Checks whether the element exists.
element.exists()
screenshot¶
Takes a screenshot of the matched element.
element.screenshot(100).save("element.png")
info¶
Retrieves the matched OCR information.
element.info()
Tip
If OCR does not meet your needs, consider the image feature matching APIs for image-based recognition.