Basic Knowledge
This chapter introduces the basic knowledge related to Android automation. Please read it carefully, as these points will not be repeated later. Android automation differs greatly from conventional web automation, but the two also share many similarities. In web automation, you can inspect the page layout and element IDs with the browser's F12 developer tools, then use XPath to locate elements and click them, wait for them, and so on. Android works in a similar way: you use something called a selector to pick elements and then click them, make assertions about them, and so on. So there is no need to worry about a steep learning curve.
Similarities and Differences Between Mobile and Web Automation
Mobile automation and web automation have many similarities, but also many differences. Take Selenium as an example. Controlling a webpage with Selenium typically requires three things: a browser, a WebDriver, and Selenium itself. Mobile automation follows the same pattern: the phone plays the role of the browser, FIRERPA plays the role of WebDriver, and FIRERPA's Python library, lamda, plays the role of Selenium. The goals are also the same: simulating user actions for testing, data collection, or automated task execution. Both are script-driven and involve locating elements, clicking, taking screenshots, and making assertions. Seen this way, they are quite similar.
However, there are differences too. First, mobile automation requires both a phone and a computer, whereas web automation can be done entirely on one computer. Second, the two rely on different toolchains: common web tools include Selenium, Puppeteer, and Playwright, while common mobile tools include FIRERPA, AutoJS, Appium, and uiautomator.
On the web, elements are usually located through the HTML DOM structure, for example with XPath or CSS selectors, and the element hierarchy is fairly intuitive. On mobile, the usual locating method is the selector. Android application interfaces are also described as XML layouts, so XPath can be used against them as well. Web automation rarely faces serious compatibility issues: in most cases, fixing the browser and the startup resolution sidesteps device-related problems. On Android, however, the wide variety of brands and models means that screen size, system version, and similar factors can all affect whether automation code works. Don't let this put you off; the impact exists, but it is small.
Differences Between Various Automation Tools
As mentioned above, there are also many differences between the common Android automation tools we listed. Let's first state our position: FIRERPA is the most stable, most feature-complete, most powerful, and most suitable tool for project-based management and application among all automation tools.
AutoJS
AutoJS and its derivatives are self-controlling tools: you install an APK on the device and write JS scripts that run on the device itself. AutoJS is typically limited to basic automation operations. Its strength is a low entry barrier, which suits beginners and hobbyists. However, its design does not lend itself to controlling, managing, and updating scripts at scale: each device runs independently with no central management, so precise large-scale control is impossible.
Appium
Appium, widely used by testers, has a C/S (Client/Server) architecture, which makes it better suited to cluster control than AutoJS. Its drawback is obvious, though: because it targets many platforms, iOS as well as Android, it is large and bloated, and therefore poorly suited to large-scale deployment.
u2
Finally, uiautomator2, also a C/S architecture, is more precise than Appium and has been trimmed to a sensible size: it avoids redundant features and offers just the functionality you need. So why did we abandon it? Mainly because it is unstable when driving many devices. Second, its automatic installation logic is convenient for beginners but redundant and hard to control in professional cluster management. Finally, it is no longer maintained.
All of these tools work well for ordinary use. We take a less conventional path because real business work is rarely pure automation. Suppose you are asked to test an app and must also record its requests, responses, and request timings. How would you do it? Most likely your solution would involve a lot of extra manual work, or it would be unstable or incompatible. With FIRERPA, every one of these operations can be done in code. You only deal with code; FIRERPA takes care of stability and compatibility. In fact, FIRERPA should not really be compared with these tools, because functionally it is a superset of all of them, built on every pitfall we have hit and every path we have explored.
Basic Automation Workflow
Before you start, research what your solution needs: plain automation only, or also extraction of the application's runtime data? For data extraction there are typically two options. The first is to intercept HTTP(S) traffic with a man-in-the-middle (MITM) proxy. The second is to capture data by hooking. MITM is simpler and fits most everyday needs, but it may not work for some applications. Hooking requires substantial reverse-engineering knowledge, is hard for beginners, and is best kept for edge cases.
Man-in-the-Middle Interception
The MITM solution is relatively simple. You just need to find the relevant content in the documentation about installing an MITM certificate and setting up a proxy, and you can implement it with mitmproxy. If you are completely unfamiliar with this, you can refer to our official startmitm.py script, which has all the logic written for you, ready to be copied or reused at any time.
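For the requirement mentioned earlier (recording requests, responses, and request timings), a small mitmproxy addon is often enough. The sketch below is not startmitm.py; it assumes mitmproxy is already acting as the device's proxy and relies only on mitmproxy's addon hook `response(flow)` and the `HTTPFlow` attributes it reads. The output file name `flows.jsonl` is a hypothetical choice.

```python
import json


class RequestRecorder:
    """mitmproxy addon sketch: record method, URL, status and timing per flow.

    mitmproxy calls `response(flow)` once a server response has been received.
    """

    def __init__(self, out_path="flows.jsonl"):
        self.out_path = out_path  # hypothetical output file name
        self.records = []

    def response(self, flow):
        req, resp = flow.request, flow.response
        record = {
            "method": req.method,
            "url": req.pretty_url,
            "status": resp.status_code,
            # Seconds from the start of the request to the end of the response.
            "elapsed": resp.timestamp_end - req.timestamp_start,
            # Streamed responses may have no buffered content.
            "size": len(resp.content or b""),
        }
        self.records.append(record)
        with open(self.out_path, "a", encoding="utf-8") as fp:
            fp.write(json.dumps(record, ensure_ascii=False) + "\n")


# mitmproxy loads the module-level `addons` list when the file is passed with -s.
addons = [RequestRecorder()]
```

Run it with `mitmdump -s recorder.py` (the file name is up to you); each completed response appends one JSON line.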
Hooking Interception
The Hooking solution requires at least beginner-level reverse-engineering skills. If you have never heard of it, you can skip it for now. In general, it works like this: you write a Frida script that hooks the relevant function calls, inject the script into the application, and have it capture the arguments or return values and report them. A simple demo and usage instructions are in the "Using Frida to Report Data" chapter.
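To show the shape of that data flow, here is a minimal sketch using Frida's generic Python bindings rather than FIRERPA's reporting mechanism. The hooked class `com.example.app.Crypto` and its method `sign` are hypothetical placeholders; values passed to `send()` in the script arrive in the Python callback as messages of type `"send"`. It assumes the app's process name equals its package name and that the script can access Frida's `Java` bridge (recent Frida releases may require bundling the bridge into the script; overloaded methods need `.overload(...)`).

```python
import sys

# Frida JavaScript injected into the target app. Class and method names are
# hypothetical placeholders for the function you actually want to hook.
HOOK_JS = r"""
Java.perform(function () {
    var Target = Java.use("com.example.app.Crypto");
    Target.sign.implementation = function (arg) {
        var ret = this.sign(arg);
        send({method: "sign", arg: String(arg), ret: String(ret)});
        return ret;
    };
});
"""

collected = []


def on_message(message, data):
    """Frida message callback: keep payloads from send(), print script errors."""
    if message.get("type") == "send":
        collected.append(message["payload"])
    elif message.get("type") == "error":
        print(message.get("stack", message), file=sys.stderr)


def main(process_name="com.example.app"):
    import frida  # imported lazily so the handler above is usable without Frida

    session = frida.get_usb_device().attach(process_name)
    script = session.create_script(HOOK_JS)
    script.on("message", on_message)
    script.load()
    sys.stdin.read()  # keep the session alive until stdin is closed


if __name__ == "__main__":
    main()
```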
Automation Code
Of course, automation code is also an indispensable part, as you need it to trigger the relevant logic. For writing automation code, your workflow should generally be as follows. First, you should open FIRERPA's remote desktop. You should see an interface like the one below.

Now, please open the app you want to automate, then click the small eye icon in the upper right corner of the remote desktop. You will see the following interface. At this point, select the element you want to operate on and click it to view its information.

You can see the element information on the right, such as text, resourceId, etc. Now, if we want to click this element, we write the following code. The meaning of this code is "click the element whose text is '同意' (Agree)".
d(text="同意").click()
Alright, you have now learned the simplest way. Now, by writing control logic like if-else and using other interfaces like exists, you can implement a complete automation workflow. See? It's not that difficult.
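As an illustration of that if-else pattern, the sketch below dismisses a consent dialog and, as a fallback, a skip button. It uses only the `exists()` and `click()` calls mentioned in this chapter. The `跳过` (Skip) text, the device IP, and the helper name `handle_consent` are hypothetical, and the `from lamda.client import Device` import assumes the usual lamda client entry point.

```python
def handle_consent(d, max_rounds=3):
    """Click "同意" (Agree), or a hypothetical "跳过" (Skip) button, while shown.

    `d` is a FIRERPA device object; only d(...).exists() and d(...).click()
    are used. Returns how many clicks were made.
    """
    clicked = 0
    for _ in range(max_rounds):
        agree = d(text="同意")
        skip = d(textContains="跳过")  # hypothetical fallback button
        if agree.exists():
            agree.click()
            clicked += 1
        elif skip.exists():
            skip.click()
            clicked += 1
        else:
            break  # nothing left to dismiss
    return clicked


if __name__ == "__main__":
    from lamda.client import Device

    d = Device("192.168.0.2")  # replace with your device's IP address
    print("dialogs dismissed:", handle_consent(d))
```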
UI Layout Inspection
Writing automation code normally goes hand in hand with UI layout inspection, which is the only way to find the conditions for your selectors. First, open the device's remote desktop in your browser. Then click the eye icon in the upper right corner of the remote desktop to enter layout inspection mode. You can now click the dashed boxes on the screen on the left to view the information of the corresponding elements, and use their properties as selector parameters. Clicking the eye icon again closes layout inspection. The inspection view does not refresh as the page changes; it always shows the layout captured when you opened it or last refreshed it. To refresh it, press CTRL + R.

UI Selector
The UI Selector is used to operate on Android elements. You can think of it as similar to an XPath rule: the two differ, but serve the same general purpose. In FIRERPA, the selector class is Selector. In most cases you won't need to use this class directly; the d(text="同意") call shown earlier is already a selector in action. In its complete form, it accepts the following optional parameters.
| Match Type | Description |
|---|---|
| text | Exact text match |
| textContains | Text contains match |
| textStartsWith | Text starts with match |
| className | Class name match |
| description | Exact description match |
| descriptionContains | Description contains match |
| descriptionStartsWith | Description starts with match |
| clickable | Clickable |
| longClickable | Long-clickable |
| scrollable | Scrollable |
| resourceId | Resource ID match |
In most cases, only resourceId, text, description, and textContains are used as parameters. If an element has a proper resourceId, it should be prioritized as the Selector, like d(resourceId="com.xxx:id/mobile_signal"). Otherwise, text will be used, like d(text="点击进入"), or a more fuzzy match like d(textContains="点击"). description is similar to text, but it is used less frequently.
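The priority rule above can be written down as a small helper. This is a hypothetical sketch: `pick_selector` and its dict input are not part of FIRERPA; the returned keyword arguments would simply be passed on as `d(**pick_selector(info))`.

```python
def pick_selector(info):
    """Choose selector keyword arguments from inspected element properties.

    Priority: a non-empty resourceId, then exact text, then description.
    `info` is a plain dict copied from the layout inspector (hypothetical format).
    """
    if info.get("resourceId"):
        return {"resourceId": info["resourceId"]}
    if info.get("text"):
        return {"text": info["text"]}
    if info.get("description"):
        return {"description": info["description"]}
    raise ValueError("element has no resourceId, text or description")
```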
Screen Coordinate Definition
During automation, it's inevitable to encounter situations where you need to operate based on specific coordinates or regional coordinates. However, since many people may not be very clear about coordinate systems, we will introduce the basics of Android screen coordinates here.
As we all know, images have resolution sizes, and so do screens. For an Android screen, regardless of whether it's in landscape, portrait, or auto-rotating mode, the top-left corner is uniformly treated as the origin (0,0) in a coordinate system that extends to the right and down. X is the horizontal axis, and Y is the vertical axis, as shown in the figure.
For the 1080x1920 screen in the figure above, the top-left corner is at (0,0), the top-right corner at (1080,0), the bottom-left corner at (0,1920), and the bottom-right corner at (1080,1920). From these, you can work out the coordinates of any point on the screen.
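Because resolutions vary between devices, it can help to store positions as fractions of the screen and convert them to pixels at runtime. The helpers below are hypothetical and only restate the coordinate rules above; the width and height would come from the actual device, and the resulting pair could then be used as `Point(x=..., y=...)`.

```python
def corners(width, height):
    """Corner coordinates: origin at the top-left, X grows right, Y grows down."""
    return {
        "top_left": (0, 0),
        "top_right": (width, 0),
        "bottom_left": (0, height),
        "bottom_right": (width, height),
    }


def relative_to_absolute(rx, ry, width, height):
    """Map a fractional position (0.0 to 1.0 on each axis) to pixel coordinates."""
    return round(rx * width), round(ry * height)
```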
Points on the Screen
FIRERPA has two screen-related definitions: points and regions. Some operations, such as clicking or taking a screenshot, require a coordinate or a region. A single coordinate point is defined as follows; this one represents the point at screen coordinates (100,100).
Point(x=100, y=100)
Defining a Region
A region is a rectangular area on the screen. Its definition is slightly more involved, so please read carefully. FIRERPA uses Bound to represent such an area, and it takes four parameters: top, left, bottom, and right. top is the pixel distance from the rectangle's top edge to the top edge of the screen; left is the distance from the rectangle's left edge to the left edge of the screen; right is the distance from the rectangle's right edge to the left edge of the screen; and bottom is the distance from the rectangle's bottom edge to the top edge of the screen. In short, every value is measured from the origin: left and right are X coordinates, and top and bottom are Y coordinates. Below is a diagram to help you understand. The phone's screen is again 1080x1920, and the device is in portrait mode.
Now, let's assume the screen is divided into four equal parts, and we need to define the top-left and bottom-right regions as shown in the figure. According to the rules, for Region 1, the distance from the rectangle's top edge to the screen's top is 0 pixels, the distance from the left edge to the screen's left is 0 pixels, the distance from the bottom edge to the screen's top is 960 pixels (1920÷2), and the distance from the right edge to the screen's left is 540 pixels (1080÷2). So its definition should be:
Bound(top=0, left=0, right=540, bottom=960)
Similarly, for Region 2, the distance from the rectangle's top edge to the screen's top is 960 pixels, the distance from the left edge to the screen's left is 540 pixels, the distance from the right edge to the screen's left is 1080 pixels, and the distance from the bottom edge to the screen's top is 1920 pixels. Thus, the definition for the second rectangle is:
Bound(top=960, left=540, right=1080, bottom=1920)
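The same arithmetic can be checked in code. `Rect` below is a hypothetical stand-in that mirrors Bound's four fields, not FIRERPA's class; it computes the four quadrants of a screen and confirms that the top-left and bottom-right ones match the two definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Mirror of Bound's fields: left/right are X values, top/bottom are Y values."""
    top: int
    left: int
    bottom: int
    right: int

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top

    def center(self):
        return (self.left + self.right) // 2, (self.top + self.bottom) // 2

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def quadrants(width, height):
    """Four equal regions, ordered top-left, top-right, bottom-left, bottom-right."""
    hw, hh = width // 2, height // 2
    return [
        Rect(top=0, left=0, bottom=hh, right=hw),
        Rect(top=0, left=hw, bottom=hh, right=width),
        Rect(top=hh, left=0, bottom=height, right=hw),
        Rect(top=hh, left=hw, bottom=height, right=width),
    ]
```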
Android Application Data
Each Android application has a dedicated directory on the device for its data, normally under /data. You can get an application's data directory by calling d.application("com.example").info(). In most cases, you can also cd directly to /data/user/0/com.example, the data directory for the primary user. Besides this directory, some applications also store multimedia and other files under /sdcard/Android, typically in /sdcard/Android/data/ followed by the package name.
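For reference, the sketch below lists the conventional locations for a package. The helper is hypothetical; the paths follow standard Android conventions (per-user private storage, device-protected storage on Android 7.0 and later, and the per-user external storage that /sdcard points to), and the directory reported by `d.application(...).info()` should take precedence if it differs.

```python
def app_data_dirs(package, user_id=0):
    """Conventional data locations for `package` under Android user `user_id`."""
    return {
        # Credential-encrypted private storage; /data/data is the same for user 0.
        "private": f"/data/user/{user_id}/{package}",
        # Device-protected storage (Android 7.0+), available before first unlock.
        "device_protected": f"/data/user_de/{user_id}/{package}",
        # What /sdcard/Android/data/<package> resolves to for that user.
        "external": f"/storage/emulated/{user_id}/Android/data/{package}",
    }
```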
Viewing the SMS Database
Sometimes, you might want to see where the SMS messages received on the device are stored. This is an excellent idea, and it's very simple. You can even write an extension to read the content directly and fetch it in real-time via an HTTP interface!
This walkthrough assumes a standard Android layout; if your device differs, adapt the steps accordingly. On this device, the SMS application's package name is com.android.mms, so we switch straight to the /data/user/0/com.android.mms directory. (On many AOSP-based builds, the SMS database is owned by the telephony provider, com.android.providers.telephony, so look there if com.android.mms has no mmssms.db.) As the listing below shows, the databases directory contains several databases; mmssms.db is our target.
λ 10:12 /data/user/0/com.android.mms ➥ ls -la
total 82
drwx------ 7 u0_a78 u0_a78 3452 Jan 2 2021 .
drwxrwx--x 381 system system 53248 May 2 16:46 ..
drwxrws--x 3 u0_a78 u0_a78_c 3452 Jan 2 2021 cache
drwxrws--x 2 u0_a78 u0_a78_c 3452 Jan 2 2021 code_cache
drwxrwx--x 2 u0_a78 u0_a78 3452 Jan 2 2021 databases
drwxrwx--x 7 u0_a78 u0_a78 24576 Feb 26 13:43 files
drwxrwx--x 2 u0_a78 u0_a78 3452 May 4 10:12 shared_prefs
λ 10:12 /data/user/0/com.android.mms ➥ ls -l databases/
total 504
-rw-rw---- 1 u0_a78 u0_a78 24576 Jan 2 2021 dynamic_bubble
-rw------- 1 u0_a78 u0_a78 0 Jan 2 2021 dynamic_bubble-journal
-rw-rw---- 1 u0_a78 u0_a78 491520 Feb 27 04:18 mmssms.db
-rw------- 1 u0_a78 u0_a78 0 Jan 2 2021 mmssms.db-journal
λ 10:12 /data/user/0/com.android.mms ➥
Reading it is easy, because standard Android application databases are SQLite. Some high-security applications, however, encrypt their own databases, and FIRERPA has a solution for that too. Besides standard SQLite, FIRERPA supports real-time reading of several encrypted database formats, including WeChat (sqlcipher, AES-256), WeChat Work (wxsqlite, AES-128), and Ali-series apps (sqlcrypto, AES-128), provided you find the key yourself. Below is a brief demonstration of reading the system's SMS content. It takes just one command; you can also write an extension to do it.
sqlite3 databases/mmssms.db .dump
Of course, the output will be a large dump, but you can quickly find the table containing the data you need and then write your own SQL queries. This method works for 98% of Android applications; the remaining 2% are encrypted databases.
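Once you have found the table you need, a few lines of Python's standard `sqlite3` module can query it. This sketch assumes the AOSP `sms` table layout (`address`, `body`, `date` in milliseconds since the epoch, and `type`, where 1 is inbox and 2 is sent); vendor ROMs may differ. It opens the file read-only and would run wherever Python can reach the database, for example on a copy pulled from the device.

```python
import sqlite3


def list_tables(conn):
    """Names of all tables in the database, sorted alphabetically."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def latest_sms(conn, limit=10):
    """Newest rows of the AOSP-style `sms` table, as dicts."""
    cur = conn.execute(
        "SELECT address, body, date, type FROM sms ORDER BY date DESC LIMIT ?",
        (limit,),
    )
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur]


if __name__ == "__main__":
    # Path from the listing above; mode=ro avoids modifying the database.
    path = "/data/user/0/com.android.mms/databases/mmssms.db"
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        print(list_tables(conn))
        for msg in latest_sms(conn, limit=5):
            print(msg)
    finally:
        conn.close()
```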
Viewing Encrypted Databases
For these encrypted databases, you need to find the database key or the method used to generate it. Below is a brief introduction on how to read the databases of related software. We will only introduce how to preset the key using PRAGMA. If you don't understand what this is, please learn about SQLite first.
WeChat series (sqlcipher)
PRAGMA cipher = "sqlcipher";
PRAGMA legacy = 1;
PRAGMA key = "database-key";
WeChat Work (wxsqlite)
PRAGMA cipher = "aes128cbc";
PRAGMA hexkey = "database-key";
Ali-series (sqlcrypto)
PRAGMA cipher = "sqlcrypto";
PRAGMA key = "database-key";
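The order matters: the cipher settings and the key must be applied before the first real query. The sketch below builds those statements from the settings above and then verifies the result. It assumes a Python `sqlite3` module linked against a SQLite build with cipher support (the pragma names appear to follow SQLite3 Multiple Ciphers; `sqlcrypto` is taken as given from this section). Stock SQLite silently ignores unknown pragmas, so the final `SELECT` on `sqlite_master` is what actually reveals whether the key worked. The family names are hypothetical labels.

```python
import sqlite3

# Pragma sequences from the settings above; None marks where the key goes.
CIPHER_PRAGMAS = {
    "wechat": [("cipher", "sqlcipher"), ("legacy", 1), ("key", None)],
    "wework": [("cipher", "aes128cbc"), ("hexkey", None)],
    "ali": [("cipher", "sqlcrypto"), ("key", None)],
}


def sql_value(value):
    """Render a pragma value: integers bare, strings as SQL string literals."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def unlock_statements(family, key):
    """PRAGMA statements to run, in order, before the first query."""
    return [
        f"PRAGMA {name} = {sql_value(key if value is None else value)};"
        for name, value in CIPHER_PRAGMAS[family]
    ]


def open_encrypted(path, family, key):
    conn = sqlite3.connect(path)
    for stmt in unlock_statements(family, key):
        conn.execute(stmt)
    # Raises sqlite3.DatabaseError ("file is not a database") on a wrong key/cipher.
    conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
    return conn
```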
Viewing Other Data
Of course, the application data directory contains more than just databases. It also includes application-related parameters, configurations, caches, and files, such as shared_prefs (XML). However, we won't go into too much detail; you can explore this on your own.
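As a starting point for shared_prefs, the sketch below converts a SharedPreferences XML file into a Python dict using only the standard library. It handles the element types Android writes (`string`, `int`, `long`, `float`, `boolean`, and `set`, whose children are `string` elements); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_shared_prefs(xml_data):
    """Convert SharedPreferences XML (bytes or str) into a Python dict."""
    converters = {
        "int": int,
        "long": int,
        "float": float,
        "boolean": lambda v: v == "true",
    }
    result = {}
    for el in ET.fromstring(xml_data):  # children of the root <map>
        name = el.get("name")
        if el.tag == "string":
            result[name] = el.text or ""
        elif el.tag == "set":
            result[name] = {child.text or "" for child in el}
        elif el.tag in converters:
            result[name] = converters[el.tag](el.get("value"))
    return result
```

On the device the files live in the shared_prefs directory of the application's data directory; read them as bytes and pass the content in.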
Auxiliary Automation Measures
In automation tasks, not all applications are suitable for locating elements using selectors. Some interfaces, like games, are rendered in real-time and do not have an Android-level page layout. Therefore, for such applications, you can only use OCR or image matching for operational decisions. We provide a complete OCR auxiliary solution and built-in image SIFT and template matching solutions to help achieve these business goals. You can find the relevant interfaces and their usage methods in the documentation.
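To illustrate what template matching does, here is a deliberately naive pure-Python version using the sum of squared differences (the same criterion as OpenCV's `TM_SQDIFF`). It is not FIRERPA's built-in interface and is far too slow for real screenshots; it only shows the idea of sliding a template over the screen and turning the best match into a click point. Both helper names are hypothetical.

```python
def match_template(screen, template):
    """Exhaustive template matching by sum of squared differences (SSD).

    `screen` and `template` are 2-D lists of grayscale values (rows of pixels).
    Returns (x, y, score) for the best top-left position; lower score is better.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            score = sum(
                (screen[y + j][x + i] - template[j][i]) ** 2
                for j in range(th)
                for i in range(tw)
            )
            if best is None or score < best[2]:
                best = (x, y, score)
    return best


def click_point(match, template):
    """Centre of the matched region, usable as Point(x=..., y=...)."""
    x, y, _ = match
    return x + len(template[0]) // 2, y + len(template) // 2
```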
Updating...