Basic Knowledge¶
This chapter introduces you to the basic knowledge related to Android automation. Please be sure to read through this chapter’s descriptions as they won’t be repeated later. Android automation differs significantly from conventional web automation but also shares many commonalities. In conventional web automation, you can easily view the webpage layout and element IDs through the F12 developer tools, and then retrieve elements for operations like clicking and waiting via XPATH. Similarly in Android, you can use something called selectors to identify elements and perform actions like clicking and condition checking, so you shouldn’t worry about it being difficult to learn.
Similarities and Differences Between Mobile and Web Automation¶
Mobile automation has many similarities with web automation, but also many differences. Take Selenium as an example: to control a webpage with Selenium, you typically need three things, namely a browser, a webdriver, and Selenium itself. On mobile, the phone is equivalent to the browser, FIRERPA is equivalent to the webdriver, and FIRERPA’s Python library, lamda, is equivalent to Selenium. They share the same goal of executing simulated user operations to implement testing, data collection, or task automation. Both are driven by scripts and perform operations like locating elements, clicking, taking screenshots, and making judgments. From this perspective, they are quite similar.
But they are also different. First, mobile automation requires you to have both a mobile phone and a computer, while web automation can be done just on your own computer. Additionally, they don’t use the same set of tools. For web, common tools include Selenium, Puppeteer, and Playwright. For mobile, the common tools are FIRERPA, AutoJS, Appium, uiautomator, etc.
For web, common locating methods are mainly based on the HTML DOM structure, such as XPath or CSS selectors, with a relatively intuitive element hierarchy. For mobile, the common locating method is the selector, though Android application interfaces are also described as XML layouts, so you can use XPath for selection as well. Typically, web automation doesn’t need to consider many compatibility issues, since most can be avoided by fixing the browser and the launch resolution. On Android, however, differences between device brands, models, screen sizes, and system versions may affect the compatibility of automation code. But don’t be afraid: there is an impact, but it’s not significant.
Differences Between Various Automation Tools¶
As mentioned above, there are many differences between the common Android automation tools we listed. First, let’s clarify our position: in our view, FIRERPA is the most stable, feature-complete, and powerful of these tools, and the one best suited to project management and real-world application.
Note
Our position is not biased but formed through 6 years of continuous exploration and optimization. We’ve basically gone through the routes and pitfalls you might encounter.
AutoJS¶
Commonly used AutoJS and related derivative products are self-controlling types that require installing an APK on the device and writing scripts in JS for operation. Typically, AutoJS can only perform automation-level operations. The advantage is that it’s suitable for beginners or amateur use with a low entry barrier. However, its design is not suitable for large-scale script control, management, and updates. It’s in a decentralized, unmanaged state and cannot perform precise large-scale control.
Appium¶
Appium, commonly used by testers, belongs to the C/S architecture and is relatively more suitable for cluster control compared to AutoJS. However, it has obvious drawbacks. Since it’s designed for automation of most systems, supporting not only Android but also iOS, it’s bulky and bloated, very unsuitable for large-scale deployment and use.
u2¶
Finally, uiautomator2 also belongs to the C/S architecture. Compared to Appium, it’s more precise and sufficiently streamlined without too many extras, with just the right amount of functionality. But why did we abandon it? The main reason is that it’s not very stable in multi-device situations. Secondly, the automatic installation logic is suitable for beginners but seems redundant and difficult to control for professional cluster control, and it’s not maintained.
Of course, they are all very suitable for regular use. But we specifically don’t follow the regular path because in business, it’s usually impossible to just do automation. For example, if there’s a task to test an APP and record requests, responses, request times, etc., think about how you would do it. I believe your solution would either involve a lot of additional manual operations or be extremely unstable or incompatible. In the world of FIRERPA, you can perform all operations using code. You only interact with code, while stability and compatibility are handled by FIRERPA. Actually, we shouldn’t compare with these because functionally, FIRERPA is a superset of all the above solutions. It includes all the pitfalls we’ve encountered and paths we’ve taken.
Basic Automation Process¶
Usually, you need to pre-research the solution: whether to just perform regular automation or to capture application running data while automating. Typically, you have two approaches for data capture: one is through a man-in-the-middle for HTTP/s communication interception, and the other is through Hook for data interception. The man-in-the-middle approach is simpler and suitable for regular use, but may not work for some applications. The Hook approach requires a lot of reverse engineering knowledge, is difficult to get started with, and is not suitable for beginners, but is appropriate for edge cases.
Man-in-the-Middle Capture¶
The man-in-the-middle approach is relatively simple. You just need to find the content in the documentation about installing certificates and setting up proxies, and use it with mitmproxy to implement. If you’re completely unclear, you can also refer to our official startmitm.py script, which has all the logic written for you that you can copy or reuse at any time.
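As a rough illustration of what the mitmproxy side can look like, the sketch below is a minimal addon script that records the method, URL, status code, and duration of each response. It assumes the certificate and proxy are already set up on the device as described in the documentation. The `records` list, its field names, and the file name `record_flows.py` are made up for this example, and this is not the logic of startmitm.py.

```python
# Minimal mitmproxy addon sketch; load it with `mitmdump -s record_flows.py`.
# `records` and its field names are illustrative, not part of FIRERPA.
records = []


def response(flow):
    """mitmproxy calls this hook once a full HTTP response has been read."""
    started = flow.request.timestamp_start
    finished = flow.response.timestamp_end
    if started is None or finished is None:
        seconds = None  # timing information is incomplete for this flow
    else:
        seconds = finished - started
    records.append({
        "method": flow.request.method,
        "url": flow.request.pretty_url,
        "status": flow.response.status_code,
        "seconds": seconds,
    })
```

Collecting plain dictionaries keeps the capture step separate from whatever you do with the data afterwards, such as writing it to a file or a database.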
Hook Capture¶
The Hook approach requires at least entry-level reverse engineering skills. If you haven’t heard of it before, you don’t need to consider this approach for now. In short, it involves writing Frida scripts that are injected into the application to hook the relevant function calls, capture their parameters and return values, and report them. You can learn about simple demos and usage methods in the “Using Frida to Report Data” section.
Automation Code¶
Of course, automation code is also an indispensable part because you need automation to trigger the relevant logic. For writing automation code, you should typically follow this process. First, open FIRERPA’s remote desktop. You should see the following interface.
Now, please open the APP you want to automate, then click on the small eye icon in the top right corner of the remote desktop. You will see the following interface. At this point, select the element you want to operate on and click to view the element information.
Tip
Of course, you can also open it with code. This is covered later.
You can see the element information on the right, such as text, resourceId, etc. Now we want to click on this element, so we write the following code, which means “click on the element with text ‘Agree’”.
d(text="Agree").click()
Note
This is just an example. There are many ways to write selectors. The example only introduces the simplest writing method.
Well, you’ve already understood the simplest way of writing. Now, please write control logic like if-else, along with other interfaces like exists, and you can implement a complete automation operation process. See, it’s not that hard.
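Putting the pieces together, a minimal sketch of such control logic might look like the following. It assumes `d` is an already connected FIRERPA device object and that `exists()` is called as a method returning a boolean. The helper name `tap_if_present` is made up for this example.

```python
def tap_if_present(d, label):
    """Click the element whose text equals `label`, if it is on screen.

    `d` is assumed to be a connected FIRERPA device object; this helper
    is illustrative and not part of the library.
    """
    element = d(text=label)
    if element.exists():
        element.click()
        return True
    # Nothing matched; let the caller decide what to do next.
    return False
```

For example, `tap_if_present(d, "Agree")` would dismiss a consent dialog when it appears and do nothing otherwise, so the rest of the script can continue either way.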
Interface Layout Inspection¶
Normally, writing automation code is inseparable from interface layout inspection. This is also the only way for you to get selector conditions. First, you need to open the device’s remote desktop in a browser. Then click on the eye icon in the top right corner of the remote desktop to enter layout inspection. At this point, you can click on the dotted boxes on the left screen to view the corresponding element information. You can use the properties as parameters for the selector. Clicking the eye icon again will close the layout inspection. The layout inspection does not refresh with changes to the page; it always shows the screen layout at the moment you pressed the shortcut key. If you need to refresh the layout, please manually press the shortcut key CTRL + R.
Hint
You can also press the TAB key in the layout inspection interface to browse through all elements.
Interface Selector¶
The interface selector is used to operate Android elements. You can also think of it as something like XPath rules: although they differ, their general purpose is the same. In FIRERPA, the selector is Selector. In most cases, you don’t need to touch this class directly. You should have seen it in the text above. It includes the following optional parameters.
Match Type | Description |
---|---|
text | Exact text match |
textContains | Text contains match |
textStartsWith | Text starts with match |
className | Class name match |
description | Exact description match |
descriptionContains | Description contains match |
descriptionStartsWith | Description starts with match |
clickable | Can be clicked |
longClickable | Can be long-pressed |
scrollable | Can be scrolled |
resourceId | Resource ID match |
In most cases, only resourceId, text, description, textContains, etc. are used as parameters. If the element has a normal resourceId, use it as the first choice for the Selector, such as d(resourceId="com.xxx:id/mobile_signal"). Otherwise, use text, such as d(text="Click to enter"), or more loosely d(textContains="Click"). Description works similarly to text but is used less frequently.
Hint
The Selector is constructed from the main parameters you obtain through the interface layout inspection function described above.
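The priority described above can be written down as a small helper. This is only a sketch: `selector_kwargs` and the `info` dictionary are hypothetical, with `info` standing for the properties you copied by hand from the layout inspection panel.

```python
def selector_kwargs(info):
    """Pick selector parameters from inspected element properties.

    Prefers resourceId, then text, then description, mirroring the
    priority suggested in this chapter. `info` is a plain dict of
    properties copied from the layout inspection panel.
    """
    for key in ("resourceId", "text", "description"):
        value = info.get(key)
        if value:  # skip missing or empty properties
            return {key: value}
    raise ValueError("no usable property for a selector")


print(selector_kwargs({"resourceId": "", "text": "Click to enter"}))
# {'text': 'Click to enter'}
```

The result can then be passed on as `d(**selector_kwargs(info))`, assuming `d` is a connected device object.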
Screen Coordinate Definition¶
During automation operations, you may inevitably encounter situations where you need to operate through detailed coordinates or regional coordinates. But since many people may not be clear about coordinate issues, we’ll introduce Android screen coordinate knowledge here.
As we all know, images have resolution sizes, and so do screens. For Android screens, regardless of whether they are in landscape, portrait, or auto-rotating mode, the top-left corner is uniformly used as the origin (0,0), and the coordinate system extends to the right and down, with X as the horizontal axis and Y as the vertical axis, as shown in the figure.
From the above figure, we know that the coordinates of the top-left corner of the screen are 0,0, the top-right corner is 1080,0, the bottom-left corner is 0,1920, and the bottom-right corner is 1080,1920. You can use this information to calculate the coordinates of any point on the screen.
Note
Regardless of whether the screen is originally portrait, landscape, or auto-rotating, the top-left corner of the current screen direction is uniformly used as the origin.
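The corner values above follow directly from the resolution, which the short sketch below makes explicit. The function name `screen_corners` is made up for this example; it follows this chapter's convention of using the full width and height as the far-edge coordinates.

```python
def screen_corners(width, height, landscape=False):
    """Return the four corner coordinates and the center of the screen.

    `width` and `height` are the portrait resolution in pixels. In
    landscape the two are swapped, because the origin is always the
    top-left corner of the current orientation.
    """
    if landscape:
        width, height = height, width
    return {
        "top_left": (0, 0),
        "top_right": (width, 0),
        "bottom_left": (0, height),
        "bottom_right": (width, height),
        "center": (width // 2, height // 2),
    }


print(screen_corners(1080, 1920)["bottom_right"])            # (1080, 1920)
print(screen_corners(1080, 1920, landscape=True)["center"])  # (960, 540)
```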
Points on the Screen¶
In FIRERPA, there are two definitions about the screen. Some operations like clicking or screenshot require you to provide region or coordinate information. For common coordinate points, we use the following definition, which represents a point with screen coordinates of 100,100.
Point(x=100, y=100)
Definition of Region¶
A region is a rectangular area on the screen. Its definition is a bit complex, so please read carefully. We use Bound to represent an area on the screen, which requires you to provide four parameters: top, left, bottom, and right. You might be a bit confused, so please understand carefully: top represents the pixel distance from the top of the rectangle to the top of the screen, left represents the pixel distance from the left side of the rectangle to the left side of the screen, right represents the distance from the right side of the rectangle to the left side of the screen, and bottom represents the distance from the bottom of the rectangle to the top of the screen. In short, you can understand all distances as distances measured along the X and Y axes from the origin. Below, we use a figure to assist your understanding. The screen of the phone is still 1080x1920, and the device is currently in portrait mode.
Now let’s assume the screen is divided into four equal parts, and we need to get the definitions of the top-left and bottom-right regions as shown in the figure. According to the rules, for region 1, the top of the rectangle is 0 pixels from the top of the screen, the left side of the rectangle is 0 pixels from the left side of the screen, the bottom of the rectangle is 960 pixels from the top of the screen (1920÷2), and the right side of the rectangle is 540 pixels from the left side of the screen (1080÷2), so its definition should be:
Bound(top=0, left=0, right=540, bottom=960)
Similarly, for region 2, the top of the rectangle is 960 pixels from the top of the screen, the left side of the rectangle is 540 pixels from the left side of the screen, the right side of the rectangle is 1080 pixels from the left side of the screen, and the bottom of the rectangle is 1920 pixels from the top of the screen. So the definition of the second rectangle is:
Bound(top=960, left=540, right=1080, bottom=1920)
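The arithmetic behind both regions can be checked with a few lines of plain Python. The `Region` class below is only an illustrative stand-in carrying the same four fields as Bound; it is not FIRERPA's own class, and `quadrants` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Illustrative stand-in with the same four fields as Bound."""
    top: int
    left: int
    right: int
    bottom: int

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top


def quadrants(screen_width, screen_height):
    """Split the screen into four equal regions, row by row."""
    mid_x, mid_y = screen_width // 2, screen_height // 2
    return [
        Region(top=0, left=0, right=mid_x, bottom=mid_y),                         # top-left
        Region(top=0, left=mid_x, right=screen_width, bottom=mid_y),              # top-right
        Region(top=mid_y, left=0, right=mid_x, bottom=screen_height),             # bottom-left
        Region(top=mid_y, left=mid_x, right=screen_width, bottom=screen_height),  # bottom-right
    ]


regions = quadrants(1080, 1920)
print(regions[0])  # Region(top=0, left=0, right=540, bottom=960)
print(regions[3])  # Region(top=960, left=540, right=1080, bottom=1920)
```

Note that right and bottom are absolute distances from the origin, not a width and height, which is why the size of a region is obtained by subtraction.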
Android Application Data¶
Each Android application has a dedicated directory on the device for storing application data. Usually, the relevant data of the application is stored in the /data directory. You can get the application’s data directory by calling the d.application("com.example").info() interface. In most cases, you can also cd straight to a directory such as /data/user/0/com.example.test to reach the application’s user data directory. In addition to the regular /data directory, some applications also store multimedia and other files in the /sdcard/Android directory.
View SMS Database¶
Sometimes, you might want to see where the SMS received by your device are stored. Great idea, and it’s very simple. By writing an extension, you can even read the content directly and get it in real-time via an HTTP interface!
Let’s follow the regular Android approach. If yours is different, please think creatively. On Android, the name of the SMS application should be com.android.mms, so we can directly switch to the directory /data/user/0/com.android.mms. Through our operations below, you can see that there are several databases in the databases directory, and mmssms.db is our target.
λ 10:12 /data/user/0/com.android.mms ➥ ls -la
total 82
drwx------ 7 u0_a78 u0_a78 3452 Jan 2 2021 .
drwxrwx--x 381 system system 53248 May 2 16:46 ..
drwxrws--x 3 u0_a78 u0_a78_c 3452 Jan 2 2021 cache
drwxrws--x 2 u0_a78 u0_a78_c 3452 Jan 2 2021 code_cache
drwxrwx--x 2 u0_a78 u0_a78 3452 Jan 2 2021 databases
drwxrwx--x 7 u0_a78 u0_a78 24576 Feb 26 13:43 files
drwxrwx--x 2 u0_a78 u0_a78 3452 May 4 10:12 shared_prefs
λ 10:12 /data/user/0/com.android.mms ➥ ls -l databases/
total 504
-rw-rw---- 1 u0_a78 u0_a78 24576 Jan 2 2021 dynamic_bubble
-rw------- 1 u0_a78 u0_a78 0 Jan 2 2021 dynamic_bubble-journal
-rw-rw---- 1 u0_a78 u0_a78 491520 Feb 27 04:18 mmssms.db
-rw------- 1 u0_a78 u0_a78 0 Jan 2 2021 mmssms.db-journal
λ 10:12 /data/user/0/com.android.mms ➥
Reading it is very simple, because regular Android application databases are SQLite. However, applications with higher security requirements usually encrypt their databases. FIRERPA can read not only regular SQLite but also supports real-time reading of WeChat (sqlcipher, aes-256), Enterprise WeChat (aes-128), Ali system sqlcrypto (aes-128), and various other types of databases, provided that you find the key yourself. Below, we’ll briefly demonstrate how to read the content of the system’s SMS. It takes just one command. Of course, you can also write an extension to read it.
sqlite3 databases/mmssms.db .dump
Of course, the output is a lot, but you can quickly find the table where the needed data is located and then use SQL yourself. This method applies to 98% of Android applications; the remaining 2% are encrypted databases.
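If you prefer to explore an unencrypted database from Python instead of reading a full dump, a minimal sketch using the standard `sqlite3` module is shown below. It assumes you are working on a copy of the database file; `list_tables` is a hypothetical helper, and the demo table `sms` with its columns is made up and says nothing about the real schema of mmssms.db.

```python
import os
import sqlite3
import tempfile


def list_tables(path):
    """Return the names of all tables in the SQLite database at `path`."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo on a throwaway database; the `sms` table and its columns are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(demo)
    conn.execute("CREATE TABLE sms (address TEXT, body TEXT)")
    conn.commit()
    conn.close()
    tables = list_tables(demo)

print(tables)  # ['sms']
```

Once you know which table holds the data you need, you can query it with ordinary SQL through the same module.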
View Encrypted Databases¶
For these encrypted databases, you need to find the database key or its generation method yourself. Below, we’ll briefly introduce how to read the databases of related software. We’ll only introduce how to use PRAGMA to preset keys. If you don’t understand what this is, please go learn about sqlite first.
WeChat system (sqlcipher)
PRAGMA cipher = "sqlcipher";
PRAGMA legacy = 1;
PRAGMA key = "database-key";
Enterprise WeChat (wxsqlite)
PRAGMA cipher = "aes128cbc";
PRAGMA hexkey = "database-key";
Ali system (sqlcrypto)
PRAGMA cipher = "sqlcrypto";
PRAGMA key = "database-key";
Hint
Android application databases don’t necessarily have to be in the databases directory.
View Other Data¶
Of course, the application data directory contains more than just databases. It also includes some application-related parameters, configurations, caches, files, such as shared_prefs (xml), etc. We won’t go into too much detail; you can explore on your own.
Automation Assistance Measures¶
In automation work, not all applications are suitable for locating elements with selectors. Some interfaces, like games, are rendered in real time and don’t have Android-level page layouts. For such applications, you can only operate and make judgments through OCR or image matching. We provide a complete OCR assistance solution as well as built-in image SIFT and template matching solutions to help implement these business goals. You can find the relevant interfaces and their usage in the documentation.
Updating…