Wednesday, June 1, 2011

Android Monkeyrunner and the Google ADB: a lament

Intro

So, for the past couple of months, I've been trying to get Android Monkeyrunner to cooperate for distributed automated testing, but it has been an uphill battle... against an entrenched army of monkeys armed with bazookas. I wanted the Monkeyrunner library to work well, but I get the feeling that Monkeyrunner has not been tested or used much.

The honeymoon

My experience with Monkeyrunner a month or two ago didn't start out all bad. The Monkeyrunner press function works much faster than doing "adb shell input keyevent" calls (likely due to a new shell being launched with every invocation and no option to chain together a long string of keyevents in the same session), and I got a glimpse of how easy smart phone automation could be without writing customized Java Unit tests or installing Robotium on the phone. I could just send KeyEvents to the phone, type a string, and even connect 2 to 8 phones to our servers and launch MonkeyRunner tests in parallel (more on problems with this later). With Monkeyrunner, I could instruct a non-technical person on how to write a test based simply on how they would use a directional pad and keyboard to navigate around the activity.

Aaaaaand we don't even cuddle anymore

The first problem with MonkeyRunner for me came in the form of the type function being broken when the space key is used. This is not unique to Monkeyrunner. It appears that adb shell input text suffers from a similar problem. There may be several other KeyEvents (other than spaces) that fall into this particular hazard, but I was able to get around the issue for now by removing spaces from the text to be sent and inserting KEYCODE_SPACE where appropriate.
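The space workaround described above can be sketched roughly as follows. This is a minimal illustration, not the actual MAML code: the helper name type_with_spaces is hypothetical, and it only assumes the device object exposes the type(message) and press(name, type) methods that MonkeyDevice provides.

```python
def type_with_spaces(device, text, press_type):
    """Type text on a MonkeyDevice-like object, sending each space as a key press.

    device     -- any object with type(message) and press(name, type) methods
    press_type -- the press type to use for the space key; under Monkeyrunner
                  this would be MonkeyDevice.DOWN_AND_UP
    """
    chunks = text.split(" ")
    for index, chunk in enumerate(chunks):
        # Every boundary between chunks was a space in the original text.
        if index > 0:
            device.press("KEYCODE_SPACE", press_type)
        # Consecutive spaces produce empty chunks; there is nothing to type for those.
        if chunk:
            device.type(chunk)

# Hypothetical usage inside a Monkeyrunner script:
#   device = MonkeyRunner.waitForConnection()
#   type_with_spaces(device, "hello world", MonkeyDevice.DOWN_AND_UP)
```

Passing the press type in as an argument keeps the helper free of Monkeyrunner imports, so it can be exercised with a stand-in device object outside of the monkeyrunner tool.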

There were a couple of other problems with MonkeyRunner that kept cropping up. First, there is very little support for debugging the state of the activity you are trying to instrument.

You can't even get information on whether or not the activity has crashed without going back to adb and logcat. You can't form KeyEvent pairings that select an entire EditText without long clicks, but long clicks are hard to emulate when the EditText could be in a different location on the screen due to portrait or landscape modes, or even because the screen resolution is different between two phones.

You can't press two buttons at once because the DOWN type in the press method is apparently mapped directly to DOWN_AND_UP. Basically, the shift is unpressed immediately after you get out of the press function, regardless of what you pass it. This caused some headaches when trying to select all text, but it was manageable. No automation killer problem found yet... until Tuesday...

Monkeyrunner is a racist... that's a software library that causes race conditions, right?

On Tuesday came the worst problem, which drove me to try to rewrite the Monkeyrunner library without modifying the Android Debug Bridge. There is a race condition in the MonkeyRunner waitForConnection method that occurs when you try to wait for multiple phones at once (even from separate heavyweight processes). The only way to really witness this issue is to have an automated system trying to launch activities on 2 to 8 phones at once (humans take milliseconds or seconds to launch each by hand, so the race condition is hard for a manual tester to catch). The waitForConnection method will cause random behavior on one of the phones while opening the other one without a problem for a moment. Then the automation on all phones halts. The issue is very weird.

We got around this with a short-term fix: we always waited 1 second after the previous phone launched before starting the next phone's automation (via the KATS process life cycle). While this works, it is not ideal. We wanted to launch 2-8 phones at once per server (as many USB connections as we can do right now) and see if there were any race conditions involving the phones connecting or disseminating to the server. With this race condition in Monkeyrunner and our subsequent fix of sleeping in between each phone launch, it's likely that the phones will have 1 second of difference between sending, which means we can't test everything that we want to test.
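The idea behind the staggered launch looks something like the sketch below. Our real fix lives in the KATS process life cycle, so treat this as a stand-in: staggered_launch and the launch callback are hypothetical names used only for illustration.

```python
import time

def staggered_launch(serials, launch, delay_seconds=1.0, sleep=time.sleep):
    """Call launch(serial) for each phone, pausing between consecutive launches.

    serials -- list of device serial numbers, in the order they should start
    launch  -- hypothetical callback that starts the automation for one phone
    sleep   -- injectable so the pause can be replaced when testing
    """
    started = []
    for index, serial in enumerate(serials):
        # No pause before the first phone; every later phone waits its turn.
        if index > 0:
            sleep(delay_seconds)
        launch(serial)
        started.append(serial)
    return started
```

The cost is plain to see here: with N phones, the last one starts at least (N - 1) seconds after the first, which is exactly the skew that keeps the phones from hitting the server at the same moment.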

What's most frustrating about this is that the problem is not on my end, and I can't seem to find any fix to this without modifying the Android code base.

Monkeyrunner withdrawal

To try to address the issue, I rewrote my entire Python scripting library, which wrapped Monkeyrunner, to instead use nothing but ADB under the hood. It started out promising. First, the adb equivalent of waitForConnection was much, much faster (basically, I just used adb get-state). The waitForConnection method must be establishing an actual session with the phone, and this is probably where the race condition is occurring (during the session creation, which is almost certainly not thread safe). So far, so good. Actually, the entire library was a breeze to write.
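A connection wait built on adb get-state can be sketched along these lines. This is an illustration rather than the MAML source: the function names, the timeout and the polling interval are all assumptions, and it relies only on adb get-state printing "device" once the phone is ready.

```python
import subprocess
import time

def adb_get_state(serial):
    """Return the trimmed output of 'adb -s <serial> get-state'."""
    proc = subprocess.Popen(["adb", "-s", serial, "get-state"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode("utf-8", "replace").strip()

def wait_for_device(serial, timeout_seconds=30.0, poll_seconds=0.5,
                    get_state=adb_get_state, sleep=time.sleep, clock=time.time):
    """Poll the phone's state until it reports 'device' or the timeout expires.

    Returns True if the phone came online, False if the timeout was reached.
    get_state, sleep and clock are injectable so the loop can run without adb.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_state(serial) == "device":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

Each phone is queried in its own short-lived adb invocation, so nothing here holds a long-running session open the way the Monkeyrunner connection appears to.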

Then I ran it... and the adb shell input keyevent command inserted 1 second delays in between every KEYCODE_DPAD_LEFT, backspace, menu, etc. A 15 second MonkeyRunner test is extended to hundreds of seconds when using adb shell input keyevent. The culprit with the adb shell is probably that a separate shell session is started with each invocation, rather than the events being queued to the target phone with an immediate return. I can understand this not being the default behavior, but I can't really understand why an asynchronous or a queuing version isn't available.
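One possible mitigation, which has not been tried or measured here, is to join several input keyevent commands with semicolons and hand them to a single adb shell invocation. The sketch below only builds the command line; batched_keyevent_command is a hypothetical helper, and the numeric key codes in the example are the standard Android values for DPAD_LEFT (21), DEL (67) and MENU (82).

```python
def batched_keyevent_command(serial, keycodes):
    """Build one adb command line that sends several key events in a row.

    serial   -- device serial number passed to 'adb -s'
    keycodes -- iterable of numeric Android key codes
    The returned list is suitable for subprocess.call().
    """
    # The device-side shell runs the semicolon-separated commands in sequence.
    script = "; ".join("input keyevent %d" % code for code in keycodes)
    return ["adb", "-s", serial, "shell", script]

# Example: left arrow, backspace, menu in a single adb invocation.
#   subprocess.call(batched_keyevent_command("SERIAL", [21, 67, 82]))
```

This would only remove the cost of opening a new adb shell session per key. If most of the delay comes from starting the input program itself on the phone, batching will not help much.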

A lamentable conclusion

Being able to send KeyEvents to an Android phone is pretty awesome. I hope that the Google folks either fix the race conditions in MonkeyRunner or they fix the delays in adb shell so we can send KeyEvents at decent speeds. For the moment though, this is my Monkeyrunner sad face :(

Library files

The wrappers around the Monkeyrunner and ADB interfaces are linked below. The library is called the MADARA Android Monkeyrunner Library (MAML).

MAML sans Monkeyrunner
MAML original