Saturday, May 11, 2013

Casual thoughts on Stagefright

For the past 4+ years, I've mainly focused on Android multimedia area in my daily work. While the maintenance of Stagefright is much easier compared with OpenCORE,, there are still places which I think could be improved in future. Although I will not have the privilege of closely working on it any more after moving to my next occupation :)

Main issue
One of the biggest pains while working with Stagefright is various ANRs and tombstones related to mediaserver during stability test or user trials.
On one hand, Stagefright isn't mature and robust enough, there are some inherent issues limited by its structure, which makes it very difficult to fix all of them thoroughly. For example, you can easily reproduce an ANR while seeking and play/pause quickly during HTTP streaming due to the blocking read() call to network data and no way to cancel it gracefully, and the existing buffering mechanism doesn't work well because it can't predict which position to be read after seeking; and the potential deadlock between mediaserver and mediaplayer due to the subtle difference between calling mediaplayer from another application and from mediaserver itself (e.g. CameraService) as the binder transaction is skipped from intra-process call, etc.
On the other hand, even if you find the root cause and a potential solution, it's difficult to validate the fix because of the problem is highly impacted by network condition, race condition among threads, etc. Although we can simulate network traffic conditions with bandwidth control tools like netem, it still requires fine tuning and many tries if we don't hack into the code to force it to enter the situation. For lock related issues, I used to write simple CppUTest cases in many loops with different sleep time between function calls and then wish myself luck.

As an advocate of TDD and test automation, I think it would be ideal to have native media engine code written with testability in mind, such as incorporating gtest or other C++ test frameworks, so we can easily add unit test cases for newly reported issues. It would also help to facilitate the up-merge process of each release. Based on my experience, CTS and mediaframeworktest is far from sufficient to catch most of the issues we encountered before shipping the device.

In addition, with more and more features added into AOSP, it would help to unify different pieces of multimedia module into a more flexible architecture, to provide a generic pipeline for playback, recording, and transcoding etc, as well as an elegant way to support customized audio features, like LPA, tunnel mode, 5.1 channels, etc.

I guess it sounds greedy given that most of its code was written by only a couple of Google engineers in a short time with more and more functionality been added to each release. Keep moving, Stagefright!


Gowtham said...
This comment has been removed by the author.
Gowtham said...

I have been following your post for years now. While we are struggling to fix some ANRs when network connection drops, now we know the reason. Good post.