Anatomy of a Bug
Monday, June 6, 2011 at 9:40PM
The Bug
Recently we had one of those bugs. Any developer who’s been around long enough knows what I mean. It’s the kind of bug that skitters across the freshly waxed floor just before opening night. Then it vanishes and you wonder if you really saw it at all, or if it’s simply a caffeine and sleep deprivation-induced hallucination. A manifestation of the anxiety that always accompanies you when you’re getting ready to ship:
What if we missed something?
What if we missed something really, really bad?
We had just finished adding some “Awesome Sauce” to Warheads for Android—shards, more explosive explosions, fireballs, a cool surface fog effect, and a planet (Mars in this case). It looked g-r-e-a-t and it played really well due to some hard work we had put into re-writing the shard rendering code. We’d tested the heck out of it and we were sure we’d gotten everything.
I did the final builds—the ones we upload to the Amazon App Store and Android Market—and did one last play just to marvel at the work of greatness.
The game froze.
At first I thought for sure it was just some cosmic ray striking at just the right moment to flip a bit and lock things up. So I tried again, and it played great. In fact, I played 17 waves and felt pretty good that my original assessment held up. It was just a fluke.
Then it froze again—on Wave 18—and I knew we had a problem.
The Hunt
This wasn’t your run-of-the-mill null pointer exception. The only thing we knew for sure was that this freeze had never been reported with the published version of Warheads, and that we could only get it to happen with the new “Awesome Sauce” version. To make matters worse, when the game froze the only way to unfreeze it on our Nexus One devices was to pull out the battery for a hard reset. It only happened rarely and when it did happen, all we got was a cryptic OS log error:
waitForCondition(DequeueCondition) timed out(identity=6, status=0). CPU may be pegged.
One of the first things we did was search for that message. We didn’t get any Stack Overflow-style “here’s your answer” results, but we did see that other people had encountered this issue and that it could be something that only occurred on HTC devices. Since we had just rewritten the shard drawing code, we suspected that first. However, we soon discovered that the bug could be reproduced even with the shard rendering turned off. So it had to be something else.
We began to mistrust the OpenFeint code because the freeze seemed to loosely correlate with sending a high score to the OF service. We also knew that the GL rendering thread was deadlocked, so we thought maybe the OF thread was somehow interacting with it in a bad way. In the end, though, we were able to repro the problem with OF disabled. So it had to be something else.
This took a couple of days, and by this point we were really sick of playing the game over and over again to make the bug happen. So we decided to have the game play against itself. This was something we had done in Zombie Armageddon and it was a huge help because, while Reabs and I were both pretty good at the game, there’s only so much you can stand before you start to get a little demoralized. The game could play itself for hours and never get bored.
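For the curious, here is roughly what a self-playing driver can look like. This is only a minimal sketch, not the actual Warheads code: the TapTarget interface, the AutoPlayer class, and the idea that the game is driven by screen taps are all assumptions made for illustration. The seed is there so that a run that triggers a freeze can, in principle, be replayed with the same inputs.

```java
import java.util.Random;

/** Hypothetical stand-in for the game's input surface (an assumption, not the real game API). */
interface TapTarget {
    void tap(float x, float y);
}

/** Drives a game with pseudo-random taps so it can run unattended for hours. */
final class AutoPlayer {
    private final Random random;
    private final float width;
    private final float height;
    private long tapsSent = 0;

    /** A fixed seed makes the sequence of taps repeatable from run to run. */
    AutoPlayer(long seed, float width, float height) {
        this.random = new Random(seed);
        this.width = width;
        this.height = height;
    }

    /** Sends one tap at a pseudo-random point within the play area. */
    void playOneTurn(TapTarget game) {
        float x = random.nextFloat() * width;
        float y = random.nextFloat() * height;
        game.tap(x, y);
        tapsSent++;
    }

    /** Number of taps sent so far; handy to log when a freeze finally happens. */
    long tapsSent() {
        return tapsSent;
    }
}
```

In a real game you would call playOneTurn from the game loop every so often and log tapsSent alongside the current wave, so that when the freeze hits you at least know how far the run got.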
After about 4 days of poking and prodding and hoping that we had found the issue only to have our hopes dashed again, we gave up. We had spent close to 35 hours working the problem and could not find the solution. The update had other features and bug fixes we wanted to get out there so I finally made the call to ship it (version 2.4.0) without the Awesome Sauce.
I was really, really disappointed by this because I knew people would enjoy all the extras we had added in to the paid version. But at some point you just need to walk away. We shipped Warheads 2.4.0 without the Awesome Sauce. No one but us even knew what could have been in that release (until now).
The Kill
“I have a hard time letting go of problems that vex me.”
That’s what I wrote in my email to Reabs which included a build of Warheads that I sent literally moments before I dashed off to the airport for a Memorial Day weekend trip to California. As with many vexing problems I’ve encountered in the past, I solved this one while sleeping. Really.
I can’t really say exactly how the epiphany came about, but when I awoke in the morning I was inspired to look at the various textures we were using for the Awesome Sauce to see if there was something unusual about any of them (too big, for example). Others who had encountered this particular problem had commented that fiddling around with textures seemed to make the problem go away. I also recalled that different Android devices supported different texture formats.
I inspected the new textures we used and there it was—one texture was not like the rest. One texture, the one for the “fog” on the planet surface, was not square.
I know. OMFG! Not square!
You wouldn’t think that would cause 35+ hours of headache and to be honest, it shouldn’t have mattered at all. It’s a bug in the platform to be sure. But once I removed that one asset with the non-square texture, everything worked great. The game happily played itself well past level 70 with no freeze.
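In hindsight, a check like the following could have flagged the odd texture much earlier. It is a minimal sketch under stated assumptions: the TextureAudit class, the power-of-two rule, the size limit, and the example dimensions are mine, not something from the Warheads build (we never recorded the fog texture's actual size here). On Android, the width and height could be read without loading the pixels by decoding with BitmapFactory.Options and inJustDecodeBounds set to true.

```java
/** Flags texture dimensions that deserve a second look before shipping. */
final class TextureAudit {
    private TextureAudit() {}

    /** True when both sides are positive and equal. */
    static boolean isSquare(int width, int height) {
        return width > 0 && width == height;
    }

    /** True when n is a positive power of two (1, 2, 4, 8, ...). */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /**
     * Returns a warning for the first unusual property found, or null when
     * nothing looks odd. maxSize is an assumed per-side limit in pixels.
     */
    static String check(String name, int width, int height, int maxSize) {
        String size = width + "x" + height;
        if (width <= 0 || height <= 0) {
            return name + ": invalid size (" + size + ")";
        }
        if (width > maxSize || height > maxSize) {
            return name + ": larger than " + maxSize + " pixels (" + size + ")";
        }
        if (!isPowerOfTwo(width) || !isPowerOfTwo(height)) {
            return name + ": not power-of-two (" + size + ")";
        }
        if (!isSquare(width, height)) {
            return name + ": not square (" + size + ")";
        }
        return null;
    }
}
```

With made-up numbers, TextureAudit.check("fog", 512, 256, 1024) returns "fog: not square (512x256)", while a 512x512 texture passes and returns null. Running something like this over every asset as part of the build would turn a multi-day hunt into a one-line warning.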
At this point we had another choice: try to resolve the problem and save the fog feature, or just punt. We chose to give up on the fog and launch everything else. It’s really too bad, and this is one of the downsides to the Android ecosystem: we believe this particular issue only happens on certain devices, but we don’t want to exclude those devices from our potential market and we don’t want to have to support multiple versions of the app. In this case, the deficiency of a small group of devices has caused everyone else to miss out.
This is an image of what it should look like (with fog). Still, it’s pretty awesome even without the fog.
You can see the latest version of Warheads on the Android Market, complete with Awesome Sauce, here. It’s a free trial, so go check it out!
James