Posts Tagged ‘qemu’
“Your god Debian seeks to possess all running systems, and with them, gain deserved ascendance over the other gods. You, a newly trained Hacker, have been heralded from birth as the instrument of Debian. You are destined to rescue non-Debian systems for your deity, or die in the attempt. Your destiny has come. For the sake of us all: Put Debian on them all!” - The Book of Debian
I am but a lowly hacker, and a fanatic Debian user: whatever comes my way will end up having Debian on it, no exceptions. From a lowly 386DX with 2 MB of RAM (running, I think, Sarge), through my router and desktop, to my latest trophy: a PowerMac G4 Quicksilver. Now, none of these are particularly hard to install Debian onto (although the 386 gave me quite a bit of trouble, as it had neither a CD-ROM, nor any network, nor a working serial port, so Debian was installed using a pair of floppy disks and PLIP), but the latest one was quite a challenge. Normally, it’d be as easy as burning a CD, booting it, and off you go, but not this time.
This Mac had its CD drive replaced by a DVD reader, which OpenFirmware couldn’t use for some reason: whatever I put in it, be that a CD or a DVD, it wouldn’t boot. But that’s no problem, we can boot from the network! Or at least, so I thought. After a few hours of scratching my head over why it never got past downloading yaboot over and over and over again, I gave up on that idea as well. But a mere machine will not win so easily over me! Oh no!
As a true hacker, I took the Mac apart, put its hard drive into my PC, and fired up qemu-ppc, configured so that it saw the Mac HD as hda and the install ISO as its CD, and bam. A couple of hours later (qemu-ppc is horribly slow), I had an emulated system up and running. Then I put the hard drive back into the Mac, turned it on, and voila:
algernon@galadriel:~$ uname -a
Linux galadriel 2.6.32-5-powerpc #1 Wed Jan 12 04:47:03 UTC 2011 ppc GNU/Linux
algernon@galadriel:~$ cat /proc/cpuinfo
processor : 0
cpu : 7450, altivec supported
clock : 799.999998MHz
revision : 2.1 (pvr 8000 0201)
bogomips : 66.43
timebase : 33217483
platform : PowerMac
model : PowerMac3,5
machine : PowerMac3,5
motherboard : PowerMac3,5 MacRISC2 MacRISC Power Macintosh
detected as : 69 (PowerMac G4 Silver)
pmac flags : 00000010
L2 cache : 256K unified
pmac-generation : NewWorld
Memory : 1024 MB
No system shall escape the fate of being Debianized. Naturally, it’s running syslog-ng, logging to a central location just like all of the other little toys I had the luxury to play with.
As part of my afmongodb driver, I wrote a MongoDB client library, and this time, I started to experiment with test-driven development. While there’s still room for improvement (my test suite is not complete enough, and my documentation is not at the level I want it to be), there are a few lessons I learned in the process, and I’d like to share some of them.
First, a reasonably complete test suite is a godsend, and I mean that. Even better is when one writes the tests first, along with documentation, and the code afterwards. There were a lot of bugs I would not have found without the test suite: ranging from endianness bugs, through abusing va_list in ways it wasn’t meant to be used, to simple coding errors that would result in a bad implementation of a spec. But a test suite alone is not what I want to talk about today, especially since the suite I wrote for libmongo-client still has a lot of room to improve. What I want to talk about is the importance of testing one’s code on multiple architectures.
During the past few days, I’ve been preparing for the first release of the library, and due to the nature of the MongoDB wire protocol (it’s little-endian), I wanted to test it on a big-endian system too, so off I went and installed Debian/PowerPC in QEMU, and ran the test suite there, which, to my surprise, revealed a couple of endianness-related bugs. Even though I went to great lengths to ensure my code is endian-safe, there were still a few cases where it wasn’t, and the test suite caught them, but only when run on a very different system.
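To illustrate what "endian-safe" means here, the following is a minimal sketch, not code from libmongo-client: the helper names put_le32 and get_le32 are made up for this example. It writes and reads a 32-bit little-endian value one byte at a time with shifts, so the result does not depend on the byte order of the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store v into out[0..3] in little-endian order.
 * Shifting extracts bytes by numeric significance, so this behaves the
 * same on little-endian and big-endian hosts. */
static void put_le32(uint8_t *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v & 0xffu);
    out[1] = (uint8_t)((v >> 8) & 0xffu);
    out[2] = (uint8_t)((v >> 16) & 0xffu);
    out[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Hypothetical helper: read a little-endian 32-bit value from in[0..3]. */
static uint32_t get_le32(const uint8_t *in)
{
    assert(in != NULL);
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

The typical endianness bug is the shortcut this avoids: copying the in-memory bytes of an integer straight into the buffer, which happens to give the right answer on x86 and the wrong one on PowerPC.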
After the endianness bugs were hunted down and fixed, I had another idea: what about testing on a little-endian, but non-x86 architecture? At first, I wanted to try mipsel, for various reasons (almost a decade ago, I had the pleasure of working with a mipsel system, and found it very neat at the time; a well written book about the architecture internals just emphasized that), but ran into a few issues, namely I would’ve needed a firmware, which I didn’t have at hand. So instead of hunting one down (the only mips hardware I have at home is a trusty old router with very limited firmware that does not have any easy remote access apart from http), I opted to find another suitable architecture that QEMU can emulate, and ended up with armel.
Now that was another interesting experience: the installation went fine, there are plenty of HOWTOs on the net, but the test suite revealed another kind of bug: one that was a lot harder to find and fix than the endianness ones. This too was found by the test suite, as there were no compiler warnings or anything of the sort, and the example applications worked perfectly as well.
“What was the bug?” one may ask, and I’ll tell you: I had a function that took a variable number of arguments, terminated by a zero value. I used this to build BSON objects in cases where most of the contents are known at compile time. One of the test cases built a BSON object using this API, and compared the result to building the same document with the traditional API. On x86, x86-64 and powerpc, this test case ran perfectly, but on armel, it bailed out with errors. However, there were other functions in my code that took a variable number of arguments, and they worked just fine.
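For readers who have not written one, here is a rough sketch of what a zero-terminated variadic builder looks like. It does not build BSON and is not the library's API: build_pairs is a made-up function that appends "key=value;" text for each (const char *, int) pair and stops at a null key.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical builder: after out and size, takes (const char *key, int value)
 * pairs, terminated by a null key. Returns the number of pairs written,
 * or -1 if the buffer is too small. */
static int build_pairs(char *out, size_t size, ...)
{
    va_list ap;
    size_t used = 0;
    int pairs = 0;
    const char *key;

    assert(out != NULL && size > 0);
    out[0] = '\0';

    va_start(ap, size);
    while ((key = va_arg(ap, const char *)) != NULL) {
        int value = va_arg(ap, int);
        int n = snprintf(out + used, size - used, "%s=%d;", key, value);

        if (n < 0 || (size_t)n >= size - used) {
            va_end(ap);  /* every va_start needs a matching va_end */
            return -1;
        }
        used += (size_t)n;
        pairs++;
    }
    va_end(ap);
    return pairs;
}
```

A call looks like build_pairs(buf, sizeof buf, "a", 1, "b", 2, (const char *)NULL). The cast on the terminator matters: variadic arguments carry no type information, so a bare 0 would be passed as an int, which need not have the same size as the pointer that va_arg reads.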
Digging into the test case with GDB proved to be a surprising experience too, for a multitude of reasons, and prompted me to read up a bit on the ARM architecture (it’s interesting, by the way), but for a long time, I couldn’t figure out where the problem lay: my stack was just full of garbage the moment I entered the function, and it stayed that way. As it turns out, the problem was that I passed a va_list to another function, which used va_arg() on it. According to POSIX:
The object ap may be passed as an argument to another function; if that function invokes the va_arg() macro with parameter ap, the value of ap in the calling function is unspecified and shall be passed to the va_end() macro prior to any further reference to ap.
And that is exactly what I did, but I had never read this line in the documentation before. Turns out, of the four architectures I compiled on, three worked the way I expected, and I could keep calling va_arg() after the called function returned. Not on ARM, however. All I got back was garbage, and that’s what the test suite caught, but I needed an armel system to catch this.
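One portable way around this, sketched below with made-up names (next_int and sum_until_zero are not from the library), is to hand the helper a pointer to the va_list instead of the va_list itself. The C standard, in a footnote to its <stdarg.h> section, explicitly permits this and says the original function may keep using the list after the helper returns. This is only one possible fix, not necessarily the one that went into libmongo-client.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: consumes exactly one int from the caller's list.
 * It receives a va_list *, so the caller's list advances in a
 * well-defined way and stays usable afterwards. */
static int next_int(va_list *ap)
{
    assert(ap != NULL);
    return va_arg(*ap, int);
}

/* Sums its int arguments up to (not including) a terminating 0.
 * The list is handed to the helper again on every iteration, which is
 * exactly the "further use after the callee returns" that passing a
 * plain va_list by value does not guarantee. */
static int sum_until_zero(int first, ...)
{
    va_list ap;
    int total = first;
    int v;

    if (first == 0)
        return 0;

    va_start(ap, first);
    while ((v = next_int(&ap)) != 0)
        total += v;
    va_end(ap);
    return total;
}
```

With this shape, sum_until_zero(1, 2, 3, 0) is well defined on every architecture, rather than only on those where a va_list passed by value happens to share state with the caller's copy.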
The lesson learned?
However complete your test suite may be, you can still have code that works only by accident, so testing on different architectures can help a lot. And QEMU, along with all the tools built around it, is an awesome tool in a developer’s toolbox that can greatly aid in writing correct and portable code. And on a Debian system, it’s very easy to set up a build environment, thanks to tools like sbuild and dput. I can now compile and test packages on four different architectures, without the need to figure out how buildbot works.