Reimplementing a subset of DD in Python

Through my work on imaging our Nokia test farm, I have developed three approaches to imaging the N810.  The first is to set up an N810 and then generate our own firmware image from it as a JFFS2 filesystem.  This approach gives us an N810 that is essentially factory-stock, but we found that devices were still falling over on a regular basis.

The next approach I tried was to put the full Maemo operating system on an SD card and boot from it.  This resulted in significantly improved reliability at a minor cost in test suite run time.  The imaging process also needs far less human involvement than the older approach: all that is required is that somebody is there to swap in a blank SD card and run the command again.  This process uses rsync (sudo rsync -a moz-ref-v2/. /mount/point/.) to copy the files from a directory on the imaging machine onto the SD card.  As part of the imaging process, the hostname and a couple of other bits of information are set.  It took me the better part of three days to image all 40 SD cards using this approach.

The third approach, which is the one we are going to move forward with, uses the imaging process of the second approach to create a ‘master’ image.  This image has all the information already set up, including text files specifying which image revision the device is running.  Once this is done, we use dd to dump an image of the entire SD card onto the PC’s hard disk (dd if=/dev/sdb of=moz-ref-v1.dump bs=100M).  When this is complete, we have a 3.7GB file which contains the entire contents of the master card.  We can then write this file directly to another SD card to get an identical copy of the master (dd if=moz-ref-v1.dump of=/dev/sdb bs=100M).

The problem is that this doesn’t scale too well.  We are aiming to be able to write to 14 cards at the same time.  I investigated using tee (dd if=moz-ref-v1.dump bs=100M | tee /dev/sdb /dev/sdc /dev/sdd > /dev/null) but found that it wouldn’t write to raw devices.  Another option would be to use a for loop to start a bunch of dd processes in the background.  While this would have worked, every process would read the image from the hard disk separately, so the disk throughput required would scale linearly with the number of cards.  Instead, I decided to write an implementation of a limited subset of dd in Python.  I used the optparse library to implement the command line interface and standard Python I/O for the cloning process.  After timing a few runs, I found that my Python script was about 98% as fast as the canonical implementation.  I only measured wall time, as that is the only thing of value in this situation.
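To illustrate the idea, here is a minimal sketch of a one-reader, many-writers copy loop in Python 3. It is not the actual multi-dd code. The function name clone, its parameters and the default block size are assumptions made for this example, and the sketch assumes that the point of the tool is to read each block once and write it to every card.

```python
from contextlib import ExitStack


def clone(source_path, dest_paths, block_size=1024 * 1024):
    """Read source_path once and write every block to each destination.

    Returns the number of bytes read from the source.
    All names here are hypothetical; this is not the multi-dd script.
    """
    copied = 0
    with ExitStack() as stack:
        # ExitStack closes every file that was opened, even on error.
        src = stack.enter_context(open(source_path, "rb"))
        dests = [stack.enter_context(open(path, "wb")) for path in dest_paths]
        while True:
            block = src.read(block_size)
            if not block:  # an empty read means end of input
                break
            for dst in dests:
                dst.write(block)
            copied += len(block)
    return copied
```

Called as clone('moz-ref-v1.dump', ['/dev/sdb', '/dev/sdc', '/dev/sdd']), this reads the image from the hard disk once no matter how many cards are attached, so only the write side grows with the number of cards. A real tool would probably also want to flush or sync each device before reporting success.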

A further optimization that I would like to make is the ability to store the image files in a compressed format and decompress them on the fly.  Because a dump file contains every single bit that the filesystem tracks, it ends up being the same size as the SD card itself.  In our case, we have a 4GB filesystem even though only about a gig of that is used.  The simplest way to get around this is to compress the image files.  Rather than worrying about manually decompressing the file before feeding it into the duplication program, I am going to implement decompression in the duplicator program.  I found that Python has a really nice BZip2 module in the standard library.  This module provides a full file interface for a BZip2 compressed file.  Before I decided to implement this, I wanted to check that the module is able to decompress files on the fly.  I started by generating a file with random data (dd if=/dev/random of=random bs=1024 count=1024), for which I then computed a SHA-1 hash (openssl sha1 < random > random.sha1).  With the file compressed as random.bz2, I opened an interactive Python interpreter and ran the commands:

>>> import bz2
>>> src = bz2.BZ2File('random.bz2')
>>> dst = open('random', 'wb')
>>> while True:
...  chunk = src.read(1024)
...  if not chunk:
...   break
...  dst.write(chunk)
...
>>> dst.close()

Once this had completed, I exited the Python interpreter and compared my SHA-1 hashes:

jhford$ cat random.sha1
dc34e2d6308786e5e5857f7b0b1126097060df6c
jhford$ openssl sha1 < random
dc34e2d6308786e5e5857f7b0b1126097060df6c

This tells me that I can safely use the BZ2File class for implementing compressed SD card images. My current implementation strategy is to have files with a ‘.bz2’ extension automatically treated as either a file that is compressed (input) or a file that should be compressed (output).
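A rough sketch of that extension-based strategy in Python 3 might look like the following. The helper name open_image is made up for this example and is not taken from the duplicator code; the only library call it relies on is bz2.BZ2File from the standard library.

```python
import bz2


def open_image(path, mode):
    """Open an image file, compressing or decompressing on the fly if needed.

    mode is 'rb' for an input image or 'wb' for an output image.
    open_image is a hypothetical helper, not part of the duplicator code.
    """
    if path.endswith(".bz2"):
        # BZ2File exposes the same read()/write() interface as a plain file.
        return bz2.BZ2File(path, mode)
    return open(path, mode)
```

Since both branches return an object with the same read and write methods, the copy loop doesn't need to know whether the image on disk is compressed.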

I am continually impressed by how comprehensive the Python standard library is.  It seems that every time I write something in Python, there is a built-in module to do anything that isn’t specific to the problem at hand!

The code for my duplicator implementation is being developed at http://hg.johnford.info/multi-dd and the imaging scripts for the mobile work live in the build repository at http://hg.mozilla.org/build/tools in the directory buildfarm/mobile.

WinCE Try Builds

I know my last post about Try Server work was about enabling Maemo and WinCE builds, but we later realized that those were actually Windows Mobile builds. While based on Windows CE, Windows Mobile is a distinct product. Thanks to Nick’s work, I was able to get Try builds up and running a lot more quickly. Because this is a straight Firefox build, I was able to use the standard try server build factories. I have asked some of the mobile team developers to verify the builds manually before I mark the bug as resolved fixed. One issue that I have noticed is that the Tinderbox build logs aren’t being scraped properly. An example is that

TinderboxPrint: jford@mozilla.com
TinderboxPrint: 1252624187
Comments: No description given

is not showing up in the waterfall column. I have seen this with other builders, so I am going to assume that it isn’t specific to my code.

This now means that when you submit your code to the Try Server a WinCE build will kick off. I hope that this helps the WinCE team! If you notice anything strange going on, please don’t hesitate to contact me. I am always on irc.mozilla.org in #build as jhford.

Successful builds!

14 More Nokias

We have gotten the remainder of our order of 20 Nokia N810s.  This brings our total pool of devices up to 40!  I have gotten them fully set up and they are running in staging.  If everything looks good tomorrow, I will move them over to production.

I have also moved them all to the conference table in FAIL to see if this improves our connection reliability.  This will really help us to keep posting reliable and timely numbers.  I still have 3 units running in staging which are using a MicroSD card for their entire OS and data drive.  If this works out well, we will hopefully improve our end-to-end times on builds and make it even easier to reimage a malfunctioning device.

Credit Card Fraud

I checked my Visa account tonight and found out that my bill was $2100 more than it is supposed to be.  Turns out that someone thought it’d be fun to order a $1500 vacation from something that looks like a front and $600 from Rogers Wireless.  I have spoken to RBC and Rogers Wireless and explained the situation.  RBC is reversing the charges and conducting an investigation.  Rogers Wireless is launching a fraud investigation.

I also did an Equifax and Trans Union credit check tonight.  The Equifax one comes up clean, as does the majority of the Trans Union one.  Unfortunately, the Trans Union report shows that Rogers has done a credit check on me.  This negatively impacts my credit rating and is the last thing I need right now.  I called the non-emergency number of the Police to report the fraud and was greeted by an apparent wall.  I wasn’t speaking to an officer though.  This person informed me that even though I live in Toronto normally, and that BOTH charges on my card show as Toronto, ON, it isn’t a matter for Toronto Police unless I am physically in Toronto.  Nice to see my tax dollars hard at work!  This numpty suggested that I visit the local police here to give a report.  I don’t understand this.  Why on earth would a Mountain View police officer be the one to deal with a crime committed against a Canadian, in Canada, with Canadian companies?  I doubt that it would even make its way up to Toronto.

This couldn’t have come at a worse time.  I wonder if the false credit check will be removed from my credit history as I absolutely didn’t request it, the $600 charge or the $1503 charge!  As I sit here, I wonder what I can do regarding this situation.

Six Nokia N810s up in 1 hour

Today John O’Duinn found some new N810s around the office.  While they weren’t destined for us, we worked a trade (Thanks Nick!) for some of the devices we have arriving soon.

The new devices still in their boxes

The imaging process that I described in an earlier post requires some time to bake in a staging environment because it requires some fairly significant changes in how the device acts.  In the interim I have generated an image for flashing the N810s that still uses the internal raw flash memory.  To do this we set up a reference device exactly as needed.  After this I generated a new filesystem image (sudo gainroot; mount -t jffs2 /dev/mtdblock4 /opt; mount /dev/mmcblk1p1 /floppy; cd /floppy; mkfs.jffs2 -r /opt -o moz-ref-image-v1.jffs2 -e 128 -l -n). Once this was done I transferred the files to my desktop PC and wrote an automation script that flashes the devices. This puts the root filesystem onto the device. For the pagesets used by Talos we need to use the internal controlled flash card. For this I have written a script which takes the device files for each plugged-in Maemo device and rsyncs the contents of the drive. Using a helper script I have made this work on 6 devices at once.
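The scripts themselves aren't shown here, but the parallel step could be sketched in Python 3 along these lines. Everything in it is an assumption made for illustration: the function names, the idea of one already-mounted path per device, and the rsync invocation, which simply mirrors the 'rsync -a source/. dest/.' form used for the SD card images.

```python
import subprocess


def rsync_command(source_dir, mount_point):
    """Build an rsync call that copies the contents of source_dir to mount_point."""
    return ["rsync", "-a",
            source_dir.rstrip("/") + "/.",
            mount_point.rstrip("/") + "/."]


def sync_all(source_dir, mount_points, launch=subprocess.Popen):
    """Start one rsync per mount point, then wait for all of them to finish.

    Returns a dict mapping each mount point to its rsync exit status.
    The launch parameter exists only so the sketch can be tried without rsync.
    """
    # Start every process first so the copies run side by side.
    procs = {mp: launch(rsync_command(source_dir, mp)) for mp in mount_points}
    return {mp: proc.wait() for mp, proc in procs.items()}
```

Starting all the processes before waiting on any of them is what makes the copies run in parallel; waiting inside the first loop would serialize them.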

Data being transferred to the new N810s

It took less than one hour to get all the machines from unopened shipping box to running in our staging environment. If everything looks good, I will move these six new devices over to production tomorrow. This is a significant improvement over the multiple-hour-per-device setup process we used before. This process scales very well. Flashing each device with a root file system takes 90 seconds and is done in serial. Setting up the page set file system took 6 minutes for 6 devices and is done in parallel.

In Staging

I am eagerly awaiting the arrival of 14 more N810s some time this or next week.