Archive for category Mozilla

Mobile on the new try server!

Mobile support for the new try server has just landed! Any push from now on will get Android, Maemo 4, Maemo 5 GTK and Maemo 5 QT builds. I have also added support for two magic files, ‘mobile-repo’ and ‘mobile-rev’. Both of these files should be in the root directory of your mozilla-central clone. The ‘mobile-repo’ file should contain a string with no whitespace or newlines that is a path relative to http://hg.mozilla.org/ referencing a mobile-browser repository. The ‘mobile-rev’ file should contain anything that can be used with the --rev flag to mercurial. In-repo branches or changeset identifiers work great here. Just like on desktop try, you can also have a special mozconfig for your mobile platforms. In the top level directory, create a mozconfig-extra file, which contains options that are added to *all* platforms, and/or any of mozconfig-extra-android-r7, mozconfig-extra-maemo4, mozconfig-extra-maemo5-gtk and mozconfig-extra-maemo5-qt, which contain options for that platform only. I could add a check for a mozconfig-mobile that would add things to all mobile builds if that is desired.

An example of a push to try session to build against my user repository copy of mobile-browser might look like:

hg clone http://hg.mozilla.org/mozilla-central
cd mozilla-central
hg rm README.txt
echo 'users/jford_mozilla.com/mobile-browser' > mobile-repo
echo 'default' > mobile-rev
hg add mobile-repo mobile-rev
hg commit -m 'Testing to see if we build without a readme file'
hg push -f ssh://hg.mozilla.org/try

This marks the end of the old try server. It also means that we have fully transitioned our try server build infrastructure to Buildbot 0.8.0.

Figuring out which files are touched while installing software

We are working on getting our infrastructure up to speed for Maemo 5 and Maemo 5 QT builds. A critical part of Maemo 5 building is the scratchbox. This is a toolkit that Nokia uses to make development for their linux based phones easier. We have enough linux build slaves in production that it is impractical to deploy scratchbox by hand on each machine. Scratchbox also downloads packages from the internet, which means that we could get different packages each time we try to install the scratchbox. We already have a fairly old version of scratchbox which is set up with the Chinook SDK that we have used for doing our Maemo builds thus far. Originally I was under the impression that we were going to need to have 2 totally separate scratchbox installations, but thanks to Doug T., who showed me how to upgrade our existing scratchbox 4 installation to scratchbox 5, we do not.

My concern with this upgrade, however, was that files outside the /builds/scratchbox directory were going to be touched. I wanted to be thorough so I did an experiment. I ran find -mount -type f -exec openssl md5 '{}' \; | tee -a /file-list ; find -mount -type f -exec openssl md5 '{}' \; | tee -a /file-list before and after the scratchbox upgrade. The -mount flag and the two separate runs, one at each of our two mountpoints, were to ensure that we didn’t hash things like the /dev, /proc and /sys filesystems. My original intent was to do diff file-list1 file-list2 but that showed me every single file that changed. I only wanted to know the files that changed outside of my scratchbox root directory of /builds/scratchbox. My diff was polluted by 77,000 files that resided in the scratchbox root. I figured that the best option at the time was to hack up a quick python script:

#!/usr/bin/python
#This file is a quick script to process the output of
# find / -mount -type f -exec openssl md5 '{}' \; | tee -a
import sys, os.path, re

if not len(sys.argv) == 3:
    print("purple monkey dishwasher")
    sys.exit(1)
filename_a = sys.argv[1]
filename_b = sys.argv[2]
if not os.path.exists(filename_a) or not os.path.exists(filename_b):
    print("insert change into meter and press green button")
    sys.exit(1)
data = {}
#Matches openssl output lines of the form: MD5(/path/to/file)= <hash>
pattern = re.compile(r"^MD5\((?P<file>.*)\)= (?P<hash>.*)$")
#Get the data from A
f = open(filename_a, 'r')
for i in f.readlines():
    m = pattern.search(i)
    data[m.group('file')] = m.group('hash')
f.close()
sbfile = re.compile("^/builds/scratchbox") #pattern describing files to ignore
#Figure out diff to B
f = open(filename_b, 'r')
for i in f.readlines():
    m = pattern.search(i)
    if m.group('file') not in data:
        if not sbfile.search(m.group('file')):
            print('new file - ' + m.group('file'))
    else:
        if not sbfile.search(m.group('file')):
            if not data[m.group('file')] == m.group('hash'):
                print('updated file - ' + m.group('file'))
f.close()

This is code I have written to scratch my own itch. I am posting this as it might be useful to someone else. If you wanted to ignore a different directory you’d change sbfile = re.compile("^/builds/scratchbox") to be a pattern describing your path to ignore. If you wanted to find all things that changed over your whole partition you would remove sbfile and all sbfile checks to have a final bit of code like

#Figure out diff to B
f = open(filename_b, 'r')
for i in f.readlines():
    m = pattern.search(i)
    if m.group('file') not in data:
        print('new file - ' + m.group('file'))
    else:
        if not data[m.group('file')] == m.group('hash'):
            print('updated file - ' + m.group('file'))
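
The comparison described above can also be sketched as a small reusable function in which the ignored directory is a parameter rather than a hard-coded pattern. This is only an illustration, not the script I actually ran: the names parse and compare are made up for the example, and unlike the script it silently skips lines that do not look like openssl md5 output.

```python
import re

# Matches lines such as: MD5(/etc/passwd)= d41d8cd98f00b204e9800998ecf8427e
LINE = re.compile(r"^MD5\((?P<file>.*)\)= (?P<hash>.*)$")

def parse(lines):
    """Map each file path to its hash, skipping lines that do not match."""
    hashes = {}
    for line in lines:
        m = LINE.search(line.strip())
        if m:
            hashes[m.group('file')] = m.group('hash')
    return hashes

def compare(before_lines, after_lines, ignore_prefix=None):
    """Return (new_files, updated_files), skipping paths under ignore_prefix."""
    before = parse(before_lines)
    after = parse(after_lines)
    new_files, updated_files = [], []
    for path in sorted(after):
        if ignore_prefix and path.startswith(ignore_prefix):
            continue
        if path not in before:
            new_files.append(path)
        elif before[path] != after[path]:
            updated_files.append(path)
    return new_files, updated_files
```

Passing ignore_prefix='/builds/scratchbox' corresponds to the first version of the loop, and leaving it out corresponds to the whole-partition version.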

In the end, I found that the scratchbox upgrade that I did only changed my bash_history and added some tarballs to /tmp. I am very glad that this is the case as it really simplifies our deployment of the new scratchbox!

What is wrong with AT&T’s Flash Ads?

Over the last couple of days I noticed my Firefox getting painfully slow. The weird part was that the rest of my system was responsive. When I opened Activity Monitor it showed 100% CPU usage for Firefox. I decided to do some investigating. I used the ‘Sample Process’ feature in Activity Monitor. After setting the display to ‘Percent of Parent’ I noticed that there were a lot of ‘Flash_EnforceLocalSecurity’ entries, which led me to believe that Flash was the culprit.
I went through my tabs, and sure enough I had lots of Flash open. This pattern kept repeating itself. I’d notice Firefox getting sluggish, close the Flash web pages and see Firefox performing properly with CPU usage back to normal. I found it strange that I could play Hulu and YouTube videos fine. I even went to www.bannerserver.com and found that Firefox never used 100% of my CPU there. This baffled me until I figured out what the problem was: the issue only happens when AT&T Uverse Flash ads show up.

Not everyone cares to find the root cause of a problem like this. It is also only sporadically reproducible; going to the same website might show different ads each time. I would bet that a lot of people would look at this and say “Firefox is slow”, especially because the ads appear on many different pages. These ads are also not the primary reason someone goes to the page (I’d hope), which means that it is difficult to associate the Flash ad with the purpose of the tab if they do try to figure out what the problem is. Having plug-ins in a separate process (Electrolysis) seems like a great idea. I hope that, like Safari on Mac, it shows up as a totally separate process, which would help avoid people blaming Firefox for poor performance.

The most annoying part of this whole situation is that I’d love to be a Uverse subscriber.  It is bad enough that they aren’t offering it in my area, but to make my browser slower is a slap in the face!

Unittests on PPC and Non-SSE2 Machines

During the lead up to the Firefox 3.5 release, there was a request to have older machines running unit tests. Our solution for the 3.5 release was to burn Ted cycles. This is unfortunate because Ted could have otherwise done much more useful things. For the Firefox 3.6 release, we automated this testing. The reason for running these tests is that our JavaScript interpreter spits out native machine code. The problem is that there are still machines that are capable of running Firefox but do not have all of the latest instruction sets (MMX, 3DNow, SSE2, SSE3, ad infinitum). Specifically, there was a concern over the inclusion of SSE2 instructions when running on non-SSE2 capable hardware. This is especially important for those people who are still running an older machine exclusively for browsing. It is very important that we don’t unknowingly introduce a requirement for SSE2. We also need to test our claimed compatibility with older Macintosh machines based on the PowerPC core. To fix this situation, I have created the geriatric master. This master is in charge of our aging fleet of Pentium 3s and PowerPC G4s. It reports to the GeriatricMasters tinderbox. I am currently monitoring the Mozilla 1.9.2 and Mozilla Central nightly builds.

These are the first two successful runs

We are running some old machines saved from the Landings to Castro move. The P3s are around a decade old and the G4 Mac Mini and Dual G4 PowerMacs are about 4-5 years old. We also aren’t running matched hardware, which is going to make detective work on failures difficult. I have found some machines that could replace our mixture of P3s. They are based on the AMD Geode LX800 processor, the CPU used in the OLPC XO. Because they are based on an ancient CPU core (Cyrix 5×86, technically a 486), they do not even have SSE. That makes them perfect for our testing. The model that I am looking at is the MSI Fuzzy. I don’t really understand their naming, but this machine would fit our needs perfectly. It has a fast (for the category) processor and slots for 1GB of RAM. Most importantly, it allows us to have as many identical machines as we need. The Mac Mini is easier to standardize on because it was fairly popular at the time and there are only one or two minor variations of the PowerPC Mini.

Before we go rushing out and buying a bunch of new machines for this testing, we need to know how long there will be demand for this kind of testing. I would estimate that it would cost about $500 per Geode machine, and that we might be able to get nightly coverage on two to three branches on linux and windows with four machines.

Also of note is that because we are running on such old hardware, we are getting JavaScript timeouts. I don’t really know how to manually set the preference that ignores these timeout warnings, but the warnings are causing a lot of the jobs to be killed off because they aren’t responsive. I’d love to know if you know how I can disable this in an easy to automate way. I was thinking that I could launch and kill Firefox to create a profile, modify the profile’s preference file, then launch it again for the real tests. I don’t know which preference file I would need to do this on though.
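
The profile-editing idea could be sketched roughly as follows. This is untested against the geriatric machines and rests on two assumptions: that the slow-script dialog is controlled by the dom.max_script_run_time and dom.max_chrome_script_run_time preferences (with 0 meaning no limit), and that Firefox applies a user.js file found in the profile directory at startup. The helper name write_user_prefs is made up for the example.

```python
import json
import os

def write_user_prefs(profile_dir, prefs):
    """Append user_pref() lines to user.js in the given profile directory."""
    path = os.path.join(profile_dir, 'user.js')
    with open(path, 'a') as f:
        for name, value in sorted(prefs.items()):
            # json.dumps yields true/false, bare numbers and quoted strings,
            # which matches the literal syntax used in preference files.
            f.write('user_pref(%s, %s);\n'
                    % (json.dumps(name), json.dumps(value)))
    return path

# Assumption: 0 disables the content and chrome slow-script timeouts.
SLOW_SCRIPT_PREFS = {
    'dom.max_script_run_time': 0,
    'dom.max_chrome_script_run_time': 0,
}
```

The automation would then call write_user_prefs(profile_dir, SLOW_SCRIPT_PREFS) after the profile has been created and before the real test run is launched.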

This is where I’d like to ask people who are working on the JavaScript JIT to let me know how long we need to do this testing and whether the current coverage (mozilla192 and mozilla-central) is enough. Please leave a comment on this post or on bug 463262, or ping me on #build (jhford).

As a side note, this configuration makes it possible for us to run unit tests on second tier support machines and operating systems. If there is any community interest, I can look into making it possible for anyone with an obscure machine and spare cycles to connect it to this buildbot master. This would require you to be able to dedicate the box to this testing, and us to already produce builds that run on that architecture.

If anyone wants, I can get some photos of the machines on Monday.

Reimplementing a subset of DD in Python

Through my work on imaging our Nokia test farm, I have developed 3 approaches to imaging the N810. The first is to set up an N810 then generate our own firmware image as a JFFS2 filesystem. This approach gives us an N810 that is essentially factory-stock. We found that we were still having devices fall over on a regular basis with this approach. The next approach I tried was to put the full Maemo operating system on an SD card and boot from it. This resulted in significantly improved reliability at a minor cost in test suite run time. The actual imaging process needs far less human involvement than the older approach. All that is required is that somebody is there to swap in blank SD cards and execute the command again. This process used rsync (sudo rsync -a moz-ref-v2/. /mount/point/.) to copy the files from a directory on the imaging machine onto the SD card. As a part of the imaging process, the hostname is set, as are a couple of other bits of information. It took me the better part of 3 days to image all 40 SD cards using this approach.

The third approach, which we are going to move forward with, uses the imaging process of the second approach to create a ‘master’ image. This image has all the information already set up, including text files specifying which image revision the device is running. Once this is done, we use dd to dump an image of the entire SD card onto the PC’s hard disk (dd if=/dev/sdb of=moz-ref-v1.dump bs=100M). When this is complete, we have a 3.7GB file which contains the entire contents of the master card. We can then write this file directly to another SD card to get an identical copy of the master (dd if=moz-ref-v1.dump of=/dev/sdb bs=100M). The problem is that this doesn’t scale too well. We are aiming to be able to write to 14 cards at the same time. I investigated using tee (dd if=moz-ref-v1.dump bs=100M | tee /dev/sdb /dev/sdc /dev/sdd > /dev/null) but found that it wouldn’t write to raw devices. Another option would be to use a for loop and start a bunch of dd processes in the background. While this would have worked, each process would read the image separately, so hard disk throughput would scale linearly with the number of cards. Instead, I decided to write a limited subset implementation of dd in Python. I used the optparse library to implement the command line interface and standard Python I/O for the cloning process. After timing a few runs, I found that my Python script was about 98% as fast as the canonical implementation. I only measured wall time as that is the only thing of value for this situation.
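
The core of the idea, reading each block from the image once and writing it to every card, can be sketched in a few lines. This is a simplified illustration rather than the actual multi-dd code: the function name clone and the default block size are made up for the example, and the command line handling is left out.

```python
def clone(source_path, dest_paths, block_size=1024 * 1024):
    """Read source_path once and write every block to each destination."""
    source = open(source_path, 'rb')
    dests = [open(p, 'wb') for p in dest_paths]
    copied = 0
    try:
        while True:
            block = source.read(block_size)
            if not block:
                # An empty read means we have reached the end of the image.
                break
            for d in dests:
                d.write(block)
            copied += len(block)
    finally:
        source.close()
        for d in dests:
            d.close()
    return copied
```

With the device names from the tee example, a call might look like clone('moz-ref-v1.dump', ['/dev/sdb', '/dev/sdc', '/dev/sdd']). The image is read from the hard disk once regardless of how many cards are being written.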

A further optimization that I would like to make is the ability to store the image files in a compressed format and decompress them on the fly. Because the dump file contains every single bit that the filesystem tracks, it ends up being the same size as the SD card itself. In our case, we have a 4GB filesystem even though only about a gig of that is used. The simplest way to get around this is to compress the image files. Rather than worrying about manually decompressing the file before feeding it into the duplication program, I am going to implement decompression in the duplicator program. I found that Python has a really nice BZip2 module in the standard library. This module provides a full file interface for a BZip2 compressed file. Before I decided to implement this, I wanted to check that the module is able to decompress files on the fly. I started by generating a file with random data (dd if=/dev/random of=random bs=1024 count=1024), for which I then computed a sha1 hash (openssl sha1 < random > random.sha1). I then compressed the file to random.bz2, opened an interactive Python interpreter and ran the commands:

>>> import bz2
>>> f = bz2.BZ2File('random.bz2')
>>> o = open('random', 'wb')
>>> while True:
...  buffer = f.read(1024)
...  if not buffer:
...   break
...  o.write(buffer)
...
>>> o.close()
>>> f.close()

Once this had completed, I exited the python interpreter and compared my sha1 hashes:

jhford$ cat random.sha1
dc34e2d6308786e5e5857f7b0b1126097060df6c
jhford$ openssl sha1 < random
dc34e2d6308786e5e5857f7b0b1126097060df6c

This tells me that I can safely use the BZ2File class for implementing compressed sd card images. My current implementation strategy is to have files that have a ‘.bz2’ extension automatically treated as either a file that is compressed (input) or should be compressed (output).
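
A minimal sketch of that extension check might look like the following. The helper name open_image is made up for the example and is not necessarily what the duplicator ends up using.

```python
import bz2

def open_image(path, mode='rb'):
    """Open path, transparently handling bzip2 when it ends in '.bz2'."""
    if path.endswith('.bz2'):
        # BZ2File compresses on write and decompresses on read.
        return bz2.BZ2File(path, mode)
    return open(path, mode)
```

The rest of the program can then call read() and write() on whatever open_image returns without caring whether the image on disk is compressed.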

I am continually impressed by how comprehensive the Python standard library is.  It seems that every time I write something in Python, there is a built in module to do anything that isn’t specific to the problem at hand!

The code for my duplicator implementation is being developed at http://hg.johnford.info/multi-dd and the imaging scripts for the mobile work live in the build repository at http://hg.mozilla.org/build/tools in the directory buildfarm/mobile