Archive for category Programming

Figuring out which files are touched while installing software

We are working on getting our infrastructure up to speed for Maemo 5 and Maemo 5 Qt builds. A critical part of building for Maemo 5 is scratchbox, a toolkit that Nokia uses to make development for their Linux-based phones easier. We have enough Linux build slaves in production that it is impractical to deploy scratchbox by hand on each machine. Scratchbox also downloads packages from the internet, which means that we could get different packages each time we install it. We already have a fairly old version of scratchbox set up with the Chinook SDK, which we have used for our Maemo builds thus far. Originally I was under the impression that we would need two totally separate scratchbox installations, but Doug T. showed me how to upgrade our existing scratchbox 4 installation to scratchbox 5 (thanks, Doug!).

My concern with this upgrade, however, was that files outside the /builds/scratchbox directory were going to be touched. I wanted to be thorough, so I did an experiment. I ran find -mount -type f -exec openssl md5 '{}' \; | tee -a /file-list ; find -mount -type f -exec openssl md5 '{}' \; | tee -a /file-list before and after the scratchbox upgrade. The -mount flag and the two separate runs, one at each of our two mountpoints, were to ensure that we didn’t hash things like the /dev, /proc and /sys filesystems. My original intent was to run diff file-list1 file-list2, but that showed me every single file that changed. I only wanted to know which files changed outside of my scratchbox root directory, /builds/scratchbox. My diff was polluted by 77,000 files that resided in the scratchbox root. I figured that the best option at the time was to hack up a quick Python script:

#!/usr/bin/python
# This file is a quick script to process the output of
# find / -mount -type f -exec openssl md5 '{}' \; | tee -a
import sys, os.path, re

if not len(sys.argv) == 3:
    print("purple monkey dishwasher")
    sys.exit(1)
filename_a = sys.argv[1]
filename_b = sys.argv[2]
if not os.path.exists(filename_a) or not os.path.exists(filename_b):
    print("insert change into meter and press green button")
    sys.exit(1)
data = {}
# openssl md5 prints lines of the form: MD5(/path/to/file)= <hash>
pattern = re.compile(r"^MD5\((?P<file>.*)\)= (?P<hash>.*)$")
# Get the data from A
f = open(filename_a, 'r')
for i in f.readlines():
    m = pattern.search(i)
    if m is None:
        continue  # skip lines that are not openssl md5 output
    data[m.group('file')] = m.group('hash')
f.close()
sbfile = re.compile("^/builds/scratchbox") #pattern describing files to ignore
# Figure out diff to B
f = open(filename_b, 'r')
for i in f.readlines():
    m = pattern.search(i)
    if m is None:
        continue  # skip lines that are not openssl md5 output
    if m.group('file') not in data:
        if not sbfile.search(m.group('file')):
            print('new file - ' + m.group('file'))
    else:
        if not sbfile.search(m.group('file')):
            if not data[m.group('file')] == m.group('hash'):
                print('updated file - ' + m.group('file'))
f.close()

This is code I have written to scratch my own itch. I am posting it as it might be useful to someone else. If you want to ignore a different directory, change sbfile = re.compile("^/builds/scratchbox") to a pattern describing the path you want to ignore. If you want to find everything that changed over your whole partition, remove sbfile and all the sbfile checks, so that the final bit of code looks like

# Figure out diff to B
f = open(filename_b, 'r')
for i in f.readlines():
    m = pattern.search(i)
    if m is None:
        continue  # skip lines that are not openssl md5 output
    if m.group('file') not in data:
        print('new file - ' + m.group('file'))
    else:
        if not data[m.group('file')] == m.group('hash'):
            print('updated file - ' + m.group('file'))
f.close()
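
For anyone who would rather not depend on find and openssl, the same before/after comparison can be sketched entirely in Python. This is only an illustration, not the script I actually ran: the function names (md5_of_file, snapshot, diff_snapshots) and the ignore_prefix parameter are made up for this example, and the device check is just a rough stand-in for what -mount does.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map every regular file under root to its MD5 digest.

    Directories on a different device than root are pruned, which
    roughly mimics the -mount flag of find.
    """
    root_device = os.stat(root).st_dev
    hashes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending further.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_device
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only hash regular files, like -type f
            try:
                hashes[path] = md5_of_file(path)
            except (IOError, OSError):
                pass  # unreadable files are simply skipped in this sketch
    return hashes


def diff_snapshots(before, after, ignore_prefix=None):
    """Return (new_files, updated_files), skipping paths under ignore_prefix."""
    new_files, updated_files = [], []
    for path, digest in sorted(after.items()):
        if ignore_prefix and path.startswith(ignore_prefix):
            continue
        if path not in before:
            new_files.append(path)
        elif before[path] != digest:
            updated_files.append(path)
    return new_files, updated_files
```

You would call snapshot("/") before and after the upgrade (keeping the first result around, for example by pickling it) and then pass both dictionaries to diff_snapshots with ignore_prefix="/builds/scratchbox".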

In the end, I found that the scratchbox upgrade that I did only changed my bash_history and added some tarballs to /tmp. I am very glad that this is the case as it really simplifies our deployment of the new scratchbox!

Python Frameworks

I came to a decision today: I am going to learn a Python ORM framework. My project is going to require a lot of database access, and using an ORM framework simplifies this. The ORM also hides many of the complexities. If you are interested, I’d highly recommend following this tutorial.

In doing some research into my CGI + SQLite issue I have constantly been asked “…but why are you using CGI”. It turns out that CGI + Python is not really used much. Many people recommend using mod_wsgi, and I think that instead of using raw WSGI I would like to use a framework. So far, CherryPy is what is sticking out to me because it is very lightweight and includes its own development server. This makes it easier to test my code because I (hopefully) would be able to use the Eclipse/Pydev debugger. This is something I have wanted to do for quite some time. The main reason I am looking into CherryPy is that it does not seem to force a templating system on you.
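
For comparison, here is roughly what "raw WSGI" looks like when you use nothing but the standard library's wsgiref module. This is not CherryPy code and not code from my project; the names application and serve and the port 8080 are placeholders I picked for the sketch.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A bare WSGI callable: receives the request environ, returns body chunks."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8080):
    """Run the app on wsgiref's built-in development server (blocks forever)."""
    make_server("localhost", port, application).serve_forever()
```

Calling serve() starts a small development server. Everything beyond this (routing, sessions, templating) is what a framework would add on top.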

On a related note, I found a jQuery plugin which looks really neat and could prove to be very valuable. It is called TableSorter. There is an example of a pagination system that really caught my eye here.

OSX Development

I am noticing that a lot of people are having trouble getting used to developing on OSX. I hope that this is useful to those people. I am targeting someone who has at least some experience with Linux or another Unix-like OS. I haven’t been using OSX long myself, but I have adapted quickly :)

What is Xcode?
Xcode is more than an application; it is a complete development environment. When someone ‘installs Xcode’ they are installing compilers, frameworks, headers and build tools. If you can compile a C program, you have Xcode. Just to be sure, you can check by using commands like this:

vortex:~ jhford$ which gcc
/usr/bin/gcc
vortex:~ jhford$ which g++
/usr/bin/g++
vortex:~ jhford$ which make
/usr/bin/make
vortex:~ jhford$ which xcodebuild
/usr/bin/xcodebuild
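
If you would rather do that check from a script, a small Python sketch along these lines works. It relies on shutil.which, which needs Python 3.3 or newer; the function name find_tools is just something I made up for the example.

```python
import shutil


def find_tools(names=("gcc", "g++", "make", "xcodebuild")):
    """Return a dict mapping each tool name to its path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print("%-12s %s" % (name, path or "not found"))
```

Any tool that shows up as "not found" means the developer tools are missing or not on your PATH.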

What is MacPorts?
MacPorts is a system utility that eases installing non-Apple Unix software. It is used to install things like subversion, mercurial, libidl, autoconf213 and ccache. Basically, it fetches sources, patches and build instructions and compiles the program for your system. This is very similar to BSD ports. To use the utility, you use the port command. Some sample uses:

sudo port install ccache mercurial libidl autoconf213
port search subversion
port list installed

The man page is the best place to go for help with MacPorts (man port). Unlike Fink, which uses Debian/Ubuntu’s apt-get internally, all things you install through MacPorts are compiled on your machine during install. Word of warning: don’t need anything in a hurry! Our network is notoriously slow with MacPorts.

What is a .dmg?
A .dmg is a compressed disk image format. It contains one or more filesystems, usually HFS+ (read: Mac filesystem). This is how most .app and .pkg files are distributed because it maintains the Mac-specific file metadata.

What is a .app? .pkg?
You might have noticed that some applications are just icons, like Firefox or OpenOffice. To install them you drag the icon into your /Applications folder and boom, installed. The most important thing to remember here is that a .app is just a folder! I will prove it.

This means that if you want to start an application that is in a .app from the command line to see console output, you can do something like $ /Applications/Firefox.app/Contents/MacOS/firefox or /Applications/OpenOffice.org.app/Contents/MacOS/soffice. Usually your application will live under Appname.app/Contents/MacOS/.

A .pkg file is an installer file, kind of like a .run script on Linux or a .msi file on Windows. It, too, is just a folder. Lots of things that are folders have file extensions on OSX. If in doubt, run file on it in Terminal.
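
Because a .app is an ordinary directory, a script can poke around inside it like any other folder. Here is a small Python sketch that lists what lives under Contents/MacOS; the function name bundle_executables is invented for this example and it does nothing Mac-specific.

```python
import os


def bundle_executables(app_path):
    """List the files in <app_path>/Contents/MacOS, or [] if that is missing."""
    if not os.path.isdir(app_path):
        raise ValueError("%s is not a directory" % app_path)
    macos_dir = os.path.join(app_path, "Contents", "MacOS")
    if not os.path.isdir(macos_dir):
        return []
    return sorted(
        name for name in os.listdir(macos_dir)
        if os.path.isfile(os.path.join(macos_dir, name))
    )
```

On a Mac you could try bundle_executables("/Applications/Firefox.app") to see which binaries you can launch from Terminal.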

Universal Binary?
In case you didn’t know, Macs used to have PowerPC CPUs. To ease the transition from PowerPC to Intel, Apple created something called a ‘Universal Binary’. These are basically executable files which can be run on either processor. If you want to check whether you have made one, you can use the Unix command ‘file’ on the application. Running it on a Firefox release binary shows that it is a universal binary:

vortex:Firefox.app jhford$ file Contents/MacOS/firefox-bin
Contents/MacOS/firefox-bin: Mach-O universal binary with 2 architectures
Contents/MacOS/firefox-bin (for architecture i386): Mach-O executable i386
Contents/MacOS/firefox-bin (for architecture ppc): Mach-O executable ppc

Sometimes the second architecture will mention ppc7440 or something similar. This means it is a G4+ binary (i.e. it requires AltiVec, which is similar to MMX/SSE).
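
If you are curious what file is looking at, a universal binary starts with a small "fat" header: the big-endian magic number 0xCAFEBABE, a count of architectures, and then one 20-byte record per architecture. The Python sketch below reads just that header. Treat it as an illustration rather than a replacement for file: the function name and the small CPU table are mine, and Java class files happen to start with the same four bytes, so a real tool needs more checks than this.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # big-endian magic at the start of a universal binary
# A few Mach-O CPU type codes; anything else is reported by number.
CPU_NAMES = {7: "i386", 18: "ppc", 0x01000007: "x86_64", 0x01000012: "ppc64"}


def universal_architectures(data):
    """Return CPU names from a fat header, or None if data is not a fat binary."""
    if len(data) < 8:
        return None
    magic, count = struct.unpack(">II", data[:8])
    if magic != FAT_MAGIC:
        return None
    names = []
    for index in range(count):
        start = 8 + index * 20  # each record holds five 32-bit fields
        record = data[start:start + 20]
        if len(record) < 20:
            break  # truncated input: report what was readable
        cputype = struct.unpack(">iiIII", record)[0]
        names.append(CPU_NAMES.get(cputype, "cputype %d" % cputype))
    return names
```

To try it on a real file, read the first few kilobytes in binary mode, for example open(path, "rb").read(4096), and pass the bytes in.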

Command, Option, Control?
I guess this is only applicable if you are in front of the machine. Command, Option and Control are similar, but not identical, to Start, Alt and Control. In OSX, things are a little more logical. If the key combo you want involves doing something, say, closing a tab, it is Command + W. In Windows/Linux it is Control + W. Keep this in mind: nearly all Ctrl or Alt key combos for GUI programs on Windows or Linux are the same on Mac, but with Command. Option, like the name implies, gives you an option. Say you have save and save as. Command + S is save, and since save as is so similar, it may be Command + Option + S. Worst case, you can look around in the menu bar at the top of the screen. A clover-ish thing means Command, a downward-sloping line is Option and an up arrow is Shift.

On the command line, Control works just like ctrl in Windows or Linux. Control + D will give you EOF, Control + C will quit the application.

Screenshots!
To take a screenshot, simply press one of:
Command + Shift + 3 – Fullscreen
Command + Shift + 4 – An area of the screen, like Snipping Tool in Vista
Command + Shift + 4 then release and press space – A ‘view’, could be a window or a sheet. Like Alt+Printscreen
These commands will place the pictures on your desktop in the format PictureX.png, where X increments automatically.

Hopefully this helps you get the hang of Mac. If you have any questions feel free to leave a comment or ping me in IRC at irc://irc.mozilla.org/#seneca

Pydoc Server ?!?!

For all of you working on anything Python: pydoc -p 3500. This will start an HTTP server at http://localhost:3500/ where you can browse the documentation for all the Python modules available on your system.

DistCC on OSX for Mozilla Builds

I have gotten Mozilla to build using DistCC on Mac. Luckily, everything you need is already installed with Xcode. If you can build Mozilla on the machine, you should be good to build using DistCC. DistCC is a wrapper for the compiler which distributes portions of the build process to other machines. Some things which can’t be done in a distributed way, like linking, are done on the machine in the driver’s seat. In DistCC you have one client and many servers. I found this confusing at first, because I was thinking of the computer driving the build as serving jobs for clients, but in fact it is the client which is using ‘compile servers’.

This is actually a very simple thing to set up once you know what you are doing. For this blog post, I tested using two of our school Macs. I have spain (142.204.133.122) as a build server and canada (142.204.133.7) as a client. On all the server machines I started the DistCC daemon by running distccd --daemon. It is possible to restrict the daemon to jobs from specified IPs, but this is easier. Make sure that port 3632 on your servers is reachable from the client, as that is where the jobs are sent and results received. On the client machine (in this case canada) I need to configure DistCC to make use of the build servers. With the version of DistCC included on Leopard, you have to specify this using an exported environment variable. I used export DISTCC_HOSTS='localhost 142.204.133.122' in the shell I ran the build from. Next you need to wrap the compilers to make use of DistCC. I have done this by overriding the default C and C++ compilers in my .mozconfig file (below) with the CC and CXX variables in my make flags. You will want to specify a job count (-j6 in this instance) to make use of DistCC. It is hard to pick a good number, but it should definitely be greater than the total number of processor cores, and some say to add an extra two for good measure.

canada:~ jhford$ cat ~/.mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj
ac_add_options --enable-application=browser
mk_add_options MOZ_MAKE_FLAGS="CC='distcc /usr/bin/gcc' CXX='distcc /usr/bin/g++' -j6"
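
If you have more than a couple of servers, it can be handy to generate these values instead of typing them. The Python sketch below only builds the strings described above. The helper names are invented for this example, and the "total cores plus two" rule is just the rule of thumb mentioned earlier, not something DistCC requires.

```python
def distcc_hosts(servers, include_localhost=True):
    """Build the space-separated value for the DISTCC_HOSTS variable."""
    hosts = (["localhost"] if include_localhost else []) + list(servers)
    return " ".join(hosts)


def suggested_jobs(cores_per_machine, extra=2):
    """Total cores across all machines plus a small margin, for the -j flag."""
    return sum(cores_per_machine) + extra


def make_flags(jobs, cc="/usr/bin/gcc", cxx="/usr/bin/g++"):
    """Format a MOZ_MAKE_FLAGS value that wraps both compilers in distcc."""
    return "CC='distcc %s' CXX='distcc %s' -j%d" % (cc, cxx, jobs)
```

For example, assuming two dual-core machines (a hypothetical core count), suggested_jobs([2, 2]) gives 6, and make_flags(6) produces the same flags as in the .mozconfig above.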

When you start your build with make -f client.mk build, you will be able to monitor the status of your distribution using distccmon-text, which is also included. An important thing to note, which caused me much grief, is that this program is run on the DistCC client (the driver). This program will give you a nice little bit of output like this:

This shows you which jobs are going to which machines. Another thing that indicates that distcc is working is the g++ build steps, which should now look like this (greatly abridged) example:

distcc /usr/bin/g++ -o nsMorkHistoryImporter.o -c -I../../../../dist/include/system_wrappers -include /Users/jhford/mozilla-central/config/gcc_hidden.h -DXPCOM_TRANSLATE_NSGM_ENTRY_POINT=1 -DMOZILLA_INTERNAL_API -D_IMPL_NS_COM -DEXPORT_XPT_API -DEXPORT_XPTC_API -D_IMPL_NS_COM_OBSOLETE -D_IMPL_NS_GFX -D_IMPL_NS_WIDGET -DIMPL_XREAPI -DIMPL_NS_NET -DIMPL_THEBES -DZLIB_INTERNAL -DOSTYPE=\"Darwin9.6.0\" -DOSARCH=Darwin

As far as results go: with an iMac that has 1GB of memory as the client and a Mac Mini as the server, I got these times with DistCC (-j6):
real 20m41.704s
user 22m5.748s
sys 5m22.026s

With 12 jobs (-j12) I got:
real 19m54.374s
user 20m39.539s
sys 5m20.019s

And with just the single iMac with 1GB (-j4):
real 20m25.626s
user 27m3.408s
sys 4m22.820s

I watched the output of distccmon-text and noticed that only about half of the files get distributed, with the other half being done on localhost. I am thinking that it would be good to test this with a more powerful machine as the client and more servers, but as it stands right now, there is nearly zero benefit to this configuration.
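
To put a number on "nearly zero", the wall-clock ("real") times above can be compared directly. The short Python sketch below parses the values exactly as time printed them and reports each DistCC run relative to the single-iMac run. The helper names are mine. By my arithmetic, the -j6 DistCC run comes out roughly 1% slower than building locally and the -j12 run roughly 2 to 3% faster.

```python
import re


def parse_time(text):
    """Convert a 'time'-style value such as '20m41.704s' to seconds."""
    match = re.match(r"^(\d+)m(\d+(?:\.\d+)?)s$", text.strip())
    if match is None:
        raise ValueError("unrecognised time: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_change(baseline, candidate):
    """Positive means the candidate took longer than the baseline."""
    return (candidate - baseline) / baseline * 100.0


local_j4 = parse_time("20m25.626s")  # single iMac, -j4
for label, value in [("distcc -j6", "20m41.704s"), ("distcc -j12", "19m54.374s")]:
    change = percent_change(local_j4, parse_time(value))
    print("%s: %+.1f%% wall-clock vs local -j4" % (label, change))
```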