Monthly Archives: March 2009

Python Frameworks

I came to a decision today: I am going to learn a Python ORM framework. My project is going to require a lot of database access, and an ORM framework simplifies this while hiding many of the complexities. If you are interested, I’d highly recommend following this tutorial.

While researching my CGI + SQLite issue, I have constantly been asked “…but why are you using CGI?” It turns out that CGI + Python is not really used much. Many people recommend mod_wsgi, and rather than using raw WSGI I would like to use a framework. So far, CherryPy is what sticks out to me because it is very lightweight and includes its own development server. This should make it easier to test my code, because I would (hopefully) be able to use the Eclipse/Pydev debugger, something I have wanted to do for quite some time. The main reason I am looking into CherryPy, though, is that it does not seem to force a templating system on you.
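For comparison with a framework, here is a minimal sketch of the raw WSGI interface that mod_wsgi (and frameworks such as CherryPy) build on. It uses only the standard library's `wsgiref` module and Python 3 syntax; the names `hello_app` and `serve` and the greeting are placeholders, not part of any project.

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    """A bare WSGI application.

    The server calls it once per request with the CGI-style environ dict and a
    start_response callable; it must return an iterable of bytes.
    """
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run hello_app on wsgiref's single-threaded development server."""
    server = make_server("localhost", port, hello_app)
    server.serve_forever()
```

Calling `serve()` and browsing to http://localhost:8000/ would show the greeting. mod_wsgi looks for a module-level callable named `application` by default, so in a .wsgi file the same function would be exposed as `application = hello_app`.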

On a related note, I found a jQuery plugin called TableSorter which looks really neat and could prove to be very valuable. There is an example of a pagination system that really caught my eye here.

0.8

This release represents a large amount of work. I am having trouble remembering everything I have done, so I will mention what I can. I started this release cycle by testing a number of different ways to display the data in markup and to handle it in JavaScript. On the markup side, I tried to write a div+CSS table to replace the default HTML tables. Others have done this successfully, but I couldn’t quite get it to size correctly horizontally. For now, I have decided that this is a secondary concern.

Another thing I worked on was table filtering. My JavaScript code to filter on test status worked really well; the only problem is that it was also quite slow. Every browser I tried took a non-trivial amount of time to render the changes. All of my approaches involved either changing a CSS class to toggle its display:none property or giving the table rows a new CSS class to show or hide them. My first attack was raw DOM queries followed by running regular expressions on the className property of each row. This was a very naive implementation, and it was incredibly slow on the 2800-test sample I was using. After some chat on IRC, Dave suggested that I check out jQuery. I did, and I was amazed. How have I not found this insanely cool library before? I was able to very quickly replace my horrid code with about 3-4 lines which worked much faster. Even with jQuery, though, the show/hide functionality took a long time to render.

The next suggestion I was given (by Ted, if I remember correctly) was to use the CSSOM. This method involves modifying the actual CSS rules for those classes programmatically. I found that this had the unwanted side effect of hiding my summary table’s counts, which was to be expected because they use the same CSS class. This method was about as slow as the jQuery method, so I decided to go back to jQuery. My hypothesis is that the delay I was experiencing is just the time it takes Gecko to actually render the table, and either method will yield the same results. The more I work on my markup, the more I realise that I will make extensive use of jQuery. One area where I know I am going to do this, specifically to cut down overhead, is the popup image links in the log display. Currently, each image has an onclick=imgClick(this) attribute, which is at least 23 bytes of overhead per image. The sample log I was working on had 6 images. While this is not going to break anything, efficiency is efficiency. On a side note, I did a test with a 48000-line file, and the filters, regardless of type, froze Firefox!

It was at this point that I decided I needed to start thinking about how I was going to integrate this into the Mozilla infrastructure. When there is a check-in, Mozilla's buildbot cluster builds the changed tree and runs the unit test suite. Right now, there is no easy way to look at the output of those unit tests. My project unifies the output logs from all machines into a single system which maintains the history of each machine and each test run. Hooking into buildbot required writing a whole lot of code.

The first step was to figure out a simple way to send a log to my system. Since buildbot is written in Python, I wrote a script which posts a request to a new page I created called insertlog.py. This page takes a number of log parameters, including the URL where the log can be found. These parameters are not included in the log itself because putting this information into the build logs would require a modification to buildbot. The information is really easy to find, though, and adding it to the post is one append() away. A snippet of code that will do a post is:

    import urllib
    data = urllib.urlencode(params)  # dict -> form-encoded request body
    response = urllib.urlopen('http://localhost/logs/insertlog.py', data)  # passing data makes this a POST

Assuming that params is a dictionary, this will post all of its key/value pairs to the URL.
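The snippet above uses the Python 2 `urllib` module. In Python 3 the same functions live in `urllib.parse` and `urllib.request`, and the request body must be bytes. A minimal sketch follows; the function names and the example parameter names are hypothetical, and the request is only built (not sent) by `build_log_post`, so it can be inspected offline.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

INSERT_URL = "http://localhost/logs/insertlog.py"


def build_log_post(params, url=INSERT_URL):
    """Return a POST Request whose body is the url-encoded params dict."""
    data = urlencode(params).encode("ascii")  # urlencode output is ASCII-safe
    return Request(url, data=data)  # supplying data makes this a POST


def send_log(params, url=INSERT_URL):
    """Submit the log metadata; requires insertlog.py to be reachable."""
    with urlopen(build_log_post(params, url)) as response:
        return response.read()
```

For example, `build_log_post({"builder": "linux", "build_number": 42})` produces a POST whose body is `builder=linux&build_number=42`.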

Once insertlog.py receives a post, it downloads the log from the URL, parses it into a Log object and sends it to the database. This is an area where my system needs major improvement, and I need to talk about the design with someone. Currently I just have a function that knows how to take a Log object and insert it into the database. This solution works, but it is very inflexible and I feel that I will be embarrassed by it. I hate writing code that I know will break, and I am 1000000% sure that this code will break on the first change I make to it. This might be enough to get by, but I’d really like to discuss it with someone else before going full steam ahead with it.
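One possible way to make the Log-to-database step less brittle is to keep the column list in one place and generate a parameterised INSERT from it, so adding a field means touching one tuple rather than a hand-written query. This is only a sketch: the `Log` namedtuple is a hypothetical stand-in for the parser's Log object, the columns are the ones used in the Postgres example later in this post, and it runs against in-memory SQLite purely for demonstration (psycopg2 uses `%s` placeholders instead of `?`).

```python
import sqlite3
from collections import namedtuple

# Single source of truth for which Log fields map to which columns (hypothetical).
LOG_COLUMNS = ("build_number", "start_time", "end_time", "builder",
               "factory", "slave", "machine_name", "log_url")

# Stand-in for the parser's Log object; only the mapped fields are modelled.
Log = namedtuple("Log", LOG_COLUMNS)


def insert_log(conn, log, placeholder="?"):
    """Insert one Log using bound parameters.

    placeholder is "?" for sqlite3 and "%s" for psycopg2. Committing is left
    to the caller so several inserts can share one transaction.
    """
    sql = "INSERT INTO log (%s) VALUES (%s)" % (
        ", ".join(LOG_COLUMNS), ", ".join([placeholder] * len(LOG_COLUMNS)))
    values = tuple(getattr(log, name) for name in LOG_COLUMNS)
    conn.cursor().execute(sql, values)
```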

I have also doodled a diagram (image coming soon) of the information flow of Logfribulator. As it stands right now, there is an insert page and a view page. I think a basic administration page would be useful for common situations like “purge all tests before xxx date” or “delete tests from machine xyz”. This is something to look at later. The view page is still up in the air. For DPS911, I am currently aiming to have a list of the logs in the database, as well as the ability to view all of a log’s records when it is clicked in the log list.
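The two administrative operations mentioned above map naturally onto parameterised DELETE statements. A minimal sketch, operating on whole logs in the `log` table from the Postgres example (its `start_time` and `machine_name` columns); the function names are hypothetical, and in-memory SQLite is used only to keep it self-contained. With ISO-formatted text dates the `<` comparison sorts correctly in SQLite; in Postgres a real timestamp column compares natively.

```python
import sqlite3


def purge_logs_before(conn, cutoff):
    """Delete every log whose start_time is earlier than cutoff; return the count."""
    cur = conn.execute("DELETE FROM log WHERE start_time < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def delete_logs_from_machine(conn, machine_name):
    """Delete every log recorded for one machine; return the count."""
    cur = conn.execute("DELETE FROM log WHERE machine_name = ?", (machine_name,))
    conn.commit()
    return cur.rowcount
```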

I received a conditional r+ from Ted on my XPCShell timestamps patch. A couple of issues need to be addressed, but it was otherwise acceptable. Thanks for your reviewing time, Ted :) My autoconf option patch got r-. I have to figure out whether this option should be kept, which means speaking to the Reftest and Mochitest folks to find out if it is OK to just have the timestamps always on. If it is decided that there should be an option, I will need to figure out whether it should default to on or off and rework the autoconf option patch. I am starting to wonder whether there is any value in writing a short document specifying a standard way for Mozilla unit tests to present their output.

This brings me to the final portion of my 0.8 release. I was getting sick of working with my code: I had learned so much in the last little while that I could see I had made lots of mistakes. I rewrote 90% of my parser. The only thing that wasn’t touched was the parser core, which worked and still works well for my purposes. I also rewrote my unit tests from scratch; my older ones were a little shoddy and didn’t have good test coverage. For the absolutely critical parts, like the functions which test whether a line has passed, I have covered every permutation. Thankfully, they all pass. I am still debating the value of the CGI script that uploads a log file and shows the parsed output. I don’t think it is as valuable as a Python script that an individual developer can run against their log to get the output in a format of their choosing. Since my main parser works as a standalone parser, I am going to do some optparse magic on it.
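A rough sketch of what an optparse front end for a standalone parser could look like, together with the kind of per-line pass/fail check that such unit tests would exercise. It assumes Mozilla-style `TEST-PASS` / `TEST-UNEXPECTED-...` / `TEST-KNOWN-FAIL` line prefixes; `classify_line`, `parse_args`, `main`, the `--format` option and its choices are all hypothetical, and the real parser core is not shown. `optparse` is still in the Python 3 standard library, although `argparse` is the usual choice in newer code.

```python
import sys
from optparse import OptionParser


def classify_line(line):
    """Return 'pass', 'unexpected', 'known-fail' or None for one log line.

    Assumes Mozilla-style prefixes such as 'TEST-PASS | test_foo.js | ...'.
    """
    line = line.strip()
    if line.startswith("TEST-PASS"):
        return "pass"
    if line.startswith("TEST-UNEXPECTED-"):
        return "unexpected"  # unexpected failures and unexpected passes
    if line.startswith("TEST-KNOWN-FAIL"):
        return "known-fail"
    return None


def parse_args(argv):
    """Parse command-line arguments; exits with a usage message on error."""
    parser = OptionParser(usage="%prog [options] LOGFILE")
    parser.add_option("-f", "--format", dest="format", default="text",
                      choices=["text", "csv"],
                      help="output format: text or csv [default: %default]")
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error("expected exactly one log file")
    return options, args[0]


def main(argv=None):
    """Print the status of every recognised test line in the given log."""
    options, path = parse_args(sys.argv[1:] if argv is None else argv)
    sep = "," if options.format == "csv" else "\t"
    with open(path) as log:
        for line in log:
            status = classify_line(line)
            if status is not None:
                print(status + sep + line.strip())
```

`main()` would be wired to the script's entry point in the usual `if __name__ == "__main__":` way, e.g. `python parser.py --format csv run.log`.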

I am really enjoying working on this project. I have learned so much this semester!

Database issues

I have been having major issues getting SQLite to work from within my CGI scripts. I am able to read from the database perfectly fine, but I cannot write to it from within the CGI environment; outside of the CGI execution environment it works fine. I have tried everything that I and many others can think of. I even set up a fresh Fedora VM just to test this, and it exhibited the same problem. Here is a test case of my issue. The database in question can be created by running “sqlite3 /tmp/db.sqlite” and entering “CREATE TABLE test (x);”.

    #!/usr/bin/env python
    import sqlite3
    import cgi
    import os

    print "Content-type: text/html\n"
    try:
        con = sqlite3.connect('/tmp/db.sqlite')
        cur = con.cursor()
        cur.execute("select * from test;")
        for line in cur:
            print line
        cur.execute("""INSERT INTO test VALUES(1);""")
        con.commit()
        con.close()
    except:
        cgi.print_exception()
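One thing worth ruling out in this situation is file permissions; this is an assumption, not a cause the post confirms. CGI scripts usually run as the web server's user, and SQLite creates a journal file next to the database during a write, so that user needs write access to the directory as well as to the file. A small standard-library check (the function name is hypothetical) that could be printed from the failing script:

```python
import os
import pwd


def sqlite_write_report(db_path):
    """Summarise whether this process could write db_path.

    SQLite writes a journal file next to the database during a write, so the
    containing directory has to be writable as well as the file itself.
    os.access checks use the real user id, which is what "user" reports.
    """
    uid = os.getuid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:  # uid with no passwd entry
        user = str(uid)
    directory = os.path.dirname(os.path.abspath(db_path))
    return {
        "user": user,
        "file_exists": os.path.exists(db_path),
        "file_writable": os.access(db_path, os.W_OK),
        "dir_writable": os.access(directory, os.W_OK),
    }
```

Printing `sqlite_write_report('/tmp/db.sqlite')` inside and outside CGI would show whether the two environments differ in user or access.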

Either way, I have decided to focus my efforts on working with Postgres. I have spent hours and hours trying to get SQLite to work with no progress; I just get an “OperationalError: unable to open database” message. I also feel that Postgres will be the better solution in the long term: each time unit tests are run for a build on the buildbot cluster, approximately 40-60 thousand tests will need to be inserted.
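With tens of thousands of results per run, inserting and committing them one at a time would be slow. A minimal sketch of batching them with the DB-API `executemany()` inside a single transaction, so there is one commit per run and a failed run leaves nothing half-stored. The `test_result` table, its columns and the function name are hypothetical, and in-memory SQLite stands in for Postgres so the example runs anywhere (with psycopg2 the placeholders would be `%s`).

```python
import sqlite3


def insert_results(conn, log_id, results):
    """Insert (test_name, status) pairs for one log in a single transaction.

    If any row fails, the whole batch is rolled back and the error re-raised.
    """
    rows = ((log_id, name, status) for name, status in results)
    try:
        conn.executemany(
            "INSERT INTO test_result (log_id, test_name, status) VALUES (?, ?, ?)",
            rows)
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()
```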

My next challenge was to get Postgres working on my MacBook. I have been working on my MacBook because it makes my workflow very simple: I just save my Python scripts and they are live in my CGI environment. To install Postgres, I had a choice between MacPorts and the semi-official EnterpriseDB installer. I ended up going with the EnterpriseDB version for no particular reason.

The next step in converting to Postgres was to install a Postgres driver for Python. I am using the system default Python installation located at /Library/Python/2.5, which means I cannot use MacPorts to install Python modules, as it uses its own Python installation in /opt/local. This has worked really well for me so far, but while trying to install the database driver I kept getting a file not found error, which was very cryptic and highly annoying. It turns out that the driver, psycopg2, uses a C extension to interface with Postgres, and this C extension requires the pg_config executable to be in the PATH at build time. I solved this by setting the pg_config option in psycopg2’s setup.cfg to pg_config=/Library/PostgreSQL/8.3/bin/pg_config. I assume I could also have accomplished this by adding my Postgres installation’s bin directory to my PATH.
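Because psycopg2's build looks for `pg_config` on the PATH, a quick check before building could save some head-scratching. A small sketch using `shutil.which` (Python 3.3+); the function name is hypothetical, and the fallback path is just the EnterpriseDB location mentioned above.

```python
import os
import shutil

# Location used by the EnterpriseDB 8.3 installer in this post; adjust as needed.
FALLBACK_PG_CONFIG = "/Library/PostgreSQL/8.3/bin/pg_config"


def find_pg_config(path=None, fallback=FALLBACK_PG_CONFIG):
    """Return the pg_config found on PATH (or on the given search path),
    else the fallback if it exists and is executable, else None."""
    found = shutil.which("pg_config", path=path)
    if found:
        return found
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None
```

If this returns the fallback rather than a PATH hit, the build will not find pg_config on its own, so either set `pg_config=` in psycopg2's setup.cfg to that path or put its directory on the PATH.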

Once I had this, I checked that the module worked by launching the interactive Python interpreter and running import psycopg2. It didn’t blow up, and that makes me happy!

Here is a sample of code for Postgres which works in the CGI environment:

    #!/usr/bin/env python
    import psycopg2
    import cgi
    #import os

    print "Content-type: text/html\n"
    print "Hello"
    try:
        con = psycopg2.connect("dbname='logfribulator' user='logfribulator' password='mozilla'")
        cur = con.cursor()
        try:
            cur.execute("""INSERT INTO log (build_number, start_time, end_time, builder, factory, slave, machine_name, log_url)
                VALUES(1, timestamp '2001-09-28 01:00', timestamp '2001-09-28 01:00', 'builder', 'factory', 'slave', 'machine', 'url4');""")
        except psycopg2.IntegrityError:
            con.rollback()
        else:
            con.commit()
        cur.execute('select * from log;')
        for line in cur:
            print line
        con.close()
    except:
        cgi.print_exception()

This script only commits the insert if it succeeds. If the insert raises an IntegrityError, the transaction is rolled back and only the select query is performed.
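The same insert-or-roll-back pattern also works with values bound as parameters instead of written into the SQL string, which avoids quoting problems once the values come from a posted log. A sketch using `sqlite3` so it runs without a Postgres server; it assumes, hypothetically, a UNIQUE constraint on `log_url` as one way the IntegrityError branch would be hit, and `insert_log_once` is a made-up name. With psycopg2 the placeholders would be `%s` and the exception `psycopg2.IntegrityError`.

```python
import sqlite3


def insert_log_once(conn, build_number, builder, log_url):
    """Insert a log row; return False (after rolling back) on a constraint violation."""
    try:
        conn.execute(
            "INSERT INTO log (build_number, builder, log_url) VALUES (?, ?, ?)",
            (build_number, builder, log_url))
    except sqlite3.IntegrityError:
        conn.rollback()  # discard the failed transaction
        return False
    else:
        conn.commit()
        return True
```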

0.8 Status

I am working on a full blog post for my 0.8 release, but the long and short of it is that I have designed the rest of my system and written the database for use with Postgres. I have had a lot of frustration with SQLite, culminating in a wasted weekend because code which would execute outside of CGI wouldn’t execute inside of it.

Release 0.7 Code

Code is now up here.

EDIT: Our planet doesn’t seem to like HTML in the post’s body.