...making Linux just a little more fun!

<-- prev | next -->

PyCon DC 2005

By Mike Orr (Sluggo)

Pythoneers from around the world again descended on George Washington University for the fourth annual PyCon, which was held March 23-25, 2005 in Washington, DC. It's hard to decide what the highlight was: Guido's new beard, the success of the Open Space sessions, the number of attendees (just shy of 450), the international scope (I saw several delegates from Germany, and a few from Japan and Italy), the surprise sleeper hit (WSGI and integrating the web application frameworks was the most discussed topic), the Python CPAN (integrated with PyPI), the keynote from Python's most prominent user (Google), David Goodger's name ("pronounced like Badger but GOOD!"), or Guido's plans for static typing. ("Don't worry," he says about the latter, "it's just a bad dream.")

Keynote #1: Python on .NET

Jim Hugunin, who last year presented his paper on IronPython (a version of Python for Microsoft's .NET runtime environment), is now working for Microsoft. ("So I know if my computer crashes during this talk, I'll never hear the end of it.") Hugunin originally started the IronPython project to prove .NET was unsuitable for dynamically-typed languages, but discovered the opposite. IronPython on Windows .NET 2.0 Beta 1 is 80% faster than CPython (i.e., "normal" Python). Why? Different bytecode, support library is C#, and MS has put a huge number of resources into optimizing .NET and its machine-code compiler. (IronPython on Mono "probably runs about as fast as CPython", he said, although "this could improve with optimization". Mono is a Linux-compatible version of .NET.) IronPython thus joins Jython (Python on Java), Parrot (Python on Perl 6), and PyPy (Python on Python) as competitors to CPython, meaning Python is now more a language specification than a particular C implementation.

Keynote #2: Python at Google

Greg Stein from Google talked about why Python is one of their primary development languages (alongside C++ and Java). They found Python highly adaptable, fast to learn, and easy to maintain. Many of the Python modules they use are actually SWIG wrappers around C libraries. "We use lots of swigs."

Although only a few of Google's user-visible services are currently running on Python (groups.google.com, code.google.com), Python is used extensively in their infrastructure. Google is a challenging environment to administer because it has several servers. "OK, a lot of servers." How many companies do you know with a thousand servers to feed? Their development environment is written in Python: libraries that describe how to build software, utilities to automatically run unittests and enforce a peer review before code is checked in, and packaging systems. Python lets their tools evolve easily as hardware/software is upgraded.

Successfully checked-in code goes to a staging server, then to the "data centers" which push it to the production servers. All this is done in Python. Other Python scripts monitor the production servers: Are they running? Do they think they're healthy? Are their hard drives and CPU temperatures OK?

Google has released some Python code to the public, such as Goopy (a "functional programming" library). They plan to release more, but slowly and carefully. Guido asked, "When are we going to see an open-source build system?" Greg said that it'll be as soon as they can convince the management.

One interesting detail is that since Google always has a ton of user queries coming in, they can test new servers/applications by simply diverting 1% of the traffic to it and seeing if they fall over. (Steve Holden, PyCon's coordinator, called that amount of traffic "frightening".) The command-line tool to do this is, of course, written in Python.

Keynote #3: Guido van Rossum

Somebody named Guido got up and talked about Python as if he owned it. Rather than throwing cans of spam at him, the audience listened intently. Why? Because this was Python's creator, giving his annual address about the state of the language. After discussing "why my keynotes suck" (because he'd rather be talking about the intricacies of language design), "why the beard", and "who is my new employer" (Elemental Security, a company developing an enterprise security product they won't talk about, who also won't let Guido develop Python 3000 on company time), Guido plunged into the controversies du jour.

How did the @decorator syntax win? "Everyone disliked it equally, it's unambiguous, it doesn't obscure the function definition, and it's similar to Java."

If function decorators are so necessary, why not class decorators too? Metaclasses do the job well enough. Other PyCon talks showed how metaclasses are functions that tweak a class object after it's created; for example, to make it keep a list of all its instances. To specify a class's metaclass, give it a .__metaclass__ attribute or define a __metaclass__ variable in the module. Or use a little-known feature of the type builtin to create a class on the fly:

type('MyClass', 
     (BaseClass1, BaseClass2),
     {'my_attribute': 1})
You can also subclass type to make a class factory, as shown in David Mertz's tutorial on metaclasses.

Back to Guido's talk. Python is getting more popular. The Barton Group did a survey of what their developer readers are using, and Python was at 14%. The Barton Group described it as the "P" languages (Python, Perl, PHP) vs. the "C" languages, and noted that Python has fewer security vulnerabilities than Perl or PHP. Downloads and page views at python.org are both up 30% from last year.

2005 also featured Python's first security alert, against a vulnerability in SimpleXMLRPCServer.py. It's fixed in Python 2.4.1 and 2.3.5; patches for earlier versions are available. The experience showed that Python needed a Security Response Team, which is now in place. Previously there was no place to send a security alert without posting it on a public forum or e-mailing it to Guido. Now anybody can e-mail alerts to [email protected], and they will go to the entire response team.

Python has gotten burned for putting out too many new features in minor releases, so now only bugfixes will go into minor releases (e.g., 2.4.5), and features will have to wait until a major release (e.g., 2.5). The community has indicated it wants a "slow growth" policy on features, with more focus on stability and optimization.

Guido's employer won't let him work on Python on company time, so Python 3000 (a.k.a. Python 3.0) will not appear anytime soon. But it now has a PEP describing the direction it will go. Python 3.0 will have backward incompatibilities as Guido adds a few keywords, eliminates builtins he wishes he hadn't created, and reorganizes the standard library into a deeper hierarchy. Much of the CPython code is still useful though, so it won't be a total rewrite. Some features will be backported to Python 2.x, sometimes accessible as "from __future__ import <feature>". Old-style classes will be eliminated in 3.0, as will map/filter/reduce. lambda may be replaced by anonymous code blocks, although a syntax has not emerged. ("Statements in curly braces was just a joke, really!")

Python 2.5 will have any(iterable) and all(iterable) builtins; they both return booleans. any tells whether any of the values are true, all whether all the values are true.

The Bad Dream

Then Guido said, "If you don't like the next part, just pretend it's all a bad dream." Guido wants to add optional static typing to Python 3.0. Here's a possible syntax:

def foo(a: int, b: list[int]) -> list[str]
This implies:
    a = __typecheck__(a, int)    # Raises error if adapt(a, int) is false.
If that horrifies you so much you want to switch to Ruby because "Guido is trying to turn Python into C," don't worry. He reassured us, "Nothing is settled yet!!!" There are a number of unresolved issues:

The PyWebOff and WSGI

In the beginning there was Zope. Zope was a web application framework and the basis for several Content Management Systems, but it had some discontents who dared to call it "monolithic" and "unpythonic". And behold, then there came Webware, and it was Modular and didn't impose New Programming Languages on site developers, and there was much rejoicing. But others rebelled at even Webware's Heavy-Handedness and arbitrary Conventions and wanted something even Simpler, and a Ton of frameworks appeared: Quixote (which calls itself "lightweight Zope"), SkunkWEB ("Smell the power!"), CherryPy ("fun to work with"), and some thirteen others. Meanwhile, Twisted had released its own Whole Earth Catalog of asynchronous Internet libraries including Nevow. Trying to find the forest through the trees, Ian Bicking held a Shootout at PyCon 2004, comparing several frameworks against each other.

This year, Michelle Levesque went a step further and said we've forgotten about "Brian". "Brian" is the typical non-techie developer who just wants to get a simple dynamic site up. The Python frameworks have now mushroomed to forty [slide showing a montage of logos]. Experienced Pythoneers know that Zope is easy if it does what you want out of the box, Quixote is good for sites that are big on calculations and small on eye candy, Twisted is good for high-demand sites, etc.; but Brian doesn't know this. Brian sees forty apparently equal frameworks and chooses this: [slide with the word "PHP"], or maybe this: [slide with the word "Java"]. Python is about having One Obvious Way To Do It, but in the web framework world it's Ruby and Java that have a unified model, not Python. Quoting Moshe Zadka, "You're not a Real Python Programmer until you've written your own web templating toolkit." But, Michelle said, there are a lot of Brians in the world; they form by far the biggest potential "market".

Michelle's plea to developers is, "Stop writing kewl new frameworks! Help improve the top few frameworks so they can become a best of breed. And put documentation on python.org telling Brian, "For heavyweight use A, for lightweight use B, for performance use C, for XML use D, for no XML use E." Of course, this means the Python community must come to consensus on which are the top frameworks. Some might think "when hell freezes over", but Michelle has a plan.

She issued herself a challenge to implement a typical Brian application (in this case, a book checkout system) in each of seven frameworks, and compare the experiences (i.e., compare the grief). She also blogged her thoughts along the way, making this perhaps the first PyCon talk with its own blog. Of course this is just one person's opinion, but it serves as a starting point for discussion.

Ian Bicking followed Michelle's talk with a remarkably similar topic: "WSGI Middleware and WSGIKit (for Webware)". He agrees with Michelle that the proliferation of incompatible web frameworks is the most important issue preventing Python from enjoying the huge growth curve of PHP, but he takes a different approach. Rather than just writing documentation, Ian would like to see these frameworks become interoperable. WSGI (the Web Server Gateway Interface) is a proposed standard for Python (PEP 333). It's a protocol for web servers to communicate with application frameworks. Currently, each framework has to come with a whole slew of adapters (CGI, FastCGI, mod_python, a custom module, a standalone HTTP server) to communicate with Apache. WSGI allows each framework to need only a single virtual adapter, and the webservers can provide "best of breed" adapters that plug into any WSGI-compliant framework. You can also plug in single-purpose "middleware" objects that look like an application to the webserver, and like a server to the framework, or even chain middleware objects together. This could allow alternate URL-parsing and Session modules to be plugged in and out, for instance, eliminating the need for each framework to reinvent the wheel, and allowing applications to mix and match which coding styles they prefer (e.g., WebwareRequestObjectMiddleware vs QuixoteRequestObjectMiddleware).

Ian refactored Webware to make it WSGI compliant. Webware in this environment turned out to be a pretty thin layer over the standard protocol. Why use Webware at all then? One, to support existing Webware applications. Two, because some developers prefer the Webware servlet style. WSGI isn't meant to be used directly by application developers; its dict-function-iterable model is inconvenient for that.

Since there are two models for concurrency, applications would have to check the 'wsgi.multithread' and 'wsgi.multiprocess' keys (boolean) and take appropriate action depending on which style the web server is using.

These two talks sparked a lively debate in Open Space sessions and at lunch tables about whether such integration between the frameworks is (A) necessary and (B) desirable. Dissidents argued that "everyone's going to have their favorite no matter what you do", "common design patterns are more important than common implementations", and "it's not that important". Several people started collaborating to make their favorite frameworks WSGI compliant (most notably Quixote and Aquarium). However, the discussion also showed that people have widely differing opinions about what WSGI goals are worth pursuing and how the proposed "middlewares" should behave. This will be followed up on after PyCon. It's too bad that nobody thought to organize a sprint for this. (Sprints are group hacking sessions that occur before or after PyCon.)

Donovan Preston ("the Nevow guy") followed Ian's talk and showed how Nevow can encapsulate the Javascript needed to send little messages between the client and server; e.g., to update widgets on a form without redrawing the entire page.

Other Talks

Michael Weigend spoke on "eXtreme Programming in the Classroom". Weigend has been using XP and Python to teach programming to school children. (XP is the abbreviation for eXtreme Programming; it's not related to Microsoft's operating system of the same name.) In XP, the developers have to gather "stories" -- use cases and usage examples -- from the user. Then they have to choose a metaphor for their application, in this case a text editor and chat room for nine-year-old students in Germany learning English. So it might have pop-up lists for common responses, for instance. The developers then explore implementation tools (e.g., GUI libraries) and make time estimates. Then they choose a "story" to work on, a piece small enough to do just one thing, and split into pairs to each write an implementation of the story. Then they gather and select the best implementation. After all the stories are thus implemented, they integrate the best implementations together. That's one iteration, which may take a week. Then they evaluate the integrated product: does it work right? does it really fulfill the stories? If not, iterate again to come up with a better implementation. The beauty of this method, Weigend says, is "the project is always a success". Even if you have to stop work on it early, at least it does something useful, even if it doesn't fully comply with all the stories. In contrast, with linear software engineering, if you stop the project early you may have nothing running at all.

Holger Krekel introduced py.test, a tool I've been avidly using recently. It's like unittest but simpler and more flexible. You merely write functions with assert statements, and pass your module to the command-line tool. There are a few support functions to handle cases like "this test should raise this exception" and "I want some common code executed before each test". Test cases can also be iterative:

def func(x, y):
    assert ...

def test_more():
    for (x, y) in [(1, 2), (1000, 2), (0, 0)]:
        yield func(x, y)
test_more is a test function because it begins with "test_". But it's also a generator that calls another function with a different set of arguments each time. This is useful for testing boundary cases in your other function (func). There's an option to automatically drop to pdb (the Python debugger) on any failure. There's also a sessions feature that runs the remaining failed tests as you edit and save each offending module.

One session summarized the sprint activity this year. Chandler fans experimented with a plug-in API and did three projects. Mailman fans worked on Mailman 3, a SQL database back end, and started using SQLObject. ZODB fans added BLOB support and an iteration API. Zope 3 developers worked on a weblog object using Dublin Core metadata. A Python Core team worked on an AST step for the Python compiler. And distutils fans did phenomenal work, finally implementing the long-desired Python CPAN. They took the Python Package Index and added file upload, so that it could store the packages themselves as well as pointers to them. By the way, the coordinator said PyPI is pronounced "pippy", not "pie-pie". "Pie-pie" sounds identical to PyPy. But old habits die hard. I'm used to saying "pie-pie", just like I say "line-ux" instead of "linnux" most of the time. (I still remember when Linus spoke at LinuxExpo in 1998 and called himself Line-us and the OS Linnix in the same sentence!)

Richard Jones gave a talk about an excellent product, Roundup, an issue tracker with web, e-mail, and command-line interfaces. I'd used TkGnats a few years ago and was happy to learn that Roundup has acquired Gnat's most important features but with a slicker interface. Sending it an e-mail creates a new issue or attaches the message to the existing issue. The main page shows you immediately which issues are open, and you can set categories, priorities, and keywords, and save custom searches. It can use several database backends and comes with a no-hassle demo.

Mike Salib didn't have the feistiness of last year's Starkiller talk, but his "Stupidity and Laser Cat Toys: Indexing the US Patent Database with Python and Xapian" talk had nothing to do with sophisticated cats and everything to do with taking on the software patent cartel. His battle cry is, "The patents will kill us all; there's more of them than there is of us. They reproduce a lot. Sooner or later, people will die due to lack of access to patented technology."

The US patent database can be downloaded on the web, but only one patent at a time. Downloading more than a hundred per session is forbidden, but you can have all patents conveniently delivered to your door on tape for the low price of $30,000. Mike didn't have $30,000 so he opted to download them a hundred at a time in parallel from several computers at the lab of a university that shall remain nameless. The files come to several gigabytes compressed, which Mike was giving away on DVD to any who asked. He will soon have a website up at here.

Mike considered using pyLucene but it was too slow. (However, other projects at the conference are using pyLucene and are happy with it.) He chose Xapian because it works with compressed databases. There were many build issues, but Mike has written a library that will make it easier for others. I'm not sure if it's been released yet, though.

Anna Ravenscroft spoke on "The Time of Day": how to get Python to tell you the current time in any timezone. This was especially apt for her since she was in the process of moving from Italy to California, and had stopped in DC for the conference. Her poor little laptop just couldn't keep up with her jet-set lifestyle. Getting the time in Python 2.4 or 2.3 is simple:

$ python
Python 2.3.3 (#1, Aug 19 2004, 17:24:27)
[GCC 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> print datetime.now()
2005-04-01 11:48:54.392067
>>>
Formatting the time seems to fluster some people that try to use time.strftime() with datetime objects, but that's easy too:
>>> print datetime.now().strftime("%B %e, %Y %l:%M%P")
April  1, 2005 11:53am
For date calculations Anna recommends dateutil, and for a database of timezones pytz pytz.

Andrew Jonathan Fine described how he saved his company a million dollars by writing a Python - DocBook SGML - OpenJade - RTF converter to extract the text and structure from MS Word documents in diverse formats into a standard report format that was also in Word. That saved 7 FTEs over two years to manually format it, dwarfing the cost of researching DocBook, Word libraries, etc.

These are just a few of the talks presented at PyCon.

Open Space

Last year's Open Space suffered from a lack of promotion. People didn't realize it was happening, didn't look at the schedule to find sessions of interest, and couldn't remember what Open Space was supposed to be or how it differed from Birds of a Feather (BoF) sessions. This year Jim Fulton vowed to make it better, and he did. BoF's were collapsed into Open Space, and half the sessions on the first day were prescheduled so that at least some choices would be there. One room was devoted to Open Space for the entire conference (except keynote talks), and another room was devoted half time.

So what is Open Space? To do a regular talk at PyCon you have to present a paper to the vetting team who may or may not accept it. To do an Open Space you simply sign up on the schedule. Open Space sessions are smaller, typically 5-15 people. They may be a presentation or a roundtable discussion.

The interesting thing this year was the number and quality of Open Space sessions. Almost every time slot was filled, and many sessions were as content-ful and worthwhile as the main talks. In fact, some were main talks last year, repeated by popular demand.

Brett C. gave a particularly interesting Open Space talk on "How Python is Developed". The python-dev team is the implementation group, with 10-12 core developers. Eighty people have checkin rights, but many have not touched the code in years or only work on certain modules. New modules, by BDFL pronouncement, must first be widely adopted by the Python community before they can get into the standard library, must adhere to the BDFL's coding conventions, and the developer has to commit to maintaining it.

There are plenty of ways to contribute to the Python Core without being a high-level programmer. Post bug reports, run the regression test suite on various platforms, use the beta versions. Install patches posted on SourceForge, run the regression tests on them, check the code to make sure it looks OK, and post the results back to the patch thread. Martin van Löwis also has a deal for those who really want to get a certain patch into Python. Do five reviews of other patches, say you did on the python-dev list, and Martin will make sure to consider your favorite patch.

Lightning Talks

Another highlight this year was the Lightning Talks. These were spontaneously scheduled like Open Space, but limited to five-minute presentations. Some of the speakers weren't as polished as the prepared talks, but the content was nevertheless high quality. The first session had forty talkettes; the second around ten. Somebody came up with the simplest explanation of continuations I've seen:

>>> def foo():
...     a = 5
...    def bar():
...        print "The value is", a
...    return bar
...
>>> f = foo()
>>> f()
The value is 5
>>>
This is a continuation because a is defined in the surrounding scope, and even though it's a local variable in the enclosing function, it nevertheless remains alive when bar is called.

Other talks were on the need for a Money type subclassing Decimal, and active command line completion. rlcompleter2 works in the Python shell and ipython, and shows object names, function signatures, docstrings, and even source code.

Miscellaneous

Stephen Diebel held a Q&A on the Python Software Foundation, which holds Python's copyright, takes care of legal issues, and is a tax-deductible fundraiser. This year they've awarded $40,000 in grants to three projects: one to upgrade Jython's features to 2.4, another to revamp python.org to make it easier for people to contribute news, and another I didn't catch. Much of the money came from the proceeds of past PyCons. This will be more difficult to sustain as PyCon grows, because bigger venues mean bigger expenses, and a dud year could wipe out the surplus. But Diebel is pretty confident the grant fund will grow, and maybe in the future they can pay a couple of the core developers to work on Python full time.

The PSF members introduced themselves as "Uncle Timmy", "Nephew Jeremy", "Uncle Guido", "Just David", "Neil", "Martin", and "Steve". Somebody complained the PSF was America-centric, but Martin van Löwis pointed out that he is not American, and many of the grants have been going to other places. The EuroPython organizers asked for a PSF representative to come speak, since most EuroPython attendees don't know the PSF exists. (EuroPython will be in Göteborg, Sweden on June 27-29. Next year it will be at CERN in Switzerland.)

Plans For The Future

Congratulations to the honorable Steve Holden, who is retiring from PyCon chairmanship after several years and whose final stunt was pulling off the best conference yet. Andrew Kuchling has taken the baton for 2006, and gave an Open Space talk about his preparations. He noted how PyCon has spontaneously increased in size every year in spite of our pitifully lame attempts at promotion, and at 440 we've already maxed out the capacity of GWU. Next year we have to plan for 500-600. Since most attendees are cheapskates and won't pay more than $70 for a hotel room, we'll likely have to move out of DC to find an affordable venue that can accommodate future expansion for several years. The most likely location at this point is Baltimore. It will probably be someplace "near DC" since that's where many PyCon organizers live, and other regions haven't gotten off their butts to follow up on local venues.

The most common request on last year's feedback forms was more tutorial-level activities. Without much conscious thought, PyCon has positioned itself as a "research conference" where most talks are about cutting-edge projects. That's good for advanced users but doesn't meet the continuing need to train the new generations of users that are dabbling in Python. Suggestions include more talks on basic topics, repeating tutorials that people have written for their local groups, having intense (perhaps paid) tutorials before the conference, and classifying the talks as beginner/mid/advanced on the schedule. They're also considering a low-cost teenager track on the last day with basic topics, which has perked the interest of a couple local schools that use Python in class.

We made one attempt to fill in the tutorial gap this year, but it fell flat. I organized an Open Space called "Python Q&A". I was trying to do something like The Answer Gang here at the Linux Gazette, where people could bring any Python-related question, but we ended up with all answerers and no querents. We'll have to try a different approach next year, perhaps doing it right after a tutorial track.

In the sprint reports session, Andrew gathered suggestions for next year's sprints. Most people said four days was a good length but they should be after the conference rather than before, so that people could sprint on what they learned at the talks. PyCon would have to move to Monday-Wednesday to accommodate the sprints afterward; otherwise, people from overseas would have to take 1 1/2 weeks off work to attend them.

See y'all next year.

 


picture Mike is a Contributing Editor at Linux Gazette. He has been a Linux enthusiast since 1991, a Debian user since 1995, and now Gentoo. His favorite tool for programming is Python. Non-computer interests include martial arts, wrestling, ska and oi! and ambient music, and the international language Esperanto. He's been known to listen to Dvorak, Schubert, Mendelssohn, and Khachaturian too.

Copyright © 2005, Mike Orr (Sluggo). Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 113 of Linux Gazette, April 2005

<-- prev | next -->
Tux