Finding a user’s Documents folder on Windows and Mac

Update: I revised this to include the Mac way of finding the user’s documents directory. Perhaps the Python standard library should include a way to get at the best path for storing documents.

For Windows

Yuck. Coming within 10 feet of win32 programming makes you really appreciate the need for .NET and a sane API.

For anyone needing to find the user’s “My Documents” folder on Windows with Python, here is one way that I’ve found to work based on this MSDN article. (As awful as the API is, I will say that MSDN is a great resource!)

from win32com.shell import shell
# The GUID below is the shell namespace CLSID for "My Documents".
df = shell.SHGetDesktopFolder()
pidl = df.ParseDisplayName(0, None,
    "::{450d8fba-ad25-11d0-98a8-0800361b1103}")[1]
mydocs = shell.SHGetPathFromIDList(pidl)

I really wanted to use shell.SHGetFolderPath, but I couldn’t find a combination of arguments that works for retrieving “My Documents”. In order to get My Documents that way, you need a PyHandle for the user, and I got the method above working before I found my handle… You can get “My Pictures” easily from shell.SHGetFolderPath, however.

from win32com.shell import shell, shellcon
mypict = shell.SHGetFolderPath(0, shellcon.CSIDL_MYPICTURES, 0, 0)
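
For reference, the constant that’s documented to map to “My Documents” is CSIDL_PERSONAL, so in theory a call like the one below (perhaps with the user’s token as the third argument) should do it. I didn’t have luck with it on my setup, so treat this as a sketch:

from win32com.shell import shell, shellcon
# CSIDL_PERSONAL is documented as "My Documents"; your mileage may vary.
mydocs = shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, 0, 0)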

I’m sure that if you do win32 programming all the time, you get used to this stuff. Knowing that doesn’t make me feel less sorry for the folks that are doing win32 programming.

For The Mac

The Mac (Carbon) way of doing this is similar to the SHGetFolderPath call on Windows. Both of these APIs make more sense than the one I went with (ParseDisplayName with that lovely GUID).

from Carbon import Folder, Folders
# kUserDomain means "look in the current user's domain" (the home folder);
# the final False means "don't create the folder if it doesn't exist".
folderref = Folder.FSFindFolder(Folders.kUserDomain,
    Folders.kDocumentsFolderType, False)
docs = folderref.as_pathname()

ZODB vs. pysqlite with SQLObject

I have found only two good options for an embedded, transactional database that can be conveniently used in a non-free Python application: the ZODB and sqlite. I started out with sqlite and then recently started doing some heavy duty tinkering with the ZODB.

Concurrency

For many desktop applications, sqlite is great. The data is all conveniently stored in one file and current versions of sqlite even eliminate the need for “vacuuming” up deleted data. Add SQLObject to the mix, and you can get fairly transparent access to the database once you’ve set up your classes.

The problem that I ran into with sqlite is concurrency. Concurrency is not much of a problem for many desktop apps: you have one user at the computer doing work, and that’s it. For my application, though, the app is doing things in the background and updating the database while the user is using the software. This effectively brings in some of the usual issues you have in a multiuser app.

sqlite has a simple locking strategy: if anyone starts executing data update statements in a transaction, the whole database is locked. So, while the background process was updating the database, the GUI would get stuck waiting for it. Ick.

The ZODB, particularly ZODB 3.4, has a great concurrency picture. The current ZODB implementation uses multi-version concurrency control (MVCC), which means that rather than locking the data when a write is occurring, other threads just get older data. The ZODB doesn’t do locking at all. Instead, if two threads try to make updates to the same object, a ConflictError exception is raised for the one that didn’t make it through. The great thing about this strategy is that you can often just catch that exception and replay the transaction at the application level. This is exactly what Zope does.
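
As a rough sketch (this isn’t code from my app, and it assumes ZODB 3.4’s transaction module), replaying at the application level can be as simple as a retry loop:

import transaction
from ZODB.POSException import ConflictError

def commit_with_retry(update, root, attempts=3):
    # 'update' is any callable that modifies objects reachable from root.
    for i in range(attempts):
        try:
            update(root)
            transaction.commit()
            return
        except ConflictError:
            # Another thread won the race; throw away our changes and
            # try again against the freshly loaded state.
            transaction.abort()
    raise RuntimeError("still conflicting after %d attempts" % attempts)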

In the work that I’ve been doing with it of late, ZODB 3.4 really does handle concurrency beautifully.

APIs

The other great thing about the ZODB is that it is transparent. There are only a few things that you do differently with the ZODB than you would in normal Python. To save an object to the database, you just attach it to an object that is already hanging off the root of the database.
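
Here’s roughly what that looks like; the Note class and the ‘notes’ key are just made-up examples:

from ZODB import FileStorage, DB
from BTrees.OOBTree import OOBTree
from persistent import Persistent
import transaction

class Note(Persistent):
    def __init__(self, text):
        self.text = text

db = DB(FileStorage.FileStorage('Data.fs'))
root = db.open().root()
if not root.has_key('notes'):
    root['notes'] = OOBTree()
root['notes']['first'] = Note('remember the milk')  # just attach it
transaction.commit()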

SQLObject is a great object-relational mapper (ORM). Once you set up your classes, using SQLObject is about as transparent as an ORM can get. The downside is that you’re still doing mapping so you can’t completely forget about the fact that there’s a relational database back there.
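
For comparison, the SQLObject side looks something like this (the Song class is hypothetical, and this uses an in-memory sqlite database):

from sqlobject import SQLObject, StringCol, IntCol, connectionForURI, sqlhub

sqlhub.processConnection = connectionForURI('sqlite:/:memory:')

class Song(SQLObject):
    title = StringCol()
    playCount = IntCol(default=0)

Song.createTable()
# The .q magic builds SQL behind the scenes -- convenient, but you're
# still aware that a relational database is doing the work.
favorites = Song.select(Song.q.playCount > 10)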

With the ZODB, you can store any objects you need to in there, and you can do convenient Pythonic things, like adding new variables to instances that are stored in the database.

Since it sits on a relational database, SQLObject gives you excellent querying capability out-of-the-box. With the ZODB, you need to use a “catalog” to do anything other than what is effectively a “primary key search”. The catalog maintains indexes of the objects based on whatever attributes you need to index and will run queries against those indexes. Using the persistent BTrees that come with the ZODB, these indexes are very easy to create.

Note, however, that you need to create these indexes yourself and keep them up to date. IndexedCatalog can provide a little more of the relational database ease-of-use for searching.
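
To give a feel for it, a bare-bones hand-rolled index (nothing like the full catalog machinery, and the names here are invented) might look like this:

from persistent import Persistent
from BTrees.OOBTree import OOBTree, OOTreeSet

class AttributeIndex(Persistent):
    """Map an attribute value to the keys of the objects that have it."""
    def __init__(self):
        self.data = OOBTree()

    def index(self, key, value):
        # Called whenever an object is added or its attribute changes.
        if not self.data.has_key(value):
            self.data[value] = OOTreeSet()
        self.data[value].insert(key)

    def search(self, value):
        return list(self.data.get(value, ()))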

The ZODB also comes with a full-text search index, which sqlite does not have.

The downside of ZODB

The ZODB, particularly with MVCC, works great for the application for which it was developed: web sites. The biggest trouble that I have with it for a desktop application is the packing operation required by FileStorage. FileStorage is by far the most used (and maintained) way to store the database data. I don’t believe there is presently a maintained storage that does not require packing.

The good thing about packing is that it can happen in the background. The bad thing is that packing a 250MB database file down to 195MB takes many minutes (on my 1.25GHz PowerBook) with the Python process grabbing as much CPU as it can. It’s fine for a website to do a pack during off-peak hours, but sucking up a desktop user’s CPU for a good long time is unpleasant.

The ZODB tends to grow more quickly than a relational database does. The ZODB stores pickles of the objects, and if you look at how instances are pickled, everything in an object’s __dict__ goes directly into the pickle, both keys and values. This makes sense, because any instance can have any set of attributes in its __dict__. I noticed that using __slots__ does not seem to gain you any extra space efficiency in your pickles. Since a relational database has no such flexibility in storage, it can be far more space-efficient. That 195MB ZODB I mentioned is about 75MB in sqlite. Luckily, disk space is cheap.
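
A quick way to see this (with a throwaway class) is to look at a pickle directly:

import pickle

class Track:
    def __init__(self, title, artist):
        self.title = title
        self.artist = artist

# The attribute names 'title' and 'artist' show up in the pickle right
# alongside their values, and that happens for every instance stored.
print repr(pickle.dumps(Track('Some Song', 'Some Band')))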

ZODB (with FileStorage) uses a bit more memory than sqlite. I believe the current figure is about 8 bytes per persistent object to keep track of where the objects are in the big .fs database file. This is not too onerous. 1 million objects means 8MB of memory used, and machines can readily handle that.

It’s just worth pointing out that the ZODB is more resource hungry than sqlite.

Conclusion

Isn’t computer science great? There’s very rarely a “perfect” solution to a problem. I figured I’d write this up to help anyone who might be thinking about how they want to store their data in their application. sqlite and the ZODB have some significant differences and choosing one over the other really means taking into account what is most important for your application.

ZCatalog for standalone ZODB

I have just packaged up ZCatalog from Zope 3.1 for use with the standalone ZODB. I’m surprised that no one had released this previously, because all but the most trivial of ZODB apps will need some way to do non-primary key sorts of searches.

Previously available solutions that I spotted are: [IndexedCatalog](http://www.async.com.br/projects/IndexedCatalog/) and [these instructions for getting Zope 2.6’s ZCatalog working](http://slarty.polito.it:8069/~sciasbat/wiki/moin.cgi/StandaloneZodbHowto).

The major feature that ZCatalog has over IndexedCatalog is a full-text search index.

I had originally packaged up the Zope 2.8 catalog. That was generally working, but I wasn’t completely comfortable with it because the code was fairly “tangled up”. The Zope bits were spread throughout the code. Zope 3 has a beautiful architecture, and extracting the catalog from there was far easier.

Note, however, that this means that current ZCatalog plugins ([Dieter Maurer’s AdvancedQuery](http://www.dieter.handshake.de/pyprojects/zope/#AdvancedQuery) or [TextIndexNG](http://www.zopyx.com/OpenSource/TextIndexNG), for example) won’t work directly. Hopefully, it will not be difficult to get these sorts of things running. Generally speaking, that is left as an exercise for the reader (patches accepted 🙂).

I’m calling this release 1.0 alpha 0, because the testing that I’ve done is approximately as extensive as what you see in the readme file. Which is to say that it passes the most basic of sanity checks, but I haven’t looked much beyond that. I’ll be exercising it a lot more this week, so I’ll probably have a bit more confidence later.

When it comes to developer tools, though, release early, release often is a good motto.

Nicolas Lehuen’s ternary search tree

I’m keeping a bookmark of Nicolas Lehuen’s pytst ternary search tree library. I don’t need one right this second, but TSTs are very useful (look at Nicolas’ description to see why). It’s LGPL open source, written in C++. If I do find myself needing a TST, I’m likely to also need to use it with Unicode, which this one does not currently support. I’d have to figure that out when I get there…

Ruby on Rails wins the marketing war

For my current project, I have integrated a few different open source Python projects that give me power at least equivalent to that of Rails. The pre-packaged integration is only one part of it, though: the Rails guys are good at marketing their ideas. Not only are they good at marketing to Ruby audiences, but they also have done a great job of getting Java folks to write about it. Here’s an example: Ajaxian Blog: Ruby on Rails 0.11 includes native Ajax support.

> Rails 0.11.0 is out on the street and I’m especially proud of the Ajax support we’ve been able to include. Instead of trying to soften the blow of doing client-side Javascript libraries as many others are doing, we’ve gone ahead and more or less removed the need for hand-written client-side javascript entirely.
>
> This is done by off-loading the creation of DOM elements to the server side, which returns complete constructs that are then injected live through innerHTML. While this method is slightly more verbose than just peddling data across the wire, it’s immensely easier to develop.

If you’ve been following Python web frameworks, you might remember that Woven, and now Nevow, have offered a LivePage feature that does exactly this. It does more, in fact, allowing you to easily call the client side from the server whenever you want, rather than only in response to an explicit request.

Subway might help, but for anyone who has read Seth Godin’s Purple Cow, aiming to be “similar to Ruby on Rails” is not a likely way to gain mindshare. To be sure, Subway will help people who want to quickly put together a Python webapp, but don’t expect it to heavily increase interest in Python web development the way Rails has for Ruby. To do that, there would need to be startlingly cool new features.

In one sense, Rails had it easy marketing-wise, because web development in Java is a pain in the butt. To outdo a dynamic system like Rails is more difficult.

Of course, I don’t need convincing, because I’m already a Python user. But, as far as convincing other people goes, Python does have some advantages: Python apps can be nicely packaged up as Windows exes and Mac apps, generic functions are an important feature for certain types of problems, and Python is already entrenched in a number of places.

The title of this post is a bit over-the-top, I know. There is plenty of room for a variety of successful tools. But, successful marketing, more than anything else, has made Rails what it is today.

Generic functions in PyProtocols

There are certain situations where generic functions make a whole lot of sense. (If you find yourself writing a method or function with a bunch of if/elif/else cases, you might be looking at one of them.) This week, I had a clear-cut case of rules that I wanted to combine in a certain way while also allowing extensions of the rules. The PyProtocols dispatch package made this straightforward.
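
A schematic example of what that looks like with the dispatch package (the rules here are invented, and I’m writing the decorators from memory, so check the PyProtocols docs for the exact spelling):

import dispatch

@dispatch.generic()
def discount(customer, order):
    """Return the discount rate to apply to this order."""

@discount.when("order.total > 100")
def big_order(customer, order):
    return 0.10

@discount.when("customer.is_employee")
def employee(customer, order):
    return 0.20

With rules like these, an employee placing a large order matches both methods equally well, which leads right into the situation described next.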

I ran into a situation where two rules applied equally well. The standard combination mechanism throws an exception in this case rather than making an arbitrary choice about which should apply. I had already written my own combination function, but I’m still glad to see a new page of documentation specifically about result combination: CombiningResults – The PEAK Developers’ Center.

Generic functions are a handy tool to have in your belt. Sure, anything you can do with them you could have done before. But some problems feel like a class or interface is the right solution, while other problems feel like a generic function is a better fit.

Cheetah template tip: you want _namemapper for Windows

I do most of my work on my Mac, hopping over to the PC for additional integration and build testing. I already have someone who is running Windows builds regularly, though, so any Windows-specific problems will get shaken out. I was testing out my latest build on Windows and found that it was very slow. I tossed a timer onto the page and found that it was taking 21 seconds to render. I added a database index I had been meaning to add, but I didn’t think that would do it because my database had a trivial amount of data.

That was when I remembered that the setup.py script for Cheetah assumes that you don’t have a C compiler if you’re running Windows. I’m working with enough modules that it was well worth getting a functional MinGW setup going. I changed Cheetah’s setup.py to enable the _namemapper C module.

The rendering time for that page dropped to 0.06 seconds, 350 times faster. It sure seems like you need _namemapper for any but the most trivial templates. Given that, I would probably make setup.py count on a C compiler first, and let the user comment out that line if they don’t have one.

If you’re considering using Cheetah under Windows, don’t call it “too slow” until you’ve tried it with C _namemapper.

Trolltech license change and eric3 for Windows?

With the news that Trolltech has decided to extend its dual-licensing strategy to Windows, Windows users will be able to use Qt-based GPLed software for free. That’s great news, because many people use Windows for a variety of reasons, and there are many good Qt-based programs that can now be run there for free.

I use a Mac primarily, but I also use Windows regularly. So, I’ve never given the eric3 IDE a try, because I couldn’t use it cross-platform. With the GPL release of Qt for Windows, maybe I’ll be able to. It looks like there is still a question of whether Riverbank Computing will follow suit with a license change for PyQt.