Blue Sky On Mars

Thoughts on Building Software Products

Google and online payments?

by Kevin Dangoor

If this story about Google’s plan to take on PayPal has any truth to it, this may indeed be a fine time to invest in GOOG despite the $70 billion valuation. Why? Because my guess is that if they enter that market they have original spin on online payments. I can’t imagine that they would just offer up a clone of PayPal. Everything else they’ve done to date has had a significant technological edge to it.

Update:

This is what I posted in response to Scott Sanders’ post about Google going after micropayments.

That’s interesting. I very nearly wrote on my blog that Google could implement a functional micropayments system.

I’m just not sure that that’s what they’ll go after though. I was reminded of Clay Shirky’s article (that actually goes all the way back to 2000!) arguing that micropayments are bad, because people will need to make a thousand little purchase decisions every day. Imagine if your typical blog was 1 cent per page viewed. 1 cent is nothing and seems easy to throw away… but if every blog you viewed did that, you probably would view a whole lot less and think about what you subscribe to more. Here’s Shirky’s article:

http://www.openp2p.com/pub/a/p2p/2000/12/19/micropayments.html

That said, I do think you’re right that if Google pushed some kind of identity system and reduced the general friction in payments somewhat, they could produce a lower-cost (to buyers and sellers) system. This may have the sideeffect of enabling micropayments, but I doubt that their pitch is going to be about micropayments.

I’m certain that they’ll keep a transaction log and that it’ll be no big deal for them. (I betcha they have an absolutely enormous amount of data being logged as it is.)

One other thing to note that I didn’t see in the article that I had linked to or here: this story apparently was in the Wall Street Journal, which gives it a modicum of credibility.

http://online.wsj.com/article_email/0,,SB111905141149263168-IdjgINhlaN4oJyobIKHbK2Bm4,00.html

(I got that link from /.)

ZODB vs. pysqlite with SQLObject

by Kevin Dangoor

I have found only two good options for an embedded, transactional database that can be conveniently used in a non-free Python application: the ZODB and sqlite. I started out with sqlite and then recently started doing some heavy duty tinkering with the ZODB.

Concurrency

For many desktop applications, sqlite is great. The data is all conveniently stored in one file and current versions of sqlite even eliminate the need for “vacuuming” up deleted data. Add SQLObject to the mix, and you can get fairly transparent access to the database once you’ve set up your classes.

The problem that I ran into with SQLite is with concurrency. Concurrency is not much of a problem for many desktop apps: you have one user at the computer doing work, and that’s it. For my application, the app is doing things in the background and updating the database while the user is using the software. This effectively brings in some of the usually issues you have in a multiuser app.

sqlite has a simple locking strategy: if anyone starts writing executing data update statements in a transaction, the whole database is locked. So, while the background process was updating the database, the GUI would get stuck waiting for it. Ick.

The ZODB, particularly ZODB 3.4, has a great concurrency picture. The current ZODB implementation uses multi-version concurrency control (MVCC). This means that rather than locking the data when a write is occurring, other threads will just get older data. The ZODB doesn’t do locking at all. Instead, of two threads try to make updates to the same object a Conflict exception is raised for the one that didn’t make it through. The great thing about this strategy is that you can often just capture that exception and replay the transaction at the application level. This is exactly what Zope does.

In the work that I’ve been doing with it of late, ZODB 3.4 really does handle concurrency beautifully.

APIs

The other great thing about the ZODB is that it is transparent. There are only a very few things that you would do differently using the ZODB than what you do normally in Python. To save an object to the database, you just need to attach it to an object that is already hanging somewhere off of the root of the database.

SQLObject is a great object-relational mapper (ORM). Once you set up your classes, using SQLObject is about as transparent as an ORM can get. The downside is that you’re still doing mapping so you can’t completely forget about the fact that there’s a relational database back there.

With the ZODB, you can store any objects you need to in there, and you can do convenient Pythonic things, like adding new variables to instances that are stored in the database.

Since it sits on a relational database, SQLObject gives you excellent querying capability out-of-the-box. With the ZODB, you need to use a “catalog” to do anything other than what is effectively a “primary key search”. The catalog maintains indexes of the objects based on whatever attributes you need to index and will run queries against those indexes. Using the persistent BTrees that come with the ZODB, these indexes are very easy to create.

Note, however, that you need to create these indexes yourself and keep them up to date. IndexedCatalog can provide a little more of the relational database ease-of-use for searching.

The ZODB comes with a full-text search index, which sqlite does not.

The downside of ZODB

The ZODB, particularly with MVCC, works great for the application for which is was developed: web sites. The biggest trouble that I have with it for a desktop application is the packing operation required by the FileStorage. FileStorage is by far the most used (and maintained) way to store the database data. I don’t believe there is presently a maintained storage that does not require packing.

The good thing about packing is that it can happen in the background. The bad thing is that packing a 250MB database file down to 195MB takes many minutes (on my 1.25GHz PowerBook) with the Python process grabbing as much CPU as it can. It’s fine for a website to do a pack during off-peak hours, but sucking up a desktop user’s CPU for a good long time is unpleasant.

The ZODB tends to grow more quickly than a relational database does. The ZODB stores pickles of the objects. If you look at how pickles are stored, everything from an object’s dict gets dropped directly into the pickle with both the key and value. This makes sense, because any instance can have any set of attributes in its dict. I noticed that using slots does not seem to gain you any extra space efficiency in your pickles. Since a relational database has no such flexibility in storage, it can be far more efficient in space. That 195MB ZODB I mentioned is about 75MB in sqlite. Luckily, disk space is cheap.

ZODB (with FileStorage) uses a bit more memory than sqlite. I believe the current figure is about 8 bytes per persistent object to keep track of where the objects are in the big .fs database file. This is not too onerous. 1 million objects means 8MB of memory used, and machines can readily handle that.

It’s just worth pointing out that the ZODB is more resource hungry than sqlite.

Conclusion

Isn’t computer science great? There’s very rarely a “perfect” solution to a problem. I figured I’d write this up to help anyone who might be thinking about how they want to store their data in their application. sqlite and the ZODB have some significant differences and choosing one over the other really means taking into account what is most important for your application.