Python dictionaries are not the same as instances

Yes, yes, I know that Python dictionaries are instances of the dict class. Something I have noticed over the years, in a variety of code, is that many people use a dictionary where they should really be using an instance of a custom class. Many of the people who do this likely came from PHP4, Perl or even C, where classes are either nonexistent or a little uncomfortable to use.

Dictionaries are great if you need to pass around a few values in one limited place in a larger system. Custom class instances are much better, though, if you have some data that is going to be used in many places in your code.

This is not a rant about “you should be doing OO design”. As we’ll see in a moment, I’m not even writing about OO design. I’m advocating custom class instances over plain dictionaries because Python is a class-based object-oriented language, so the language itself provides specific benefits to code that uses classes.

First, I should make it clear what exactly I’m talking about. I’ll use the simple example of a person. A person could be represented with a dictionary:

kev = dict(first_name="Kevin", last_name="Dangoor", address="1600 Pennsylvania Avenue")

A person can also be represented by a class:

class Person(object):
    def __init__(self, first_name, last_name, address):
        self.first_name = first_name
        self.last_name = last_name
        self.address = address

kev = Person("Kevin", "Dangoor", "1600 Pennsylvania Avenue")

Some people would probably object that the second example is a lot more verbose and doesn’t let you store arbitrary values the way the first example does. But you only define the Person class once, and instantiating a Person is just as easy as instantiating a dict. And if you want to make the class smaller and accept any attributes at all, you could do this if you really want:

class Person(object):
    def __init__(self, **kw):
        self.__dict__.update(kw)
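Newer versions of Python ship standard-library equivalents of both styles: `types.SimpleNamespace` (Python 3.3+) behaves much like the `**kw` class above, and the `dataclasses.dataclass` decorator (Python 3.7+) writes the `__init__` for the explicit version. A minimal sketch, assuming Python 3.7 or later; the class name `PersonRecord` is illustrative only.

```python
from dataclasses import dataclass
from types import SimpleNamespace

# Like the **kw class: any keyword arguments become attributes.
loose = SimpleNamespace(first_name="Kevin", last_name="Dangoor")

# Like the explicit class: @dataclass generates __init__, __repr__
# and __eq__ from the annotated fields.
@dataclass
class PersonRecord:
    first_name: str
    last_name: str
    address: str = ""

strict = PersonRecord("Kevin", "Dangoor", "1600 Pennsylvania Avenue")

print(loose.first_name)   # Kevin
print(strict.last_name)   # Dangoor

# vars() returns the instance's attribute dictionary, so the fields
# can still be listed or iterated over like dict keys.
print(sorted(vars(loose)))  # ['first_name', 'last_name']
```

The dataclass rejects a missing or misspelled field with a TypeError at construction time, which the namespace (like the `**kw` class) does not.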

When you use an instance of Person rather than the dictionary, you get several benefits:

1. Lazily computed values.

If you want to add full_name to Person, you just do this:

class Person(object):
    def __init__(self, **kw):
        self.__dict__.update(kw)

    @property
    def full_name(self):
        return self.first_name + " " + self.last_name

kev = Person(first_name="Kevin", last_name="Dangoor")
kev.full_name

Just like that, every Person instance now has a full_name property. To do the same with a dictionary, you would have to compute full_name and add it to every dict by hand, and then keep it in sync whenever first_name or last_name changes.
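To make the difference concrete, here is a small sketch contrasting the property with a dictionary that stores full_name once. The nickname "Kev" is just example data.

```python
class Person(object):
    def __init__(self, **kw):
        self.__dict__.update(kw)

    @property
    def full_name(self):
        # Recomputed on every access, so it always reflects the current names.
        return self.first_name + " " + self.last_name

kev = Person(first_name="Kevin", last_name="Dangoor")

# The dict version has to compute and store full_name by hand...
kev_dict = dict(first_name="Kevin", last_name="Dangoor")
kev_dict["full_name"] = kev_dict["first_name"] + " " + kev_dict["last_name"]

# ...and the stored copy goes stale when a name changes.
kev.first_name = "Kev"
kev_dict["first_name"] = "Kev"

print(kev.full_name)          # Kev Dangoor
print(kev_dict["full_name"])  # Kevin Dangoor
```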

2. Backwards compatibility.

Python properties let you replace plain attribute access with a method call without changing the code that reads the attribute. Let’s say that our Person class changed to store a single full_name instead of separate first and last names. We can still make first_name work:

class Person(object):
    def __init__(self, full_name):
        self.full_name = full_name

    @property
    def first_name(self):
        # warning... no error checking!
        return self.full_name.split(" ")[0]

kev = Person(full_name="Kevin Dangoor")
kev.first_name
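The property above is read-only, which covers old code that reads first_name. If old code also assigns to it, the property's `setter` decorator can translate the assignment as well. A minimal sketch, still assuming the name parts are separated by single spaces:

```python
class Person(object):
    def __init__(self, full_name):
        self.full_name = full_name

    @property
    def first_name(self):
        # Still no error checking: returns the first space-separated word.
        return self.full_name.split(" ")[0]

    @first_name.setter
    def first_name(self, value):
        # Rebuild full_name so old code that assigns first_name keeps working.
        rest = self.full_name.split(" ", 1)[1:]
        self.full_name = " ".join([value] + rest)

kev = Person(full_name="Kevin Dangoor")
kev.first_name = "Kev"
print(kev.full_name)   # Kev Dangoor
print(kev.first_name)  # Kev
```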

Adding a new field is another case. If you build a given data structure in many different spots in your program and it is a dictionary, adding a new key/value pair means updating every one of those spots, and dictionaries created by older code won’t have the new key. With a class, you can just do this:

class Person(object):
    age = None

And now, everywhere a Person is used you know you’re not going to get an AttributeError for person.age. If you use dictionaries everywhere, you’ll get a KeyError if you try to access ‘age’ on a dictionary created by older code.
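Here is a sketch of both behaviors side by side; the age value 40 is only example data. Instances that never set age fall back to the class attribute, while the dictionary either raises KeyError or needs a default at every access site.

```python
class Person(object):
    age = None  # class-level default, used until an instance sets its own

    def __init__(self, **kw):
        self.__dict__.update(kw)

old_person = Person(first_name="Kevin")          # created before age existed
new_person = Person(first_name="Kevin", age=40)  # example value

print(old_person.age)  # None, found on the class
print(new_person.age)  # 40, found on the instance

old_dict = {"first_name": "Kevin"}
try:
    old_dict["age"]
except KeyError:
    print("KeyError")

# With dicts, every access site has to remember to supply a default.
print(old_dict.get("age"))  # None
```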

3. Deprecations.

Even better than just providing backwards compatibility is to issue a deprecation warning so that you can clean up uses of the old style.

import warnings

class Person(object):
    def __init__(self, full_name):
        self.full_name = full_name

    @property
    def first_name(self):
        warnings.warn("Use full_name or else!", DeprecationWarning)
        return self.full_name.split(" ")[0]

kev = Person(full_name="Kevin Dangoor")
kev.first_name
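One caveat: Python's default warning filters hide DeprecationWarning in many contexts (for example, when it is triggered from imported module code), so when hunting for old call sites you may want to enable it explicitly, e.g. with `python -W error::DeprecationWarning`, or capture it in a test as below. This is a sketch; the `stacklevel=2` argument is an addition to the example above.

```python
import warnings

class Person(object):
    def __init__(self, full_name):
        self.full_name = full_name

    @property
    def first_name(self):
        # stacklevel=2 reports the warning at the caller's line,
        # which is the code that needs cleaning up.
        warnings.warn("Use full_name or else!", DeprecationWarning, stacklevel=2)
        return self.full_name.split(" ")[0]

kev = Person(full_name="Kevin Dangoor")

with warnings.catch_warnings(record=True) as caught:
    # Show DeprecationWarning even where the default filters would hide it.
    warnings.simplefilter("always")
    name = kev.first_name

print(name)                         # Kevin
print(caught[0].category.__name__)  # DeprecationWarning
```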

4. You can add behavior.

Unlike my other points here, this one is an OO design thing… There’s not much difference between

hire(person)

and

person.hire()

For many types of behaviors, though, I think it’s handy to have the behavior on the object you’re trying to act on. This potentially saves you from extra imports. Consider this case:

"foo bar".split()

import string
string.split("foo bar")  # Python 2 only; the string module's split() function was removed in Python 3

These two are equivalent, but the method form doesn’t require importing string or having the ‘string’ name muddying up your module namespace.
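Applied to the Person class, the method form might look like the following sketch. The `hire()` method and the `hired` flag are hypothetical, invented only to illustrate the point.

```python
class Person(object):
    def __init__(self, **kw):
        self.hired = False  # hypothetical field for this example
        self.__dict__.update(kw)

    def hire(self):
        # Hypothetical behavior: callers only need the instance, not an
        # import of whatever module would otherwise define hire(person).
        self.hired = True

kev = Person(first_name="Kevin")
kev.hire()
print(kev.hired)  # True

# Same idea as the string example: the behavior travels with the object.
print("foo bar".split())  # ['foo', 'bar']
```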

5. Less typing.

This is not a good reason for choosing a programming technique, but it is a fact:

person["first_name"]

is more typing than

person.first_name

6. Higher level features.

Custom class instances have all of the advantages above even when they are used only as basic data containers. Classes also offer inheritance and metaclasses, which let you share attributes and behavior in a way that a plain dictionary can’t. Classes also give you operator overloading, customizable attribute access and the ability to customize the str and repr forms of the objects.
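A short sketch of three of those features on a two-field Person: a custom repr, operator overloading via `__eq__`, and inheritance. The `Employee` subclass is hypothetical.

```python
class Person(object):
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __repr__(self):
        # Custom repr: used by the REPL, in logs and by pprint.
        return "%s(%r, %r)" % (type(self).__name__, self.first_name, self.last_name)

    def __eq__(self, other):
        # Operator overloading: == compares field values, not identity.
        if not isinstance(other, Person):
            return NotImplemented
        return (self.first_name, self.last_name) == (other.first_name, other.last_name)

class Employee(Person):
    # Hypothetical subclass: inherits __init__, __repr__ and __eq__ unchanged.
    pass

kev = Employee("Kevin", "Dangoor")
print(repr(kev))                          # Employee('Kevin', 'Dangoor')
print(kev == Person("Kevin", "Dangoor"))  # True
print(isinstance(kev, Person))            # True
```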


There’s really very little downside to using custom classes and quite a few benefits, particularly if you’ve got a system that’s going to grow. If you tend to reach for dictionaries whenever you need to store some data, you might consider whipping up a quick class instead.

17 thoughts on “Python dictionaries are not the same as instances”

  1. Not necessarily, mainly because you may never need 1 to 6. In some programs I like to start off storing data in plain, but not necessarily simple, nested combinations of dicts, lists, sets and tuples that match the data coming in; then I think about what data structure would best store my output, as well as the algorithm to translate input data to output data (also starting in non-instance terms).
    This gives me the flexibility of using simple pprint statements to monitor the data throughout the algorithm.

    Sometimes I’ll think that it’s best to change to objects. Sometimes the data stays object-less from beginning to end. I just love that Python gives me that flexibility. (P.S. Some problems do scream “Use Objects” from the start though).

    – Paddy.

  2. Paddy: What? You don’t want to type less? 🙂

    More seriously, though, if you’re talking about a smallish, single module sort of program, sure… you don’t need those things. But, if you know that what you’re working on is going to be a bigger program and you know that the data structure you’re passing around is an entity of your program and not just a transient bit of data passed between a couple of functions, then a class seems like a good idea. Keep in mind also that you can do this (whitespace is almost assuredly going to get clobbered here, but you’ll get the picture):

    class DataThing(object):
        def __init__(self, **kw):
            self.__dict__.update(kw)

    If you use DataThings instead of dictionaries, all of your code will at least be written for attribute access rather than array-style access. That makes it easier to switch to a class later and saves on typing.

  3. Ahh, but there’s nothing like creating lists of dicts mapping strings to sets of strings and turning them into a list of strings that just happens to be the result of parsing VHDL code in multiple libraries and generating the source compile order!
    (VHDL as a language needs to have source files compiled after their dependencies, and things get hairy when you have interdependent libraries of sources).
    And I love adding ‘from pprint import pprint as pp’ then pp(some_inst_less_large_data) into a file and then perusing the data in vim.

    Classes are good too.

    – Paddy.

  4. You forgot to mention what, IMO, is the biggest advantage of using class instances over dicts to store data: instances implicitly carry information on what kind of data is stored in them. Eg:

    Person(name="Candy", age=23) vs. Cat(name="Candy", age=12)

    If those were stored in plain, meaningless dicts… how would you distinguish the cat from the person (without adding an artificial “kind” key)? I find this essential for maintaining my sanity whenever objects are passed around between different components of a program. For example:

    a = datetime(2007, 12, 1)
    b = {'year': 2007, 'month': 12, 'day': 1}

    then those get passed around and we need to display that date… would you rather:

    if hasattr(obj, "strftime"):
        print(obj.strftime("%Y/%m/%d"))

    or

    if 'year' in obj and 'month' in obj and 'day' in obj:
        print("%d/%d/%d" % (obj['year'], obj['month'], obj['day']))

    What if at some point in the future you want to display the time too? Which is more maintainable?

    @paddy:

    Regarding the pprint statement to monitor data… isn’t that what the __repr__ method is for? An advantage of the latter over the former: you don’t even need to import pprint 😉

    Alberto

  5. Excellent advice, but I do have one small nit to pick. If you want to replace what was previously a stored value with a computed one, as in your point 2, you have to do it by overriding __getattr__. If you just make a method with the same name, you have to change other code to call that function so it’s not entirely transparent.

  6. Well, besides not being backward-compatible, the example as given doesn’t seem to work. Doesn’t there need to be a nested fset etc.?

  7. I’ve made most of the examples more complete… I think I was a little too brief with them. Try them now, and I think you’ll see that it is backwards compatible and quite convenient to use…

    You only need fset if the property should be read/write. If it’s read only, just using property as a decorator works well.

  8. “Yes, yes, I know that Python dictionaries are instances of the dict class”

    They are?

    >>> instance = object()
    >>> isinstance(instance, dict)
    False

  9. Actually what I said was this:

    >>> foo = {'bar': 'baz'}
    >>> isinstance(foo, dict)
    True

    My point was that while I’m saying that dictionaries are not the same as instances, I do recognize that they are instances of a class themselves.

  10. One thing I do like about dictionaries is that it is easy to iterate over their keys, and I can’t think off the top of my head how you would achieve the same thing with an instance.

  11. “Actually what I said was this”

    Sorry, misread the intro. Embarrassing. Feel free to remove my comment.

  12. @Ed: That’s an interesting point. I wonder if the Cookbook has a function to do that with instances… it can certainly be done, but it’s a little complicated.

    In real life usage (not debugging/testing), this is not generally a factor, though, right?

    @Observer: no worries… happens to everyone at one time or another.

  13. @tazzzzz,

    Debugging and Testing is real life use! Testing in particular.

    Most of my Python is written with TurboGears, and I seem to spend a lot of time passing dictionaries around. I do really like your idea, but I would probably end up reimplementing a lot of the functionality of dictionaries on an instance.

Comments are closed.