
Millions of lines of code

December 20th, 2007

Steve Yegge attracts some attention with Code’s Worst Enemy, in which he uses many, many words to say that “big, bloated software sucks to maintain”. He talks about a 500,000-line Java-based game he wrote that has become unmanageable. Though he’s railing against code bloat in general, he does rant a bit about Java in particular. I’ll come back to Java…

First off, I agree with the basic idea that having too much code to manage in a single conceptual unit (a program) makes life very difficult.

The first defense against this is good product management. Only add features if they’re good for the product as a whole. If you strip out a lot of fringe features, you can end up with a much leaner product that is both easier for you to maintain and easier for users to pick up. Scribbles brings this to mind. The app is focused on creating drawings and provides a UI that is completely streamlined to the task. I would guess that the code for Scribbles is not out of control, because it would be similarly focused.

The next thing to consider is that we all routinely write 1 million+ LOC programs. Consider this case:


#!/usr/bin/python

print "Hello, world"

Conceptually, programs don’t get much easier than that. But running that program and seeing its output in a Terminal.app window on my Leopard-based Mac running Python 2.5 undoubtedly touches at least 100,000 lines of code. The trick is that I have faith that Python, Terminal and Leopard are all going to do their jobs (and those big guys have faith in the libraries they use), so I can effectively ignore 99,999+ lines of code and just focus on the one.
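To get a rough feel for how much code sits behind even a one-liner, here is a small sketch written for a current Python 3 interpreter (not the Python 2.5 setup described above). It only counts the Python source files of modules the interpreter has already imported before your own code runs. The C code in the interpreter, the terminal and the operating system is not counted, so the real total is far larger. The helper name count_source_lines is made up for this illustration.

```python
import sys

# Names of the modules the interpreter imported before our own code ran.
loaded = sorted(sys.modules)


def count_source_lines(module):
    """Count lines in a module's .py source file; 0 if there is none to read."""
    path = getattr(module, "__file__", None)
    if not path or not path.endswith(".py"):
        return 0  # built-in, frozen, or extension module: no Python source
    try:
        with open(path, "rb") as handle:
            return sum(1 for _ in handle)
    except (IOError, OSError):
        return 0


# Copy the values first so the dict cannot change while we iterate.
total_lines = sum(count_source_lines(m) for m in list(sys.modules.values()))

print("modules loaded: %d" % len(loaded))
print("Python source lines behind them: %d" % total_lines)
```

The numbers vary by Python version and platform, which is rather the point: you never had to know them to write the one line you care about.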

A couple of commenters on Steve’s blog post suggested that rather than a 500,000 line program, what he really should have is a 10,000 line program and a bunch of libraries. In one sense, this is the compartmentalization that Steve complains about in his post. Still, libraries are a way of compartmentalizing code that has proven very successful over the years.

Language features are another way to segment code. That’s part of what the recent hubbub over domain-specific languages is all about. Libraries can reduce how much you have to think about as you solve a problem, and DSLs can reduce it further.

Which brings me back to Java and Steve’s original point. We can focus on reducing the amount of code we manage by using libraries, and there is certainly plenty of that going on in Javaland. We can also add higher-level features to our languages. Doing so reduces the total lines of code required to solve a problem and the cognitive load required to maintain a program. Here’s an example:


import time

class Foo(object):
    @property
    def foo(self):
        return time.time()

Python’s attribute and property handling means that you don’t need to define random setters and getters all over the place. You just define them where you need them. Think about how many lines of code in Java are wasted on getters and setters that do nothing beyond the standard behavior, and how many times you have to type bar.getFoo() rather than just bar.foo. Yes, I know that Eclipse generates the getters and setters for you, but you still have to read through all of that muck when you’re trying to figure out what a program does.
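Here is a minimal sketch of “define them where you need them”. The class and function names (Temperature, CheckedTemperature, warm_up) are hypothetical and exist only for this illustration. The first class exposes a plain attribute. The second later adds validation behind the same attribute name, and the calling code does not change. Note that the @celsius.setter form needs Python 2.6 or newer; on 2.5 you would write celsius = property(getter, setter) instead.

```python
class Temperature(object):
    """Version 1: a plain attribute, no getter or setter written at all."""

    def __init__(self, celsius):
        self.celsius = celsius


class CheckedTemperature(object):
    """Version 2: same public attribute name, now backed by a property."""

    def __init__(self, celsius):
        self.celsius = celsius  # routed through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Validation added after the fact, only where it is needed.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value


def warm_up(thermometer):
    # Caller code is identical for both versions: plain attribute access.
    thermometer.celsius += 1.5
    return thermometer.celsius
```

Both warm_up(Temperature(20.0)) and warm_up(CheckedTemperature(20.0)) work without any getCelsius() or setCelsius() calls, which is the reason you can skip the boilerplate accessors up front.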

In a nutshell:

  • I agree that code bloat is a huge problem
  • I’m sure that part of the blame for having 500,000 lines of code to manage single-handedly rests with Steve… in choices he made about features to include, and in how he decided to decompose the problem.
  • But I also agree that Java carries dramatically higher cognitive load than many other popular languages and one should really think about alternatives.


  1. December 20th, 2007 at 17:37 | #1

    I had similar thoughts: Linux is big, GCC is big, the JVM is big, the C standard library is big, etc, etc, as you say. They get along nicely because they are well separated and have well defined, sharp, clear interface points. If Steve’s code doesn’t have that sort of thing, he has no one to blame but himself.

    (This kind of shines a spotlight, indirectly, on Windows. Windows is big, too. But it does not seem to have such well separated, clearly defined, and *well guarded* interface points. E.g. I’ve read that Windows will pre-load Word DLLs at system start just so Word will start up faster. I’ve described Windows as “incestuous” for a long time; I think it’s a good word for this sort of thing.)

    In a way, though, all this is related to his point but kind of aside from it. His point is, “I’m only one guy, and I have too much dirt”, and we’re telling him that if he puts his dirt in separate buckets, it’ll be easier to carry around. True, but irrelevant. He wants a language that allows him to do more with less. He wants *less total dirt*, regardless of the buckets, and regardless of how other *groups of people* carry around their multiple buckets of dirt.

    To put it another way, if *you* had to modify Linux, *and* GCC, *and* the JVM, *and* the C standard library, I bet you’d wish there were less of it all, regardless of how well they play in the same pond together.

  2. December 20th, 2007 at 23:10 | #2

    I agree. My main point is to use both a language and libraries with the least cognitive load for the problem you’re solving. I also believe in using lots of open source and contributing to/creating open source as a way to share more of the load with others who have similar infrastructure needs. Opportunities to open source stuff become more apparent if you have well-defined boundaries.
