Monday, December 29, 2008

Education vs. Experience

Joel Spolsky set off a minor flame war with a comment on his discussion group yesterday. It started off as a simple reply to Bob, who is thinking of leaving the software industry, but it then spilled over to reddit, where things can quickly get out of hand and off-topic.

For those of you who would like a summary, Joel takes the position that anyone who thinks they should leave the software industry, probably should. I tend to agree with this, but it's not what I want to write about in this post, partly because I don't want to be accused of being a Joel fanboy (even though the fact that I went to a Java school should immunize me from such an accusation), and partly because I'm much more interested in one of the side discussions that flared up.

The side discussion that I'm interested in is probably one you've heard before. It's about the merits of going to school and getting a CS degree vs. being a self-taught "hacker". The reason this discussion so fascinates me is that I've been on both sides of it, and I feel like I still understand both sides.

The argument against getting a CS degree always starts out reasonable enough. Here's a quote from the reddit comments:
In my view, a computer science degree doesn't predict whether a person is a good programmer or not.
This may be true, but the point is that it's a better indicator than not having a CS degree. Think of all the people you know without a CS degree. How many of them are great programmers? Maybe a few, but it's going to be a tiny proportion. Now how about those with CS degrees? Still only a few of them are probably great programmers, but the proportion is going to be much higher.

The same commenter goes on to say:
A lot of people with Computer Science degrees have a tremendously hard time realising for themselves that the degree they've got is probably worthless. There's some serious cognitive dissonance there.
Maybe that's because a CS degree isn't worthless? I understand that there may be a bias, but if most people who get a CS degree think that it was worth it, how can someone without a CS degree disagree? On what can they base their argument? Their own lack of a CS degree? If you haven't gotten a degree, then you can't know what it's worth.

The commenter then says:
In short, a degree leads a candidate to think they actually know something in much more depth than they actually do. In pretty much any area of computer science you can understand the subject to a much higher standard with a week of personal study than they achieved with three years at a university.
This is patently false. The little bit of knowledge you gain from personal study is much more likely to lead to second-order incompetence. The one thing that I learned better than any other when I went back to school to get a CS degree is how much more there was for me to learn. It didn't lead me to an inflated sense of my own knowledge, it led me to understand how truly ignorant I was.

A different reddit commenter had this to say:

I don't have a computer science degree, and I believe a degree generally shows the following:

  • You decided to recognize how the system works
  • You can stick with something for four years
  • You know how to look up an answer

That's pretty much it.

I agree with the first two points. I went to school with plenty of people who had years of experience and were only in a CS program in order to "get a piece of paper" corroborating their knowledge. They went back to school only to prove they could stick it out for four years, and they had recognized that they needed a degree to get a much-deserved promotion. But I always felt that these people were cheating themselves by just showing up to get the degree. Many of them didn't apply themselves as hard as they could, and if they had given it a chance they may have realized, as I did, that there was a lot more to gain than just the "piece of paper".

I think the whole argument boils down to this: Every person with a CS degree used to be a person without a CS degree. If most of us agree that we're better off after having gotten a degree, then how can those without CS degrees be unconvinced? They haven't experienced the argument from both sides, so are by definition in an inferior position to argue.

I don't think that the argument should be that one programmer with a CS degree is better than another programmer without a degree. The real argument to make is that a programmer with a degree is better than he was before he got it. If you don't have a degree, you just don't know how much better you could be if you put in the time and the effort to earn one.

Sunday, December 28, 2008

Should I Learn C or C++?

The question comes up quite often, "Should I bother to learn C, or go straight to C++?" Many beginning programmers wonder if it's worth it to spend the time learning C, when they know a more advanced language is readily available. Others wonder if they're going to be missing out on anything important if they skip C, and proceed directly to C++ without passing Go or collecting $200.

While it is true that well-written C code will normally compile under C++, C is not a proper subset of C++. There are a few differences that can make a valid C program invalid in C++. The easiest examples to illustrate involve the addition of keywords such as new and class to C++, which were not reserved words in C. The following (contrived) valid C program will not compile as C++.
#include <stdio.h>

int main(void) {
    int class, new; // both class and new are C++ keywords
    printf("Enter two integers > ");
    scanf("%d %d", &class, &new);
    printf("The two numbers are: %d %d\n", class, new);
    printf("Their sum is %d\n", class + new);
    return 0;
}

Another difference that's often cited between C and C++ is that C performs an implicit conversion when a void pointer is assigned to a pointer of a specific type, while C++ requires an explicit cast. The following valid C code will not compile using C++:
void* ptr;
int *i = ptr;
In order to make this code valid C++, an explicit cast to an int pointer must be supplied:
void* ptr;
int *i = (int *) ptr;
While it's important to understand that C and C++ are really two separate languages, it's just as important to understand that the parts of C that aren't valid C++ are extreme edge cases. C++ was originally intended to be just an extension of C (Stroustrup started out calling it C with Classes), so an effort was made to ensure that valid C syntax was broken in only a very few places. As Scott Meyers points out in Effective C++, C++ is really a federation of related programming languages: C, Object-Oriented C++, Template C++, and the STL. Almost all valid C programs will compile as C++, with very few, or often no, changes necessary.
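In practice, the void pointer rule tends to come up most often with malloc, which returns a void pointer. Here is a minimal sketch of a function written so that it should be accepted by both a C and a C++ compiler. The function name sum_first_n is made up for this illustration, and it assumes n is greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only: returns 0 + 1 + ... + (n - 1),
   using a heap buffer purely to demonstrate the cast.
   Assumes n > 0. Returns -1 if the allocation fails. */
long sum_first_n(size_t n) {
    /* malloc returns void *. The explicit cast is optional in C but
       required in C++, so writing it keeps this line valid in both. */
    int *values = (int *) malloc(n * sizeof *values);
    if (values == NULL) {
        return -1;
    }

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        values[i] = (int) i;
        total += values[i];
    }

    free(values);
    return total;
}
```

Many C programmers prefer to leave the cast off, since C doesn't need it. Adding it only matters if the same source file also has to compile as C++.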

Most programmers who are deliberating between learning either C or C++ should probably skip C and learn C++. Start out with a book like C++ Primer and you will learn good programming style, not only in the C subset, but in all of the parts of the C++ language. Unless you plan on doing some work on the Linux kernel or another project that you know uses C, the only thing you will be really missing by not learning C first is Kernighan & Ritchie's C Programming Language. You can always go back and read K&R after you've taught yourself good C++ style and habits.


Further Reading

Wikipedia, Compatibility of C and C++.

Monday, December 22, 2008

Even More Free Programming Books

In a previous post I listed a few programming books that I found available for free online. Since then, I've been scouring the net looking for even more free programming books. Here's what I've found. (Note: As before, I haven't read all of these, so don't take this as a list of informed endorsements. Besides, they're free. You can read them yourself.)

  1. Practical Common Lisp
  2. Data Structures and Algorithms with Object-Oriented Design Patterns in C#
  3. Squeak by Example
  4. The Art of Unix Programming
  5. Algorithms
  6. Practical PHP Programming (Wikibook)
  7. Foundations of Programming

It just occurred to me how eclectic this list is. It's not every day that you see books on Lisp, C#, and PHP collected in the same list. The Internet is as wide as it is wonderful. :)

Additional Note: As some alert readers over on reddit noticed, the online version of Programming Pearls is incomplete, so I've decided to replace it on my list. Sorry I didn't catch this sooner, but you can still enjoy the 3 sample chapters that the author has graciously made available for free.

And here, for those of you who don't read blog comments, are a few bonus picks submitted by readers of my previous post.
  1. Higher-Order Perl
  2. On Lisp
  3. Starting Forth
  4. Thinking Forth
  5. Learning J
  6. Basics of Algebra and Analysis for Computer Science
  7. Concepts, Techniques, and Models of Computer Programming (PDF)
  8. .NET Book Zero
As always, I'm interested in hearing from you about even more free online programming material, whether it be in the form of books, tutorials, video lectures, etc.

Saturday, December 20, 2008

The Monty Hall Problem

The Monty Hall Problem is a probability puzzle based on a game that contestants played on the popular old TV game show Let's Make a Deal, hosted by Monty Hall. On the show, contestants were given a choice of three doors. Behind one door was a car; behind each of the other two, a goat. The contestant got to choose a door, and won whatever was revealed to be behind that door. The twist was that after the contestant had selected a door, Monty would show them what was behind one of the two doors not selected, and give them the option of changing their mind.*

The game was deceptively simple, in that it seemed as though the contestant always had one chance in three of winning the car, whether they switched their choice or not. There was a strategy that would double the odds of winning though. It turns out that the winning strategy is as simple as the game.

All a contestant had to do to double their odds of winning was always switch their guess after Monty opened one of the doors. This simple strategy worked because the contestant had only one chance in three of guessing the right door initially. Once they had made their guess, Monty would open one of the other doors that did not conceal the car. Since the player's initial guess was wrong 2/3 of the time, Monty was forced to open the only other losing door 2/3 of the time (the car was behind one door, so he wouldn't open that). In those cases the one door left closed had to be hiding the car, so switching won 2/3 of the time.

This strategy isn't intuitive to a lot of people. In fact, when the strategy was first published in Parade magazine, thousands of people wrote in to claim the solution was wrong. It becomes clearer if you look at a bigger variation of the same problem.

Imagine you're on a game show where there are 100 doors. Behind one of them is a new car, while the other 99 conceal goats. Your odds of selecting the right door are 1 in 100. Now imagine that after you've made your selection, the host of the show opens 98 of the doors, revealing 98 of the goats. (Keep in mind that he doesn't randomly open doors. He knows where all the goats are.) Would you switch your original guess to the one remaining door? You only had a 1% chance of winning with your original guess, so does it seem advantageous to switch after the choices are narrowed down to only two?

With a bigger set of doors to choose from, it becomes much easier to see that you have a distinct advantage when you switch from your initial guess. The same holds true for the original problem. By always switching after one door was opened in the original game, the odds of winning were improved from 1/3 to 2/3. That's certainly not a sure thing, but it's not bad odds on a brand new car.
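If the argument still feels slippery, a quick simulation can help. The following is only a rough sketch of the three-door game in C, not anything taken from the show or from the Parade column. The function names play_round and win_rate are made up for this example, and the host's choice between two goat doors (when the first guess was right) is simplified to "always the lowest-numbered one", which doesn't change the win rate of either strategy.

```c
#include <stdlib.h>

/* Play one round with doors numbered 0, 1, 2.
   Returns 1 if the contestant ends up with the car, 0 otherwise.
   A nonzero switch_doors means the contestant switches after the reveal. */
static int play_round(int switch_doors) {
    int car = rand() % 3;   /* door hiding the car */
    int pick = rand() % 3;  /* contestant's first guess */

    /* The host opens a door that is neither the contestant's pick nor
       the car. At most two doors are excluded, so one always remains. */
    int opened = 0;
    while (opened == pick || opened == car) {
        opened++;
    }

    if (switch_doors) {
        /* Door numbers sum to 0 + 1 + 2 = 3, so the one door that is
           neither picked nor opened is 3 - pick - opened. */
        pick = 3 - pick - opened;
    }
    return pick == car;
}

/* Fraction of rounds won over the given number of trials. */
double win_rate(int trials, int switch_doors) {
    int wins = 0;
    for (int i = 0; i < trials; i++) {
        wins += play_round(switch_doors);
    }
    return (double) wins / trials;
}
```

After seeding the generator with srand, calling win_rate(1000000, 1) should come out close to 2/3, while win_rate(1000000, 0) should land close to 1/3.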

Additional Note: If you want to try out an online interactive version of the original game, you can play it here. Make sure you give yourself enough trials to confirm that the optimal strategy wins 2/3 of the time.



* It turns out, Monty Hall never offered to let contestants switch which door (or curtain) they picked on Let's Make a Deal. Instead, he would offer them cash to opt out of the game entirely. The problem that would be known as the Monty Hall Problem was popularized as the Game Show Problem by columnist Marilyn vos Savant in Parade magazine in 1990.

Thursday, December 18, 2008

Hints or Solutions?

There was a mild debate on Stack Overflow today regarding the posting of solutions to the puzzles on other sites like Project Euler. The debate was started by this question. Several people disagreed with the questioner, but I'm on the fence.

I love sites like Project Euler, Top Coder, and the Python Challenge for the puzzles they provide. At least 90% of the enjoyment I get from them is in solving the problems myself, with no outside help. I have to admit though, that at least a small part of the enjoyment is in competing with other people who are trying to solve the same puzzles.

I don't mind at all when people give hints (especially the non-programming hints that seem to be required to advance to the next level of the Python Challenge). I do get a little bit annoyed, though, when I see people posting entire solutions to the puzzles from other sites. Like it or not, competition is a component of the enjoyment that people get from programming puzzle sites like these. Having others get full solutions to the puzzles removes that part of the challenge.

I know that there's little that the administrators over at Stack Overflow can do about people posting solutions to puzzles on other sites. People can easily rephrase a question so that it isn't obviously a Project Euler puzzle (for example). I also know that the solutions to many of these problems can be found with a quick Google search.

That leaves it up to the Stack Overflow community. As I said, I see nothing wrong with posting questions asking for hints and tips to questions from puzzle sites. I'd like to see more of them. What I do see wrong, is when people post full solutions to these problems. Having a full solution takes away the learning experience that one might have otherwise enjoyed. I think that taking away a learning experience goes against the spirit of both Project Euler and Stack Overflow.

Tuesday, December 16, 2008

Google is Visually Impaired

What real difference does it make if you use HTML tables instead of CSS to control the layout of your Web page? If you're a professional Web designer you probably already know, and you can stop reading now. Thanks for stopping in. For the rest of you, the title of this article is a hint.

The truth is, HTML tables are a lot easier to control. I can control every aspect of the appearance of a table using very little markup, and as a bonus it looks pretty much the same across all the major browsers. And I'm not even a Web developer. This seems like an easy choice. I don't want to have to learn about liquid layouts. Absolute positioning sounds like a good thing, but then I have to learn about something called the box-model hack? No thanks. Tables are fine. Tables are easy.

But what about Usability? I remember taking a course in college about that, and reading this really cool book by a guy named Don Norman called The Design of Everyday Things. It's about how simpler design is better, and that's why the iPod is so awesome. That goes right along with designing Web pages with tables. They're simple, right?

Not so fast. I also remember Norman saying something about how design should be simple from the perspective of the user. There's a trap here that's easy to fall into. You don't want to make a design decision based on how easy it is to implement; you want to make the choice based on how easy the design is to use. We've already established that tables are a lot easier on the developer, but how could they make a difference to the user? They're just reading the information off the page, right?

Well, most of them are. Some people have to use screen readers, and screen readers don't read tables in the same order as they read the elements in a CSS layout. It's a really small percentage of internet users, though, and you can't even find accurate statistics because all of the studies lump everyone with a disability into one big group in order to inflate the numbers. It's really small: something like 10%* of internet users use some sort of assistive technology like a screen reader. So for some small percentage of internet users, probably less than 10%, using a table-based layout will provide a less than ideal user experience.

Visual impairment is really random, though, so that means that about 10% of virtually any group of internet users is using some sort of assistive technology like a screen reader. Can you really afford to turn away a random 10% of your potential Web audience? If you can, keep reading.

Google is visually impaired. (Let that sink in.)

Even if you don't care about users with accessibility issues (and you should), there's one user that everyone should care about. The web crawler that Google uses to scan the internet and index your site can't really see the pages. It reads them just like a screen reader would. That is, it reads pages from top to bottom, just like the screen readers are programmed to. They can't tell the difference between a table used for layout and a table used to hold tabular data, so they treat them both the same. HTML tables are meant for tabular data, so that's how screen readers and the Google web crawler read them.

Table-based layout is quick and easy. CSS layouts are complicated and hard. Does it really matter if you choose tables? Yes, more than most people realize.

* Really? Ten percent? No, I can't back that up. Even if it's only 2%, though, Google is still in that 2%.

Sunday, December 14, 2008

Freely Available Programming Books

In an earlier post, I listed some frequently recommended computer programming books. I got some good feedback from that post, so I thought I'd go a step further and provide some links to programming books that I've found for free online.

Please note that I'm going against my own previous advice here, as I haven't finished reading every one of these books from cover to cover (I haven't even started SICP, I admit), so don't take these as endorsements from someone who has read the books.

NOTE: One Anonymous poster pointed out that there's more material missing from the Google Book Search selections than at first appeared. I've had to remove those books, but you can search their selection through the link provided.

Without further ado, I present to you the 7 best free programming books that I could find online. Enjoy.
  1. Structure and Interpretation of Computer Programs
  2. Thinking in Java (3rd ed.)
  3. Thinking in C++ (2nd ed.)
  4. Algorithms and Complexity
  5. How to Think Like a Computer Scientist (Learning with C++)
  6. Programming Ruby (1st ed.)
  7. Dive Into Python
Please let me know if you find any of these links useful. I'm particularly interested to learn what people think of SICP, since it's been on my "to read" queue for quite a while. If you decide to read it, be sure to check out the SICP companion video lectures. (Update: I've started reading SICP since this was first published, and I'm keeping track of my progress starting with The SICP Challenge.)

I'd also certainly be interested to learn of any other free online book resources that anyone can recommend.

Related posts:

Even More Free Programming Books

Wednesday, December 10, 2008

Books Programmers Don't Really Read

Mark Twain once said that a classic novel is one that many people want to have read, but few want to take the time to actually read. The same could be said of "classic" programming books.

Periodically over on Stack Overflow (and in many other programming forums) the question comes up about what books are good for programmers to read. The question has been asked and answered several times, in several different ways. The same group of books always seems to rise to the top, so it's worth it to take a look at these books to see what everyone is talking about.

Books Most Programmers Have Actually Read
  1. Code Complete
  2. The Pragmatic Programmer
  3. C Programming Language (2nd Edition)
  4. Refactoring: Improving the Design of Existing Code
  5. The Mythical Man-Month
  6. Code: The Hidden Language of Computer Hardware and Software
  7. Head First Design Patterns
  8. Programming Pearls
  9. Effective Java (2nd Edition)
    or Effective C++
  10. Test Driven Development: By Example

I've read all of these books myself, so I have no difficulty believing that many moderately competent programmers have read them as well. If you're interested enough in programming that you're reading this blog, you've probably read most, if not all, of the books in this list, so I won't spend time reviewing each one individually. I'll just say that each of the books on the list is an exceptional book on its respective topic. There's a good reason that many software developers who are interested in improving their skills read these books.

Among the most commonly recommended programming books there is another group that deserves special consideration. I call the next list "Books Programmers Claim to Have Read". This isn't to say that no one who recommends these books has actually read them. I just have reason to suspect that a lot more people claim to have read the following books than have actually read them. Here's the list.

Books Programmers Claim to Have Read
  1. Introduction to Algorithms (CLRS)
    This book may have the most misleading title of any programming book ever published. It's widely used at many universities, usually in graduate level algorithms courses. As a result, any programmer who has taken an algorithms course at university probably owns a copy of CLRS. However, unless you have at least a Master's degree in Computer Science (and in Algorithms specifically), I doubt you've read more than a few selected chapters from Introduction to Algorithms.

    The title is misleading because the word "Introduction" leads one to believe that the book is a good choice for beginning programmers. It isn't. The book is as comprehensive a guide to algorithms as you are likely to find anywhere. Please stop recommending it to beginners.

  2. Compilers: Principles, Techniques, and Tools (the Dragon Book).
    The Dragon Book covers everything you need to know to write a compiler. It covers lexical analysis, syntax analysis, type checking, code optimization, and many other advanced topics. Please stop recommending it to beginning programmers who need to parse a simple string that contains a mathematical formula, or HTML. Unless you actually need to implement a working compiler (or interpreter), you probably don't need to bring the entire force of the Dragon to bear. Recommending it to someone who has a simple text parsing problem proves you haven't read it.

  3. The Art of Computer Programming (TAOCP)
    I often hear TAOCP described as the series of programming books "that every programmer should read." I think this is simply untrue. Before I'm burned at the stake for blasphemy, allow me to explain. TAOCP was not written to be read from cover to cover. It's a reference set. It looks impressive (it is impressive) sitting on your shelf, but it would take several years to read it through with any kind of retention rate at all.

    That's not to say that it's not worthwhile to have a copy of TAOCP handy as a reference. I've used my set several times when I was stuck and couldn't find help anywhere else. But TAOCP is always my reference of last resort. It's very dense and academic, and the examples are all in assembly language. On the positive side, if you're looking for the solution to a problem in TAOCP (and the appropriate volume has been published) and you can't find it, the solution probably doesn't exist. It's extremely comprehensive over the topic areas that it covers.

  4. Design Patterns: Elements of Reusable Object-Oriented Software (Gang of Four)
    Design Patterns is the only book on this list I've personally read from cover to cover, and as a result I had a hard time deciding which list it belongs on. It's on this list not because I think that few people have read this book. Many have read it; it's just that a lot more people claim to have read it than have actually read it.

    The problem with Design Patterns is that much of the information in the book (but not enough of it) is accessible elsewhere. That makes it easy for beginners to read about a few patterns on Wikipedia, then claim in a job interview that they've read the book. This is why Singleton is the new global variable. If more people took the time to read the original Gang of Four, you'd see fewer people trying to cram 17 patterns into a logging framework. The very best part of the GoF book is the section in each chapter that explains when it is appropriate to use a pattern. This wisdom is sadly missing from many of the other sources of design pattern lore.

  5. The C++ Programming Language
    This book is more of a language reference than a programming guide. There's certainly plenty of evidence that someone has read this book, since otherwise we wouldn't have so many C++ compilers to choose from.

    Beginning programmers (or even experts in other languages) who want to learn C++, though, should not be directed to The C++ Programming Language. Tell them to read C++ Primer instead.

As I said before, I know there are a few of you who have actually read these books. This post isn't intended for you; it's intended for the multitudes who are trying to appear smarter by pretending to have read them. Please stop recommending books to others that you haven't read yourself. It's counterproductive, as often there is a better book (more focused on a specific problem domain, easier to understand, geared more toward a specific programming language or programming skill level) that someone more knowledgeable could recommend. Besides that, you may end up embarrassing yourself when someone who has actually read TAOCP decides to give you an MMIX pop quiz (if you don't know what I'm talking about, then this means you).

Friday, December 5, 2008

Pair Programming

I had the chance to do some pair programming this week and I thought I'd share my impressions. For those of you who may have just recovered from a recent coma, pair programming is a technique where two programmers collaborate on a piece of code using one keyboard and monitor. The person typing is called the driver, and the person observing is called the navigator.

I entered into my pair programming arrangement not exactly skeptical, but I wouldn't say I was convinced by all I had read. I'd need to spend some more time coding with a partner (or see some citations) before I'm prepared to accept everything the Pair Evangelicals claim. Regardless, I was just curious enough to give it a chance and see if it could work.

The thing that I noticed within the first hour was that having a navigator was much less of a distraction than I had feared. I was worried that it would devolve into an unproductive chat session, but that didn't happen. There was a bit of chit-chat now and then, but each time within a few minutes one of us would remind the other of the mountain of work we had to do. Having a partner wasn't a distraction at all, it actually kept me slightly more focused than normal.

Another thing that struck me fairly quickly was that having a second pair of eyes on my code while I was writing it kept me honest. I took fewer shortcuts, commented my code properly, tested more, and refactored more often. All of the things you're supposed to do even when no one is watching. It was like having a code review really early in the project.

When it was my turn to navigate I noticed something else. I think differently when I'm not the one typing in the code. Having my hands on the keyboard forces me into a very logical mode of thought where I'm worried about details. Observing someone else writing code allowed me to think at a higher level of abstraction. You may have experienced this yourself when you're designing on paper, at a whiteboard, or writing pseudocode. Not worrying about the details of what will make your code compile allows you to think more about design-level aspects of your software.

Overall I would rate my experience with pair programming as a success, but I don't think that it's strictly necessary. If you haven't tried it, I would recommend it for a few days just to see what it's like. It has been worth it for me just to note the difference in my own way of thinking when I'm away from the keyboard. This difference will likely cause me to take more frequent breaks from typing so I can spend time at the whiteboard thinking about my software from a different point of view, even when I'm coding solo.

Friday, November 28, 2008

On Learning

I've been reading Pragmatic Thinking and Learning by Andy Hunt (who co-wrote The Pragmatic Programmer) lately. The book brings up the idea of second-order incompetence, which means that a novice at a particular skill is so unskilled that they don't even realize how unskilled they really are. This leads beginners to greatly overestimate their own ability. In many cases, it can lead a beginner to be more confident in their own skill than an expert.

The second-order incompetence theory applies to any skill in general, but I've noticed that it applies extremely well to programmers and their particular set of skills.


I've heard some programmers say that it takes anywhere from three days to three months to learn a new programming language. I personally feel that this is an extremely low estimate (I've been learning C++ for several years). Three days is barely enough time to learn the syntax of a new language. Three months is likely enough time to learn the syntax and a few common libraries of a language, but it isn't really enough time to get you beyond the "Advanced Beginner" skill level. It's probably just enough time to give you the confidence you need to be extremely dangerous.

The book goes on to explain that it really takes more on the order of ten years to truly master any non-trivial skill, whether it be playing chess, playing music, or flying an airplane. Peter Norvig quotes some of the same research when he advocates the ten year time scale for programmers in his article Teach Yourself Programming in Ten Years.

These ideas are hardly new. Socrates was onto a similar idea almost 2500 years ago when he said

"True knowledge exists in knowing that you know nothing."

In programming, as in almost any worthwhile pursuit, this means that you have to give yourself time to reach a skill level high enough that you know how much you don't know. If you catch yourself feeling confident in a given language after only three days (or even three months), try to realize that you've probably only seen a tiny fraction of what there is to see. Remember that 90% of an iceberg is below the surface.

Tuesday, November 18, 2008

Friday, November 7, 2008

NetBeans PMD plugin

While installing NetBeans on a new computer recently (and prompted by a question on StackOverflow), I noticed that, due to the blazing fast speed of NetBeans development, the instructions for installing the NetBeans PMD plugin are a little outdated. Here's an updated set of instructions*.

1. Download the latest PMD release from SourceForge.
2. Unzip the zip file to any directory. I put mine in C:\Program Files\Java\pmd-netbeans60-2.2.1, right next to my JDK directory. You'll see that the zip file contained a .nbm file that encapsulates the plugin.
3. Run NetBeans and click Tools -> Plugins
4. Go to the Downloaded tab on the Plugins dialog.
5. Click on the Add Plugins... button at the top of the tab.
6. Navigate to the directory where you unzipped the PMD plugin zip file.
7. Select the file named pmd.nbm.
8. Click the Open button.

You'll get a dialog saying that the plugin isn't signed, asking you whether you want to accept it anyway or cancel the installation. You'll need to accept it without a signature in order to use the plugin. You'll need to restart NetBeans after installation is complete.

You can run PMD by selecting a single source code file or an entire source package, then selecting Tools -> Run PMD (or Ctrl-Alt-P) from the main NetBeans menu.

To change from the default rule set go to Tools -> Options on the main NetBeans menu. On the Options dialog, select the Miscellaneous toolbar button, then the PMD tab. On the PMD tab you can press the Manage Rules... button to change the default ruleset, or press the Manage Rulesets... button to create a custom ruleset of your own.

*Note: I'm doing the installation for NetBeans 6.1 running on Windows XP as I write these instructions. I've verified these instructions using NetBeans 6.0.1 on Ubuntu.

Saturday, October 4, 2008

Sometimes you're still wrong...

It seems funny how sometimes you can just know you're right, and still be proven wrong. Take for example this old programming idiom.

    if(condition == true) {
        return true;
    }
    else {
        return false;
    }

Everyone should recognize this by the time they've completed a CS (or equivalent) degree. There are two things wrong with the above code. First, if the condition variable is boolean, then there's no need to compare it to true. Second, the whole idiom simplifies down to just

    return condition;

There, we've just saved five lines. I've corrected this so many times I've lost count. Then what's my point, you ask? When is the six line version really okay? When you're debugging. Sometimes you want to set a break point where the condition is either true or false. If you only have the return statement to set the break point on, the debugger stops every time it hits that one line. That's really annoying if the condition you're interested in only happens once in a thousand evaluations.

So what's the real lesson here? Even when you know you're right, even when you've been right for years, you can still be wrong.

Thursday, September 11, 2008

Buggy Quotes

I've been collecting quotes about programming from all over the internet. Here are a few of my favorites about software bugs.

There has never been an unexpectedly short debugging period in the history of computers.
--Steven Levy

Any sufficiently advanced bug is indistinguishable from a feature.
--Rich Kulawiec

If debugging is the process of removing software bugs, then programming must be the process of putting them in.
--Edsger Dijkstra

As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.
--Maurice Wilkes

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?
--Brian Kernighan

Beware of bugs in the above code; I have only proved it correct, not tried it.
--Donald Knuth

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at and repair.
--Douglas Adams

In a software project team of 10, there are probably 3 people who produce enough defects to make them net negative producers.
--Gordon Schulmeyer


These quotes just underscore the fact that programming is error prone and debugging software is hard. Since this is the case, it always amazes me that more people aren't using unit testing as a part of their daily development cycle.

I'm not saying that every developer should religiously follow every tenet laid out in Kent Beck's Test Driven Development, just that a good set of unit tests is a nice safety net to have. It can give you the confidence you need to change your code, knowing that you can always run the tests to make sure you didn't break anything. If you're not making unit tests a part of your daily routine, then why not?

Saturday, September 6, 2008

Turn off balloon tips

I love using Outlook's built-in Tasks list to keep my daily tasks organized. By organizing all of my tasks by categories that I define, it really helps me to keep up to date on what I need to do on various projects. Unfortunately, having my "To Do" list integrated into my email application is far from ideal. I like to leave my task list minimized so I can refer to it with one mouse click, which means I also have my email client running all the time. This is less than ideal from a productivity standpoint because it means having Outlook notify me every time I get an email that I'd prefer to ignore.

Fortunately, there's a way to have my cake and eat it too. Using Tweak UI or a quick registry edit I can turn off those annoying balloon tips that Windows uses to so effectively destroy my concentration.

Tweak UI

TweakUI is a Windows PowerToy that gives you access to system settings that are not exposed in the Windows XP default user interface. Essentially, it is a GUI front end that allows you to safely edit your system's registry without resorting to regedit. You can download TweakUI from Microsoft's PowerToys site. Once installed, you can disable balloon tips by going to the TweakUI "Taskbar and Start menu" tab, then unchecking the "Enable balloon tips" box.



After you press "Apply", TweakUI will save this preference in your registry, disabling balloon tips across all Windows applications.

Edit the registry

If you're the kind of Windows power user who likes to see what's going on "under the hood", you might be more comfortable editing your own registry, rather than relying on a tool like TweakUI to do it for you.

1. Run regedit (Start -> Run, then type in "regedit").
2. Navigate to the HKEY_CURRENT_USER | Software | Microsoft | Windows | CurrentVersion | Explorer | Advanced registry node.
3. Create (on the Edit menu, select New -> DWORD Value) or edit the DWORD value named EnableBalloonTips, setting it to a value of 0.
4. Log off and log back in again for the new setting to take effect.

Congratulations! Now you can run Outlook all the time without being constantly interrupted by alerts generated by incoming emails.

Tuesday, August 26, 2008

Trim Tokens

When working with a tokenized string, like those found in comma-separated value (CSV) files, it's common to encounter this problem. What do you do when someone edits the file by hand, and inserts extra space characters to make the file more human-readable? Add a space to the token? That's no good because then all of your files have to match the "new and improved" format.

For example, take the following snippet of CSV file.

username, password,home folder,default editor, favorite color,web address

Notice that the delimiter is inconsistent in this example. Sometimes I have tokens separated with a single comma (",") and sometimes with a comma followed by a space (", "). If the delimiter were consistent it would be a simple matter to get all tokens with the following code:

String[] tokens = line.split(",");

First of all, let's note how much simpler this is than the old method of using a StringTokenizer to loop through the text scanning for more tokens. String's split method added in the Java 1.4 release is a great improvement. The second thing you should note is that the split method accepts one argument, and that argument is a regular expression. This should be a clue to solving our CSV formatting problem. How do we specify that the delimiter in our file is a comma that might sometimes be followed by a space? By harnessing the power of regular expressions. (Ok, that's overstating the case by quite a bit. We're really only harnessing a tiny fraction of the power of regular expressions.)

String[] tokens = line.split(",\\s*");

That says split the line on any comma followed by zero or more whitespace characters (\s matches spaces, tabs, and other whitespace; check out Mastering Regular Expressions for an in-depth guide to regular expression syntax). This single line of code should have the desired effect of splitting the line into the following array.

username
password
home folder
default editor
favorite color
web address

Try it out by creating a simple class that reads a single line of comma-delimited text and prints out the tokens. If you really want to see what an improvement String's split method brings, try writing the same function using a StringTokenizer instead.
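Here is a minimal sketch of that exercise. The class and method names (TrimTokens, splitWithRegex, splitWithTokenizer) are made up for illustration. Note that the two versions are not identical in every case: StringTokenizer silently skips empty tokens, and trim() also strips whitespace that appears before a comma, so they only agree on well-behaved input like the sample line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TrimTokens {

    /** Splits on a comma followed by zero or more whitespace characters. */
    static String[] splitWithRegex(String line) {
        return line.split(",\\s*");
    }

    /** The older approach: tokenize on commas, then trim each token by hand. */
    static String[] splitWithTokenizer(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().trim());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "username, password,home folder,default editor, favorite color,web address";
        for (String token : splitWithRegex(line)) {
            System.out.println(token);
        }
        // compare the two approaches on the sample line
        System.out.println(Arrays.equals(splitWithRegex(line), splitWithTokenizer(line)));
    }
}
```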

Monday, August 18, 2008

On "Quantity Always Trumps Quality"

Jeff Atwood of Coding Horror fame recently posted an article titled Quantity Always Trumps Quality, the main point of which is that endless designing and theorizing is a waste of time, and that that time could be better spent building something. The idea is that the time you spend failing early on is time spent learning to do whatever it is you're attempting to do the right way. That's a benefit you don't get by spending time designing the wrong thing. While this idea isn't new or original it is still excellent advice, and Jeff shows how it applies to a wide range of disciplines, from making clay pots to software development and blog writing.

One reason that I like this advice so much is that it teaches us not to be afraid of making mistakes. Not only that, we can learn to embrace our mistakes because that is, after all, how we learn. I've often felt that if something is stagnant (a software project, your company's sales, a personal relationship), doing any random thing to get out of stagnation can be better than doing nothing at all. Once you make a random change you can analyze the results and see what you need to correct. If you go into the change with the mindset that it's only purpose is to teach you what you really should have done, it makes it a lot easier to bear when you make a mistake.

Saturday, August 9, 2008

Stop a thread that's accepting connections

A common approach to implementing a server is to set up its main loop to accept connections from clients, then create separate threads to handle each connection.

But how do you shut down a thread that's blocked in the wait state? It's not enough to simply set the running flag to false, breaking it out of its main loop, as that would leave the server waiting for one last connection before it lets go of its resources and shuts down. The following code shows you how.

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class SimpleServer extends Thread {

    private volatile boolean running = false;
    private volatile ServerSocket serverSock = null;
    private int serverPort = 8080;

    public SimpleServer (int serverPort) {
        this.serverPort = serverPort;
    }

    /** Begin listening for connections from subscribers. */
    public void run() {
        System.out.println( "Simple Server started." );

        try {
            serverSock = new ServerSocket( serverPort );
        } catch ( IOException e ) {
            System.err.println( e.getMessage() );
            // without a server socket there is nothing to listen on
            return;
        }

        running = true;
        // main loop of the thread
        // listens for new subscribers
        while( running ) {
            Socket sock = null;
            try {
                // blocks while waiting
                sock = serverSock.accept();

                // handle the new socket connection
            }
            catch ( IOException ioe ) {
                System.err.println( ioe.getMessage() );
            }
            finally {
                try {
                    // sock is still null if accept() threw an exception
                    if ( sock != null ) {
                        sock.close();
                    }
                }
                catch ( IOException ioe ) {
                    System.err.println( ioe.getMessage() );
                }
            }
        }

        // release the listening port once the loop has ended
        try {
            serverSock.close();
        }
        catch ( IOException ioe ) {
            System.err.println( ioe.getMessage() );
        }
    }

    /** Check the running flag. */
    public boolean isRunning() {
        return running;
    }

    /** Stop the server listen thread. */
    public void shutDown() {
        running = false;
        // nothing to wake up if the server socket was never opened
        if ( serverSock == null ) {
            return;
        }
        try {
            // open a connection to the server socket,
            // because it is blocking on accept
            new Socket( serverSock.getInetAddress(),
                        serverSock.getLocalPort() ).close();
            System.out.println("Server shut down normally.");
        }
        catch(IOException ioe) {
            System.err.println( ioe.getMessage() );
        }
    }
}

The interesting technique here can be found in the shutDown method. After setting the running flag to false we open one more socket connection on the listening port, then immediately close it. That connection wakes up the thread blocked in accept, the loop sees that the running flag is now false and exits, and the server application can terminate immediately and gracefully.
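To watch the wake-up connection do its job, here is a stripped-down, self-contained sketch of the same technique. The names WakeUpDemo, Listener, and startAndStop are hypothetical, and the sketch makes two assumptions that differ from the server above: it binds to port 0 (so the OS picks a free port) on the loopback address, and it opens the server socket in the constructor rather than in run.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class WakeUpDemo {

    static class Listener extends Thread {
        private volatile boolean running = true;
        private final ServerSocket serverSock;

        Listener() throws IOException {
            // port 0 asks the OS for any free port; loopback keeps the demo local
            serverSock = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        }

        @Override
        public void run() {
            while (running) {
                // accept() blocks until a client (or the wake-up connection) arrives
                try (Socket sock = serverSock.accept()) {
                    // a real server would handle the connection here
                } catch (IOException ioe) {
                    System.err.println(ioe.getMessage());
                }
            }
            try {
                serverSock.close();
            } catch (IOException ioe) {
                System.err.println(ioe.getMessage());
            }
        }

        void shutDown() throws IOException {
            running = false;
            // connect once to our own port so the blocked accept() returns
            new Socket(serverSock.getInetAddress(), serverSock.getLocalPort()).close();
        }
    }

    /** Starts a listener, shuts it down, and reports whether the thread ended. */
    static boolean startAndStop() {
        try {
            Listener listener = new Listener();
            listener.start();
            listener.shutDown();
            listener.join(5000); // wait up to five seconds for the thread to finish
            return !listener.isAlive();
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + startAndStop());
    }
}
```

Another common way to unblock accept is to close the ServerSocket from the other thread, which makes accept throw a SocketException. That is a different technique from the one shown here, but it avoids the extra connection.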

Thursday, August 7, 2008

Avoid NPE when comparing strings

By now everyone knows that you should use the equals method when comparing two objects in Java (as opposed to using the == operator). Strings are no different. But what happens when you compare a null reference to a string literal, like so:

String myRef = null;
if ( myRef.equals("literal") ) {
    // do something
}

If you run this code you'll see you get a NullPointerException (NPE) for calling a method on a null reference. I've seen some developers get around this problem by checking for null before calling equals.

String myRef = null;
if ( myRef != null && myRef.equals("literal") ) {
    // do something
}

This cleverly uses the short-circuit property of the && operator. If the first part of the condition is false the second part won't be evaluated, avoiding the NPE. Like most clever code, this is not the best way to go about it. I learned a better way when reviewing a colleague's code. Just take the original condition and switch it around so you call equals on the literal, and pass the (possibly null) reference as a parameter.

String myRef = null;
if ( "literal".equals(myRef) ) {
    // do something
}

This avoids the NPE because a string literal can never be null. If the reference is null, the condition returns false, as expected. This solution is also a little bit shorter and clearer than the first one. The lesson to be learned here is that if you have to resort to "clever" tricks to get something done, with just a little bit of lateral thinking you can probably find a cleaner, simpler way.
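A small runnable sketch of the idea follows. The class and method names are invented for illustration. It also shows java.util.Objects.equals, which was added in Java 7 (after this post was written) and covers the case where neither side is a literal and both references might be null.

```java
import java.util.Objects;

class NullSafeEquals {

    /** Calls equals on the literal, so a null argument simply yields false. */
    static boolean isLiteral(String ref) {
        return "literal".equals(ref);
    }

    /** For two references that may both be null; requires Java 7 or later. */
    static boolean sameValue(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(isLiteral(null));      // no exception is thrown
        System.out.println(isLiteral("literal"));
        System.out.println(sameValue(null, "literal"));
        System.out.println(sameValue(null, null));
    }
}
```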

Sunday, August 3, 2008

Implementing equals in Java

Update: I made reference to Josh Bloch's Effective Java in this article. Those references are to the first edition of the book and should be disregarded. The second edition addresses every issue that I (and others) considered a shortcoming concerning the implementation of the equals method from the original edition. If you are a Java programmer, you should probably stop reading this and just read the second edition of Josh's book instead.

Second update: There's also an excellent and detailed article online on Java equality by Martin Odersky, Lex Spoon, and Bill Venners: How to Write an Equality Method in Java.

Many beginning Java programmers get confused about the difference between the equals operator (==) and the equals method. This confusion can probably be traced back to the fact that ==, like so many things in Java, works differently with primitives than it does with object references. The == operator in Java works with primitives by comparing their values, but it works with objects by comparing their references. This means that == will always work with primitives, but it will rarely work as expected with objects. Remember that references in Java point to a location on the memory heap. Two references that point to the same object are equal, but two references that point to two different objects are never equal, even if they hold the same value.

Another source of confusion may be from the default implementation of the equals method itself. Every class in Java inherits a default equals implementation from the Object class. Object's equals method simply compares references, so the default implementation gives the exact same behavior as the == operator. That's why it's a good idea to override the equals method in any classes you create that need more than the default behavior.

Unfortunately, implementing equals is harder than it at first may seem. Many beginning developers (and even a few experienced ones) naively cast to the correct type and compare significant fields without giving due consideration to the general equals contract. Joshua Bloch spends ten worthwhile pages on the subject in his book Effective Java, and some would argue that he still doesn't get it exactly right. In addition to the properties laid out in the general equals contract, there are also a few object design decisions that need to be taken into account to ensure correct equals behavior. I'll try to explain all of these issues.

By general contract, the equals() method in Java must be reflexive, symmetric, transitive, and consistent, and for any non-null reference a, a.equals(null) must return false. In other words, for arbitrary non-null values of a, b, and c, the following tests must always pass:

// reflexive property
assertTrue( a.equals(a) );

// symmetric property
assertTrue( a.equals(b) == b.equals(a) );

// transitive property
if ( a.equals(b) && b.equals(c) ) {
    assertTrue( a.equals(c) );
}

// consistency property
assertTrue( a.equals(b) == a.equals(b) );

// non-null property
assertFalse( a.equals(null) );

It would be difficult to get reflexivity and consistency wrong without trying. A very simple test for null is all that's needed to make sure your equals method returns false when a null parameter is passed. The symmetry and transitivity properties, on the other hand, are difficult to get right without giving some serious thought to each.

For example, both Java Practices and Joshua Bloch recommend the following steps for implementing equals:

1. Use this == that to check reference equality
2. Use instanceof to test for correct argument type
3. Cast the argument to the correct type
4. Compare significant fields for equality

Given this template, the equals method for a simple 2-dimension Point class might look like the following:

class Point {
    private int x;
    private int y;

    ...

    public boolean equals(Object obj) {
        // check for reference equality
        if(this == obj) return true;

        // type check
        if( !(obj instanceof Point) ) return false;

        // cast to correct type
        Point p = (Point)obj;

        // compare significant fields
        return (this.x == p.x && this.y == p.y);
    }
}

Step one fulfills the reflexivity requirement of the equals contract. It also serves as an optimizing step. If the parameter refers to this object, obviously they are equal and you don't need to waste any more time comparing them. This saves time particularly when the comparison (step four) is costly.

Step two uses instanceof to check that the argument is of an acceptable type. This is usually the same type as this object, but could be a super type or interface. Consider comparing two Collections to each other. You may wish to consider two Lists equal if they have the same elements, regardless of whether one is a LinkedList and the other an ArrayList. Simply implementing the List interface should be enough to correctly pass this step, then safely cast to the same type in step three. Notice also that this method doesn't explicitly check to see if the argument is null. That's because there's an implicit check in step two. The instanceof operator will return false if its first operand (the object being tested) is null.
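Both claims in the paragraph above are easy to check with a few lines of code. This is only a sketch, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class InstanceofDemo {

    /** instanceof applied to a null reference is false, so no separate null check is needed. */
    static boolean nullIsInstanceOfString() {
        Object obj = null;
        return obj instanceof String;
    }

    /** The List implementations in java.util compare by elements, not by concrete class. */
    static boolean listsWithSameElementsAreEqual() {
        List<Integer> arrayList = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
        List<Integer> linkedList = new LinkedList<Integer>(Arrays.asList(1, 2, 3));
        return arrayList.equals(linkedList) && linkedList.equals(arrayList);
    }

    public static void main(String[] args) {
        System.out.println(nullIsInstanceOfString());
        System.out.println(listsWithSameElementsAreEqual());
    }
}
```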

It's not hard to prove that this implementation of equals satisfies the symmetric and transitive properties of the equals contract. A few simple test cases should show that for any two Point objects, if x and y are equal, the equals method will return true, and if x or y is not equal it will return false. But there is a very subtle bug lurking in this implementation. You have to ask yourself, "What happens if I extend this class by adding a significant field?" By "significant field", I mean a field that impacts equality, one that should be taken into account by the equals method.

Let's look at an example. Say I want to extend this class to make a 3-dimensional Point by adding a z dimension. In this case the Point3D equals method needs only to call the Point equals method and add its own comparison of the z-dimension instance variable.

class Point3D extends Point {
    private int z;

    ...

    public boolean equals(Object obj) {
        if(!(obj instanceof Point3D)) return false;
        Point3D p3d = (Point3D)obj;
        return super.equals(obj) && this.z == p3d.z;
    }
}

This implementation works for two Point3D objects the same way the Point equals method works for two Point objects. The problem is that it breaks the symmetric property of the equals contract when you mix objects of the two types. Consider the following:

Point p = new Point(4, 2);
Point3D p3d = new Point3D(4, 2, 3);
p.equals(p3d); // returns true
p3d.equals(p); // returns false

The problem is that p3d instanceof Point is true (a Point3D is also a Point), while p instanceof Point3D is false, so the two calls disagree and the symmetric property is violated.
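The asymmetry can be reproduced with a self-contained version of the two classes. This is a sketch rather than the exact classes above: the constructors, the hashCode method, and the wrapper class InstanceofEqualsDemo are my own additions so that the example compiles and runs on its own.

```java
class InstanceofEqualsDemo {

    static class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Point)) return false;
            Point p = (Point) obj;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    static class Point3D extends Point {
        private final int z;

        Point3D(int x, int y, int z) {
            super(x, y);
            this.z = z;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Point3D)) return false;
            Point3D p3d = (Point3D) obj;
            return super.equals(obj) && z == p3d.z;
        }
    }

    /** Point.equals only looks at x and y, so it accepts the Point3D. */
    static boolean pointEqualsPoint3D() {
        return new Point(4, 2).equals(new Point3D(4, 2, 3));
    }

    /** Point3D.equals rejects anything that is not a Point3D. */
    static boolean point3DEqualsPoint() {
        return new Point3D(4, 2, 3).equals(new Point(4, 2));
    }

    public static void main(String[] args) {
        System.out.println("p.equals(p3d): " + pointEqualsPoint3D());
        System.out.println("p3d.equals(p): " + point3DEqualsPoint());
    }
}
```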

To his credit, Joshua Bloch goes on in his book to explain a technique using composition that offers a solid workaround to the problems inherent in using instanceof. There is another popular approach to implementing equals that also circumvents these problems.

1. Test for a null argument
2. Use if (getClass() != obj.getClass()) to test for correct argument type
3. Cast the argument to the correct type
4. Compare significant fields for equality

Using this slightly different template the equals method for our simple Point class now becomes:

public boolean equals(Object obj) {
    // check for null reference
    if(obj == null) return false;

    // type check
    if(this.getClass() != obj.getClass()) return false;

    // cast to correct type
    Point p = (Point)obj;

    // compare significant fields
    return (this.x == p.x && this.y == p.y);
}

As you can see, the two approaches only differ in the first two steps. First, we replace the optimizing step with a test for null. The null test is required because the second step no longer uses instanceof; it has been replaced with a comparison of the two objects' getClass results, and calling getClass on a null argument would throw a NullPointerException. This fixes the problem with symmetry that we experienced in the first approach because the getClass comparison fails whenever the parameter is not exactly the same type as this object. Superclass or subclass parameters yield false, preserving the symmetric property. Be aware that this can lead to surprising behavior when two objects are equal in all significant fields, but are considered unequal because they aren't the same type (refer back to the LinkedList, ArrayList example I gave earlier).
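Here is a runnable sketch of the getClass approach applied to both Point and a Point3D subclass. As before, the constructors, hashCode methods, and the wrapper class GetClassEqualsDemo are assumptions added to make the example self-contained. Overriding hashCode alongside equals is a separate requirement of the Object contract that this article doesn't cover.

```java
class GetClassEqualsDemo {

    static class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            // check for null reference
            if (obj == null) return false;
            // type check: only the exact same runtime class is accepted
            if (getClass() != obj.getClass()) return false;
            Point p = (Point) obj;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    static class Point3D extends Point {
        private final int z;

        Point3D(int x, int y, int z) {
            super(x, y);
            this.z = z;
        }

        @Override
        public boolean equals(Object obj) {
            // super.equals already rejects null and any object that is not a Point3D
            if (!super.equals(obj)) return false;
            Point3D p3d = (Point3D) obj;
            return z == p3d.z;
        }

        @Override
        public int hashCode() {
            return 31 * super.hashCode() + z;
        }
    }

    /** True when a and b agree on whether they are equal to each other. */
    static boolean isSymmetric(Object a, Object b) {
        return a.equals(b) == b.equals(a);
    }

    public static void main(String[] args) {
        Point p = new Point(4, 2);
        Point3D p3d = new Point3D(4, 2, 3);
        System.out.println("p.equals(p3d): " + p.equals(p3d));
        System.out.println("p3d.equals(p): " + p3d.equals(p));
        System.out.println("symmetric: " + isSymmetric(p, p3d));
    }
}
```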

So which of these two approaches to implementing equals should be preferred? The questions to ask when designing your class are whether you're designing it to be extended or not, and whether it is important for classes of different types (but the same interface) to adhere strictly to the equals contract. In most cases strict adherence to the equals contract should be preferred, and so the getClass approach should be used. The only times you should consider using instanceof in your equals methods are when objects of a subclass must be comparable to objects of their base class type, or if two objects that implement a common interface must be considered equal based on their state rather than just their type. In both these cases, be well aware of what properties of the general equals contract you may be violating. It's also a good idea to thoroughly document these points in your code and in your API documentation.

Tuesday, July 29, 2008

Rotate an image in Java


In Python, many complex image functions are made simple using the Python Imaging Library (PIL). For example, I can load an image from file, rotate it any number of degrees, and display the image in just four lines of Python code.


from PIL import Image
pic = Image.open("wire.png")
pic = pic.rotate(45)  # rotate() returns a new image rather than changing pic in place
pic.show()

Performing the same task in Java, however, requires the involvement of several classes and a slightly deeper understanding of graphics processing. I feel that Sun's decision to break everything up into hundreds of classes offers great flexibility for me to combine classes to come up with my own solution to a given problem. This approach may be verbose, but it does lead to a better understanding of the underlying algorithms involved.


import java.applet.Applet;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.geom.AffineTransform;
import java.net.URL;

public class RotateImage extends Applet {

    private Image image;

    AffineTransform identity = new AffineTransform();

    private URL getURL(String filename) {
        URL url = null;
        try {
            url = this.getClass().getResource(filename);
        }
        catch(Exception e){}
        return url;
    }

    public void init() {
        image = getImage(getCodeBase(), "image.jpg");
    }

    public void paint(Graphics g) {
        Graphics2D g2d = (Graphics2D)g;
        AffineTransform trans = new AffineTransform();
        trans.setTransform(identity);
        trans.rotate( Math.toRadians(45) );
        g2d.drawImage(image, trans, this);
    }
}

This looks a lot worse than it is. There are 22 lines of code here (not counting white space and curly braces), but the majority of them are spent on importing Java libraries and loading the image. Only six of them are spent rotating and displaying the image.

The most complicated part of the code is the AffineTransform object. According to Sun's AffineTransform API, "The AffineTransform class represents a 2D affine transform that performs a linear mapping from 2D coordinates to other 2D coordinates that preserves the "straightness" and "parallelness" of lines." If you experiment a little with this class (or just continue reading the API), you'll see that it can be used not just to rotate, but also to scale, flip, and shear an image as well.
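If you'd like to experiment with AffineTransform without an applet, the same transform can be applied to an in-memory BufferedImage. The following is a sketch under a few assumptions of mine: the class name RotateBufferedImage, the helper methods rotate and sample, and the choice to rotate about the image's center (the applet above rotates about the origin). The sample image is a 20x20 transparent square with a small red block right of center, so a 90 degree rotation should move the block below the center.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

class RotateBufferedImage {

    /** Returns a copy of src rotated by the given angle about its center. */
    static BufferedImage rotate(BufferedImage src, double degrees) {
        BufferedImage dest = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_ARGB);
        AffineTransform trans = new AffineTransform(); // starts as the identity transform
        trans.rotate(Math.toRadians(degrees), src.getWidth() / 2.0, src.getHeight() / 2.0);
        Graphics2D g2d = dest.createGraphics();
        try {
            g2d.drawImage(src, trans, null);
        } finally {
            g2d.dispose();
        }
        return dest;
    }

    /** Builds a 20x20 transparent image with a 4x4 red block right of center. */
    static BufferedImage sample() {
        BufferedImage img = new BufferedImage(20, 20, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = img.createGraphics();
        try {
            g2d.setColor(Color.RED);
            g2d.fillRect(14, 8, 4, 4);
        } finally {
            g2d.dispose();
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage rotated = rotate(sample(), 90);
        // in Java's coordinate system (y grows downward) a positive angle turns clockwise
        System.out.println("right of center: " + Integer.toHexString(rotated.getRGB(16, 10)));
        System.out.println("below center:    " + Integer.toHexString(rotated.getRGB(10, 16)));
    }
}
```

Swapping the rotate call for scale or shear on the same AffineTransform is a quick way to try the other operations the API describes.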

Monday, July 21, 2008

Hello, World!

This is a programming blog written by Bill Cruise, a professional computer programmer. I've started this blog to publish a number of essays I've written covering a range of topics in Computer Science, Programming, Software Engineering, Design, and Mathematics. I've always felt that I don't really understand a topic unless I can explain it to someone else on a whiteboard. This blog is my whiteboard. I write the essays in an attempt to organize my thoughts and better understand a given topic. Publishing them is an invitation to others to share their thoughts and join in a discussion.

Hope you enjoy it!


Why "Hello, World!"?

A "Hello, World!" program is a traditional first program written by software developers when learning a new language. Its purpose isn't so much to learn anything about the syntax of the language, but more to test that the programming environment is set up correctly. It's simple enough that syntax mistakes can be easily avoided, thus making it a good choice for a first program.

The tradition of using some variation of the phrase "Hello, World!" seems to have its origin in an example program first published in The C Programming Language, written by Kernighan and Ritchie (K&R).
main() {
    printf("hello, world");
}

Further Reading
List of "Hello, World!" Programs