Monday, June 8, 2009

A Java Puzzler

A co-worker presented me with the following Java puzzle today (thanks, Josh). Given the following two classes:

public class Base {
    Base() {
        // The constructor calls the overridable method.
        preProcess();
    }

    void preProcess() {}
}

public class Derived extends Base {
    public String whenAmISet = "set when declared";

    @Override void preProcess() {
        whenAmISet = "set in preProcess()";
    }
}

what do you think the value of whenAmISet will be when a new Derived object is created?

public class Main {
    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println( d.whenAmISet );
    }
}

Give yourself a few minutes to look it over before reading ahead. It's a really simple example to follow, so no fair compiling and running it to find out. Think you know the answer?

It appeared to most people that the output should be "set in preProcess()". The reasoning went that when the Derived constructor is called, it implicitly calls the Base constructor via a call to super() that the compiler inserts automatically for you. The Base constructor then makes a call to the most specific version of preProcess() that it can find, the one in the partially constructed Derived class, which sets the value of the whenAmISet member variable.

This, of course, is wrong. If you compiled and ran the code above you found that it prints out "set when declared". So what happened there? Is the Base class preProcess() method getting called? Nope. (Go ahead and put a print statement in each preProcess() method if you don't believe me.)

Here's the sequence of events:
  1. The Derived constructor is called.
  2. Memory for Derived member variables is allocated.
  3. The Base constructor is called implicitly.
  4. The Base constructor calls preProcess().
  5. preProcess sets whenAmISet value to "set in preProcess()".
  6. Derived class member initializers are called.
  7. The body of the Derived class constructor is called.
Wait, what just happened? At step 6, the Derived class members are initialized after the call to the preProcess() method? That's right. You can't count on the declaration and initialization of member variables being atomic. As you can see from running the example, the initialization of whenAmISet in Derived is overwriting the value that gets set earlier in the preProcess() method.
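If you'd rather watch the sequence than take my word for it, here is a minimal sketch that records each step as it happens. The ConstructionTrace wrapper, the EVENTS list, and the log() helper are names I made up for illustration; they aren't part of the original puzzle. The classes are nested only so that everything fits in one file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical instrumented variant of the Base/Derived classes from the puzzle.
class ConstructionTrace {
    // Records each event in the order it actually happens.
    static final List<String> EVENTS = new ArrayList<>();

    // Helper (an assumption for this sketch) so the field initializer leaves a trace.
    static String log(String event, String value) {
        EVENTS.add(event);
        return value;
    }

    static class Base {
        Base() {
            EVENTS.add("Base constructor start");
            preProcess(); // dispatches to Derived.preProcess() for a Derived object
            EVENTS.add("Base constructor end");
        }

        void preProcess() {
            EVENTS.add("Base.preProcess");
        }
    }

    static class Derived extends Base {
        public String whenAmISet = log("Derived field initializer", "set when declared");

        Derived() {
            // The implicit super() runs first, then the field initializers, then this body.
            EVENTS.add("Derived constructor body");
        }

        @Override void preProcess() {
            EVENTS.add("Derived.preProcess");
            whenAmISet = "set in preProcess()";
        }
    }

    // Creates one Derived object and returns a copy of the recorded events.
    static List<String> trace() {
        EVENTS.clear();
        new Derived();
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Running main should print "Base constructor start", "Derived.preProcess", "Base constructor end", "Derived field initializer", and "Derived constructor body", in that order. Note that "Base.preProcess" never shows up.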

How do we know this is the right sequence? Let's take a closer look at the significant steps.
  1. We call the constructor.
  2. We know the memory for Derived class member variables is allocated before any line of code is run in its constructor. If this weren't the case then the call to preProcess() in the Base class wouldn't be able to set the value of whenAmISet and print its value (which isn't in the code above, but you can try it and see that this works).
  3. Java inserts a call to super() at the beginning of the constructor unless you include it yourself.
  4. We know this by looking at the code.
  5. Also known by code inspection.
  6. The initialization must be happening after the call to preProcess(). The Derived class's preProcess() method is getting called, and it is setting whenAmISet's value to "set in preProcess()". Again, if you don't believe me, print out the value of whenAmISet right after it gets set in the Derived preProcess() method.
  7. We're calling the default (empty) constructor, so there's nothing more to see here.
You can see the relevant section of the Java Language Specification for a much more detailed explanation of the process of object creation in Java. (Thanks to Weeble for providing this link in the comments.) Note that there's a very similar example to this one on that page if you scroll down to the paragraph starting "Unlike C++..."
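The experiments suggested in steps 2 and 6 can be sketched roughly like this. Instead of printing inside preProcess(), this version stashes what it sees in two extra fields, seenBeforeAssignment and seenAfterAssignment. Those fields, and the DefaultValueDemo wrapper class, are my own additions for illustration. They deliberately have no initializers, so nothing overwrites them after the Base constructor returns.

```java
// Hypothetical variant of the puzzle that captures what preProcess() observes.
class DefaultValueDemo {
    static class Base {
        Base() {
            preProcess();
        }

        void preProcess() {}
    }

    static class Derived extends Base {
        public String whenAmISet = "set when declared";

        // No initializers here, so values assigned during the Base constructor survive.
        public String seenBeforeAssignment;
        public String seenAfterAssignment;

        @Override void preProcess() {
            // The field already exists, but only holds its default value (null).
            seenBeforeAssignment = whenAmISet;
            whenAmISet = "set in preProcess()";
            seenAfterAssignment = whenAmISet;
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println("before assignment: " + d.seenBeforeAssignment);
        System.out.println("after assignment:  " + d.seenAfterAssignment);
        System.out.println("after constructor: " + d.whenAmISet);
    }
}
```

The three lines report null, then "set in preProcess()", then "set when declared". The two helper fields keep their values because a field without an initializer is never reassigned in step 6, which is exactly the difference between them and whenAmISet.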

So what can we take away from all of this? First, never count on any operation being atomic unless you've tested and proven it to be so (and sometimes not even then). Second, and more to the point, be careful initializing member variables in a method of a derived class if that method gets called by the base class constructor. It might be even better to try and avoid calling methods in your base constructors that are overridden by derived classes, and conversely, avoid overriding methods in derived classes that are called by the constructor in the base class. Be careful if you do, and don't say I didn't warn you about what might happen.
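As one possible way to follow that advice, here is a small sketch in which the base constructor calls only a final method, and the derived class customizes the base through a constructor argument instead of an override. The mode field, the mode() accessor, and the SaferConstruction wrapper are hypothetical names I've introduced for the example, and this is just one option among several.

```java
// Hypothetical restructuring: no overridable method is reachable from a constructor.
class SaferConstruction {
    static class Base {
        private final String mode;

        // The subclass supplies its customization as an argument rather than via an override.
        Base(String mode) {
            this.mode = mode;
            preProcess();
        }

        // final: no subclass can override this, so the constructor call above
        // never runs code that belongs to a partially constructed subclass.
        final void preProcess() {
            // base-only setup would go here
        }

        String mode() {
            return mode;
        }
    }

    static class Derived extends Base {
        public String whenAmISet = "set when declared";

        Derived() {
            super("derived mode");
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.whenAmISet);
        System.out.println(d.mode());
    }
}
```

With this layout the field initializer is the only thing that ever assigns whenAmISet, so there is nothing left to overwrite. If Derived tried to override preProcess(), the compiler would reject it.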

Further Reading

I could hardly finish without mentioning that Joshua Bloch and Neal Gafter wrote an entire book full of similar puzzles called Java Puzzlers: Traps, Pitfalls, and Corner Cases. Check it out if you'd like to be your team's source for Java trivia and minutiae.


nicolaslara said...

One thing I dislike about Java is that puzzlers like this one (that defy intuition) seem to appear more often than not. Are they corner cases or is it just poor design? Personally, I'm not sure what to think (and don't have the Java-fu to make that decision and much less to make such a claim)
Great post, however! Keep on the good work =)

Bill the Lizard said...

Good point. Many languages have these corner cases that you have to watch out for, but that doesn't exonerate Java for having so many of them. C++ and Perl are two other languages whose syntax makes it notoriously easy to shoot yourself in the foot. I know a lot of people are moving towards languages like Ruby and Lisp, at least in part because the simple syntax of those languages cuts down on the number of "gotchas" you have to look out for. There is a benefit to having only one right way to do something. Do tools like PMD and Checkstyle even exist for Lisp?

When discussing this particular puzzler with my co-workers, we decided that it did seem like a rather questionable choice to have the initialization after any constructor code had run.

Thanks for commenting, and for reading.

Weeble said...

I don't agree with this description of the first step of the construction process: "The Derived constructor is called (member variables are declared, methods defined)."

Declaration and definition are compile-time concepts, not run-time concepts. Nothing at all is done with methods at instance construction (the instance doesn't acquire new instances of each method, for example) and what happens with member variables is that *before* the constructor is invoked memory is reserved for them and set to their default values. See the specification here:

"Whenever a new class instance is created, memory space is allocated for it with room for all the instance variables [...] All the instance variables in the new object, including those declared in superclasses, are initialized to their default values (§4.12.5)."

There follows a section describing how the constructors are invoked and that explicitly states that initializers are run after base class constructors are invoked.

Bill the Lizard said...

Thanks for catching that. Of course the methods are a part of the class, so they aren't really defined at run time, but my point was that they already exist before any code in the constructor is executed. I was trying to make that a little bit simpler conceptually, but as you pointed out, I failed at that. The method just existing as a part of the class definition is simpler, so I took it out of my description. I corrected the mistake and tried to make it more descriptive of what actually happens at run time.

Thanks also for pointing out the relevant part of the Java specification that deals with this puzzle. I had looked for it and gave up when I didn't find it in chapter 8 on Classes.

Weeble said...

Sorry, I realise now I sounded a bit gruff in my response. I was just in a bit of a hurry.

It is a confusing little corner of the language. I think it behaves in this strange way because Java lets you access other instance fields in a field initializer, and thus it has to make sure the base class constructor has been executed before the initializers can be evaluated. C# behaves more intuitively (at least to me), but doesn't allow you to access instance fields/methods from a field initializer.

I should point out I'm no Java expert. I mostly do C#. I might be wrong.

Bill the Lizard said...

No need to apologize, you didn't sound gruff to me at all, plus you made this post better. I like it when people point out my mistakes because that's how I learn new things.

Good point about accessing other instance fields in initializers. I hadn't realized that C# behaved differently.

Thank you, I really appreciate your comments.

Aaron said...

re "bad design", I'd characterize it as _different_ design. C++ chooses the route that base classes construct entirely before derived classes -- that the object isn't even a "derived" instance until the base ctor completes. as a consequence, calling 'preProcess' from the base cannot invoke derived::preProcess, because it "doesn't exist yet". Java follows the simpler implementation route, probably as a natural consequence of how its runtime typing works using tagging instead of vtables -- tags aren't allowed to change, so to get the C++ behavior they'd need to add another integer field somewhere, perhaps to each object instance, to track where in the construction process they are. That would be a waste for such an edge case. So (i assume) they took a calculated risk and changed how construction and virtual dispatch take place.

Of course, the _real_ question is, given all this, why don't they have some sort of warning when a constructor calls a virtual method? Here, the virtual default comes back to bite you. If there really is an override somewhere in the inheritance tree, calling it from the constructor is probably broken.

(I know it's idiomatic in Java to use virtual methods to allow a derived class to customize base constructor behavior -- of course, constructor parameters can do that too. Perhaps we consider using those more often, and mark functions used by constructors as sealed. I think everyone knows how function argument passing is supposed to work by now)

Bill the Lizard said...

I'm glad you pointed out this not-so-subtle difference between C++ and Java. Every instance method in Java is virtual unless you specify otherwise (by declaring it final or private). This is a convenience most of the time, but as you said, it bites you here.

This comment also reminded me that this was discussed briefly in Josh Bloch's Effective Java. In the item titled "Design and document for inheritance or else prohibit it" (Item 15 in the 1st ed., the only one I have with me), Bloch states that "Constructors must not invoke overridable methods, directly or indirectly." He then goes on to explain that unexpected behavior may result.

This does seem like it calls for at least a warning from the compiler.

William Shields said...

The sequence of events isn't quite right. I just created a version of this program that used a custom class I made instead of String for the data member so I could print a line when it was created. The sequence is:

1. Base constructor
2. Derived preProcess (NOT the Base preProcess). This is confirmed by both a print statement in the preProcess() method and the data member's class;
3. Base constructor end;
4. Derived's data member initializer, which overwrites the previously set value.

Try it.

Bill the Lizard said...

I said that the Base constructor calls preProcess(), not that it calls its own preProcess(). The one that gets executed is the one in the Derived class.