Tor Norbye's Blog

Monday, September 12, 2005

Rejected Ads

WARNING: This blog entry was imported from my old blog on blogs.sun.com (which used different blogging software), so formatting and links may not be correct.

Check out these ads, which were eventually rejected by "top business publications". clientjava.com claims it was the Wall Street Journal; I have no inside information.

(There are more ads at the marketing site)

Sunday, September 11, 2005

Code Advice #3: No Tabs! Ever!


(See the intro for background and caveats on these coding advice blog entries.)

Should you use spaces or tab characters when indenting your code?
This question has been debated at length in the past, with a fervor similar to the "Emacs versus vi" editor debate.
But unlike "Emacs versus vi", we cannot just agree to disagree. With editors, we can each simply use a different tool.
But source code is often shared, and if there's one thing that's worse than a source file indented with tabs, it's a source
file indented with a mix of tabs and spaces. This is typically the result of a file being edited by multiple users.

My advice is simple: Always use spaces to indent. That doesn't mean you can't use the Tab key on your keyboard to indent; most tools will automatically insert spaces instead. In other words, the Tab key is the Indent key, not the Tab character key.

So why is it bad to use tabs instead of spaces?

There are several reasons. Obviously, there's the reason I started out with: we really need to pick one convention. Spaces for indentation is the most common scheme used today, so it's a reasonable choice on that basis alone.

One of the problems with tabs is that the editor has to convert each tab character into whitespace when displaying the file. How much whitespace should each tab be replaced with? In an ideal world, the old typewriter functionality could be used, where each tab stop had a certain pixel position. That way people could even use proportional-width fonts in their editors (instead of the blocky monospace fonts used by practically all code editors today), and the code would still indent nicely. However, no editor that I'm aware of supports this, so that's not a practical avenue. Instead, editors typically assume that a tab is either 8 characters wide (common in ye old days) or 4 characters wide (common in Java editors today). Some editors stick with the 8-character assumption but support 4-character indents in Java (which is common), so when indenting to level 3 they insert a tab followed by 4 spaces, producing a 12-character indent with an 8-character tab.

Why is this bad? Because code is viewed in more than one tool, and in the presence of tabs it often gets misaligned. Diffs in code-integration e-mails, code viewed in other editors, and code edited by tools that treat tabs differently will easily get "mangled" (for example, ending up with a mix of spaces and tabs).
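To make the misalignment concrete, here is a minimal Java sketch of how an editor typically expands tabs. The class and method names are made up for illustration, and the sketch assumes tab stops every tabWidth columns and one column per character. It renders the "one tab plus four spaces" level-3 indent described above next to a level-2 line written with spaces only.

```java
public class TabExpander {

    /** Replaces each tab with spaces up to the next tab stop (one column per char assumed). */
    static String expandTabs(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                // Always advance at least one column, then stop at the next multiple of tabWidth.
                do {
                    out.append(' ');
                } while (out.length() % tabWidth != 0);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Number of leading spaces in an already-expanded line. */
    static int indentColumns(String expanded) {
        int n = 0;
        while (n < expanded.length() && expanded.charAt(n) == ' ') {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Level 3, written by an editor with 8-column tabs and 4-column indents.
        String tabbed = "\t    foo();";
        // Level 2, written with spaces only.
        String spaced = "        bar();";
        for (int tabWidth : new int[] {8, 4}) {
            System.out.println("tab width " + tabWidth + ": level 3 at column "
                    + indentColumns(expandTabs(tabbed, tabWidth))
                    + ", level 2 at column "
                    + indentColumns(expandTabs(spaced, tabWidth)));
        }
    }
}
```

Run as is, it prints "tab width 8: level 3 at column 12, level 2 at column 8" followed by "tab width 4: level 3 at column 8, level 2 at column 8". With 4-column tabs, the level-3 line collapses onto level 2, while the spaces-only line looks the same in both.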

(Sidenote: In the old days, source files sometimes included a comment at the top of the file with special "tokens" (-*-) intended for Emacs. These tokens would identify the language mode as well as the intended tab size for the file, and when loading the file, Emacs would use the specified tab size. Thus, the source files actually carried the tab information needed to edit the file as intended. However, this doesn't really solve the problem, since all other tools that process and display the file would also need to be aware of this metadata.)
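For reference, such a first-line comment might look like the following in a Java file (the class is only a placeholder). mode, tab-width and indent-tabs-mode are real Emacs file-variable names; as the sidenote says, no other tool pays any attention to them.

```java
// -*- mode: java; tab-width: 4; indent-tabs-mode: nil -*-
// Emacs reads the line above when it visits the file; javac, diff tools
// and other editors just see an ordinary comment.
public class ModeLineDemo {

    /** Placeholder method, indented with spaces. */
    static String describe() {
        return "indented with four spaces";
    }
}
```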

I've heard people put forward two arguments in favor of using the tab character:

1. If a file uses ONLY tab characters for indentation, it is easy for users to read code at their own favorite indentation level.
In other words, I can read your source file with tabs=4, and you can read it with tabs=2.

2. It's easier to move back and forth between indentation levels, since a single left/right keystroke will jump across a tab character, i.e.
a whole indentation level.

Regarding argument 1: There are lots of other things I want to customize when I read other people's code too. You see, not everyone agrees with the code rules I'm putting forth in these blog entries :-) So if I read code that is indented poorly, or worse yet puts spaces between function names and the parentheses, or commits other horrible coding sins, I hit Shift-F10 to reformat the source properly first anyway. This solution is more comprehensive than simply adjusting the indentation depth.

Regarding argument 2: I don't see a big use case for being able to move the caret back and forth across indentation levels. Indentation only occurs at the beginning of a line, and the Home key should alternate between jumping to the beginning of the line and to the first non-whitespace character on the line. Why would you ever need to go anywhere else? Perhaps you want to move some code up an indentation level. That's what the Reformat feature is for: just reformat the buffer instead.
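The Home-key toggle mentioned above takes only a few lines. This is a minimal sketch with a hypothetical smartHome helper that maps the current caret column to the new one; it is not taken from any particular editor.

```java
public class SmartHome {

    /** Column of the first non-whitespace character, or the line length if the line is blank. */
    static int firstNonWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    /**
     * Alternates between the first non-whitespace character and column 0:
     * if the caret already sits at the end of the indentation, go to the line start.
     */
    static int smartHome(String line, int caret) {
        int indentEnd = firstNonWhitespace(line);
        return caret == indentEnd ? 0 : indentEnd;
    }

    public static void main(String[] args) {
        String line = "        return x;";
        int caret = 15;
        caret = smartHome(line, caret); // first press: 8
        System.out.println(caret);
        caret = smartHome(line, caret); // second press: 0
        System.out.println(caret);
    }
}
```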

(Minor sidenote: In Emacs, and I believe in JBuilder, the Tab key was bound to a reindent action, NOT to inserting indentation. This is a much better use of the Tab key. When you're on a new line, pressing Tab should move the caret to the correct indentation level (reindent), NOT insert, say, 4 spaces. If you're on a line with existing code, hitting Tab should NOT insert 4 characters where the caret is located; it should adjust the line's indentation so that it's correctly indented. Thus, if I put an if block around a piece of code, I can just press Tab, Down Arrow a couple of times to indent the block correctly. I submitted a patch for NetBeans to do this a while ago, but this behavior is apparently a bit controversial. For a former XEmacs user like myself it's indispensable.)
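Here is a deliberately naive Java sketch of "Tab as reindent": it computes a line's indentation from the brace depth of the lines above it and replaces whatever leading whitespace the line had. The class and method names are hypothetical, this is not the NetBeans patch mentioned above, and a real implementation would use the IDE's parser rather than counting braces (this version ignores braces inside strings and comments).

```java
import java.util.Arrays;
import java.util.List;

public class Reindent {

    /** Net change in brace depth caused by one line (naive: ignores strings and comments). */
    static int braceDelta(String line) {
        int delta = 0;
        for (char c : line.toCharArray()) {
            if (c == '{') {
                delta++;
            } else if (c == '}') {
                delta--;
            }
        }
        return delta;
    }

    /** Replaces the leading whitespace of one line with spaces matching its brace depth. */
    static void reindentLine(List<String> lines, int index, int indentSize) {
        int depth = 0;
        for (int i = 0; i < index; i++) {
            depth += braceDelta(lines.get(i));
        }
        String body = lines.get(index).trim();
        if (body.startsWith("}")) {
            depth--; // a closing brace lines up with the line that opened the block
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < Math.max(depth, 0) * indentSize; i++) {
            line.append(' '); // spaces only, never tab characters
        }
        lines.set(index, line.append(body).toString());
    }

    /** Simulates pressing Tab, then Down Arrow, on each line from 'from' to 'to'. */
    static List<String> reindentRange(List<String> lines, int from, int to, int indentSize) {
        for (int i = from; i <= to; i++) {
            reindentLine(lines, i, indentSize);
        }
        return lines;
    }

    public static void main(String[] args) {
        // An if block was just wrapped around two existing statements.
        List<String> lines = Arrays.asList(
                "void run() {",
                "    if (ready) {",
                "    start();",
                "    finish();",
                "    }",
                "}");
        reindentRange(lines, 2, 4, 4).forEach(System.out::println);
    }
}
```

After the loop, start() and finish() sit at 8 spaces inside the if block, and the closing brace stays aligned with the if.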

Therefore, in my opinion, these potential advantages do not make up for the massive problems and ugly code that result.
Let's all use the same convention - no tabs.

All IDEs let you do this. (I believe most IDEs even default to using spaces, but please double-check yours.)
Here's the option in the new NetBeans 5.0 options dialog:

The people who seem to rely the most on Tabs today are people using old-style editors where Tab characters are still the default.
If you're using Emacs, add the following to your .emacs file:


(custom-set-variables
 '(indent-tabs-mode nil)
 '(tab-width 4))

In Vim, the equivalent is to add set expandtab (along with, for example, set shiftwidth=4) to your .vimrc.
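Editor settings only govern your own edits. To catch tabs that slip into a shared source tree anyway, the check can be automated. Below is a minimal sketch (TabIndentCheck is a hypothetical name, not an existing tool) that reports lines whose leading whitespace contains a tab character. Tabs inside string literals or after the indentation are deliberately ignored.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TabIndentCheck {

    /** Returns the 1-based numbers of lines whose leading whitespace contains a tab. */
    static List<Integer> linesIndentedWithTabs(List<String> lines) {
        List<Integer> offenders = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Walk the indentation only; stop at the first non-blank character.
            for (int j = 0; j < line.length() && (line.charAt(j) == ' ' || line.charAt(j) == '\t'); j++) {
                if (line.charAt(j) == '\t') {
                    offenders.add(i + 1);
                    break;
                }
            }
        }
        return offenders;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TabIndentCheck Foo.java Bar.java ...
        for (String name : args) {
            List<String> lines = Files.readAllLines(Paths.get(name));
            for (int n : linesIndentedWithTabs(lines)) {
                System.out.println(name + ":" + n + ": tab in indentation");
            }
        }
    }
}
```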

Friday, September 9, 2005

NerdTV


PBS is making NerdTV available online: a series of hour-long programs in which Cringely interviews various technology luminaries. The format is a lot like the Charlie Rose show. I just "watched" the first program this morning while sifting through my e-mail. It's with Andy Hertzfeld, of Apple++ fame. Episode #3 will be with Bill Joy; I'm especially looking forward to that one.

So I'm definitely bookmarking the URL.

Thursday, September 8, 2005

First Day of School II


It's been a very hectic week in my personal life. This week school started - my daughter in first grade, my youngest son in preschool, and my oldest son in kindergarten. He's in the picture on the right outside his new classroom - the same room his sister had last year.

It was a nightmare balancing the equations such that they're all taken care of from 8 until 5 every day. School only runs until 2:35 (first grade) and 11:20 (kindergarten) so I had to find an after-school program right next to the school - and the school-provided one isn't available for kindergarteners. And of course the after school programs won't take preschoolers.

Filling out all the paperwork was a Dostoyevskian effort. For each child there was form upon form, all asking the same information over and over again. Whatever happened to the paperless office? Reducing red-tape? There's definitely an opportunity here!

(First Day Of School I)

Tuesday, September 6, 2005

Code Advice #2: No Vanity Tokens


(See the intro for background and caveats on these coding advice blog entries.)

Some programmers like to tag their code modifications with their
own initials:


//!FB Shouldn't we compare with MIME type?

I've been told the format shown here, using the exclamation
point in front of the initials, is (or was) the preferred format
at Borland. They even had a tool to strip these
types of comments out before releasing source code to customers.
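Borland's actual tool isn't described here, so the following is only a minimal Java sketch of the idea. It assumes vanity comments always take the whole-line //!XX form shown above (two or three uppercase initials), and VanityStripper is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class VanityStripper {

    // Assumed format: optional indentation, "//!", then 2-3 uppercase initials.
    private static final Pattern VANITY = Pattern.compile("^\\s*//![A-Z]{2,3}\\b.*$");

    /** Returns the source lines with whole-line vanity comments removed. */
    static List<String> strip(List<String> lines) {
        return lines.stream()
                .filter(line -> !VANITY.matcher(line).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "    //!FB Shouldn't we compare with MIME type?",
                "    if (name.endsWith(\".html\")) {",
                "        // TODO: handle .xhtml too",
                "    }");
        strip(source).forEach(System.out::println);
    }
}
```

Ordinary comments, including the standard TODO token, pass through untouched.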

Having people "sign" their comments doesn't make the comments
any more readable. Real projects should be using
some kind of version control system, and information about
which line was written by whom (as well as when)
belongs in the version control system, not inline in the source.
The comment should be able to stand on its own.

There are some standard tokens that should be used.
TODO, FIXME, and XXX are
standard tokens used to mark code that needs to be revisited.
They are "standard" in the sense that many editors and IDEs automatically
highlight and search for these tokens. In fact, they are even
documented in section 10.5.4
of the standard Java code style document:

Use XXX in a comment to flag something that is bogus but works.
Use FIXME to flag something that is bogus and broken.

They left out TODO, which should be used where something hasn't
been implemented in the first place.
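As an illustration only (parsePort and its shortcomings are invented), here is how the three tokens might read in practice, following the definitions above:

```java
public class SpecialComments {

    /** Parses a port number, falling back to a default on malformed input. */
    static int parsePort(String value) {
        // TODO: also accept service names such as "http".
        try {
            int port = Integer.parseInt(value.trim());
            // FIXME: values above 65535 are accepted, which is wrong.
            return port;
        } catch (NumberFormatException ex) {
            // XXX: silently falling back hides typos, but it works for now.
            return 8080;
        }
    }
}
```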

By the way, @todo is a commonly used (though non-standard) javadoc tag and
means the same as TODO in comments. (Speaking of javadoc tags,
@author is obviously okay and does not fall under the no-vanity-tokens
guideline.)

In addition to these, I use some additional tokens to mark code
for other purposes:

JDK15 is a token I attach next to code or comments
which should be revisited when we drop JDK 1.4 support and can
start using JDK 5.0 APIs. For example, I have code in the designer
which tracks the current mouse position (such that pressing Ctrl-V
to paste a component from the clipboard places it under the mouse
position). In JDK 5.0 there is a new API which lets me look up
the mouse pointer position directly (yay!! Thanks!) so I can rip
out the old-style code.

NOI18N is a code marker which indicates that any String
literals on the same code line are deliberately not internationalized.
This is used for Strings which should not be localized, such as the name
of a Java class used in reflection-related code.
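Here is a hedged sketch of what both markers might look like in code. The designer's real mouse-tracking code isn't shown in this entry, so MouseTracker, pointerLocation and listClass are hypothetical. MouseInfo.getPointerInfo() is the JDK 5.0 API (java.awt.MouseInfo) that makes the old-style tracking unnecessary. Note that it reports screen coordinates, so a component-relative position would still need a conversion such as SwingUtilities.convertPointFromScreen.

```java
import java.awt.GraphicsEnvironment;
import java.awt.MouseInfo;
import java.awt.Point;
import java.awt.PointerInfo;
import java.awt.event.MouseEvent;
import java.awt.event.MouseMotionAdapter;

public class TokenExamples {

    // JDK15: pre-5.0 workaround that remembers the last position a listener saw.
    // Replace with pointerLocation() once JDK 1.4 support is dropped.
    static class MouseTracker extends MouseMotionAdapter {
        private Point last = new Point();

        @Override
        public void mouseMoved(MouseEvent e) {
            last = e.getPoint(); // relative to the component that fired the event
        }

        Point lastPosition() {
            return last;
        }
    }

    /** JDK 5.0 and later: ask for the pointer location directly (screen coordinates). */
    static Point pointerLocation() {
        if (GraphicsEnvironment.isHeadless()) {
            return null; // MouseInfo throws HeadlessException when there is no display
        }
        PointerInfo info = MouseInfo.getPointerInfo(); // may be null if the pointer can't be located
        return info == null ? null : info.getLocation();
    }

    /** The class name is an identifier for reflection, not user-visible text. */
    static Class<?> listClass() throws ClassNotFoundException {
        return Class.forName("java.util.ArrayList"); // NOI18N
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("pointer at: " + pointerLocation());
        System.out.println("loaded: " + listClass().getName());
    }
}
```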

I'm sure there are other useful tokens too. What makes these different
from "author signatures" in comments is that they are added specifically
to facilitate locating this code fragment in the future (when looking
for JDK 5.0 migration possibilities for example) or for processing
by tools (such as the String localization checker).
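Because the tokens are fixed strings, the tools that consume them can be very simple. As a toy illustration (MarkerScanner is hypothetical; it is neither the NetBeans localization checker nor any IDE's task list), the sketch below lists the TODO/FIXME/XXX/JDK15 sites in a file and flags lines that contain a string literal but no NOI18N marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerScanner {

    // Word boundaries keep identifiers such as "TODOS" from matching.
    private static final Pattern MARKER = Pattern.compile("\\b(TODO|FIXME|XXX|JDK15)\\b");

    // A double-quoted Java string literal, allowing escaped characters.
    private static final Pattern STRING_LITERAL = Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    /** Lists "line: TOKEN" for each marker found, using 1-based line numbers. */
    static List<String> findMarkers(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = MARKER.matcher(lines.get(i));
            while (m.find()) {
                hits.add((i + 1) + ": " + m.group(1));
            }
        }
        return hits;
    }

    /** Toy localization check: a string literal on a line without a NOI18N marker. */
    static boolean needsReview(String line) {
        return STRING_LITERAL.matcher(line).find() && !line.contains("NOI18N");
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "// JDK15: use MouseInfo instead of tracking the mouse",
                "label.setText(\"Save\");",
                "Class.forName(\"java.util.ArrayList\"); // NOI18N",
                "// TODO: support drag and drop");
        System.out.println(findMarkers(source)); // [1: JDK15, 4: TODO]
        for (int i = 0; i < source.size(); i++) {
            if (needsReview(source.get(i))) {
                System.out.println("line " + (i + 1) + ": unmarked string literal");
            }
        }
    }
}
```

For the sample input, only line 2 is reported as an unmarked string literal; line 3 carries NOI18N and is skipped.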

Including bug IDs in comments falls into roughly the same category.
Code segments full of bug database numbers are hard to read. It's much better
if the various corner cases that were exposed by bugs are explained
inline in the code. This is less of an absolute, however, in cases
where there truly is a lot of value in a developer being able to
look up associated bug reports and history.
And of course, code which was added as a specific workaround for
a bug elsewhere (such as an OS or JDK bug) should include the associated
bug ID, so that future developers can look up the status of
the bug and easily determine whether the code is still necessary.

Saturday, September 3, 2005

Code Advice #1: Don't Log, Debug!


(See the intro for background and caveats on these coding advice blog entries.)

Rule number one: Don't Log, Debug!

One of my pet peeves in code is gratuitous logging.
Gratuitous logging is where every little thing happening in your code is also sent as a log message.
In other words, logging is used to record program events, rather than program errors!

The rationale for this is usually that logging is inserted to help find bugs later. "Just turn on
logging and comb through the output to trace execution and discover the problem."

I don't buy the above argument, but before I go into that, let me state why I think logging is bad:
it reduces code readability.
You've probably heard that you shouldn't sprinkle your code with comments that are obvious (and therefore
redundant). Sprinkling your code with logging calls describing what's going on is essentially the same
thing, only it's even harder to mentally filter these out while reading the code, since they don't have the
visual "I can ignore this" property that comments have in syntax highlighting.

There's a much better way to find bugs than logging: use a debugger! I will admit that I used
to rely on logging myself, since in the early days restarting the application, and even single-stepping
in the debugger, was frustratingly slow.
But with the introduction of HotSwap, and general improvements in the features and speed of Java debuggers,
there's no good excuse for not using a debugger to track down program bugs today.

The argument that's usually thrown out in defense of logging is that you need it to track down
bugs in a deployed environment, for example at a customer's site, where you don't have physical access
and cannot debug it yourself. You want to enable logging and use the log from the customer to figure out
what's going on.

In my experience, that sounds good in theory but never works in practice. The logging never has all the
information you need. To track down a specific bug you typically need fine grained
information that you hadn't thought you needed when you wrote the code.

Let me also point out that this is a really rare scenario. Usually, if a customer reports a bug, they
can tell you how to reproduce it, and once you can reproduce it locally you don't need to debug it
via logs on their system - use a debugger locally.

The scenario has come up once for me. In Creator 1.0, some customers reported that after using
the IDE for a while, the IDE would suddenly stop updating the source files even though components were
added and properties changed. There were no direct instructions to reproduce, and it was rare - nobody
internally had ever seen it. Finally a case for customer site logging! What I did was add a number
of targeted logging calls; anything having to do with buffer manipulation was heavily "instrumented".
I also developed a couple of hypotheses about what the problem might be, and added special code for
checking these hypotheses, adding log messages if they succeeded. I then provided a custom jar to
two customers who had reported seeing this problem regularly - and a few days later I had confirmation
of one hypothesis.

The key point I want to make here is that even though this actually is a scenario which called for
logging, it didn't have to be put in the product a priori. The logging calls were only added temporarily
in one of my source trees. And furthermore, to get useful logging data I added a lot more tracking
than would have been feasible to do in a product version.

So when should you use logging? To record errors and unexpected conditions - especially non-fatal ones
where you don't want to get the user's attention, you simply want to know about it yourself such that
you can check your assumptions and improve your program quality later.
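As an example of the kind of logging that does belong in the product, here is a minimal sketch using the JDK's java.util.logging package. FontSettings and its font-size preference are hypothetical; the point is that an unexpected but recoverable condition gets recorded quietly instead of being shown to the user.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class FontSettings {

    private static final Logger LOG = Logger.getLogger(FontSettings.class.getName());

    static final int DEFAULT_FONT_SIZE = 12;

    /** Reads a font size preference; malformed input is unexpected but not fatal. */
    static int fontSize(String value) {
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException ex) {
            // Not worth interrupting the user: record it for the developers and recover.
            LOG.log(Level.WARNING, "Ignoring invalid font size preference: " + value, ex);
            return DEFAULT_FONT_SIZE;
        }
    }

    public static void main(String[] args) {
        System.out.println(fontSize("14"));   // 14
        System.out.println(fontSize("huge")); // 12, plus a WARNING record on the console
    }
}
```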

If you're not convinced and still want to use logging, here's a tip for how to do it efficiently.
There's always the option of statically compiling out all logging:


private static final boolean LOGGING_ENABLED = false;
...

if (LOGGING_ENABLED) {
    log("foo = " + bar);
}

Because LOGGING_ENABLED is a final boolean that is false, the compiler knows that the if block can
never be executed, so it is eliminated completely: the class file will not contain any of its bytecode,
the class constant pool will not include the string from the log call, and so on.
This is zero-overhead logging when logging is disabled.
The disadvantage of this approach of course is that it cannot be enabled by the customer, so
it's only useful internally. And for that of course, use a debugger!

A modification to this scheme which works well is to use the assertion facility.
This is what I used before I finally removed all non-error-related logging from my code.
Create a log function like this:


public boolean log(String message) {
    // log calls here - perhaps delegate to a logging library

    return true;
}

Now you can use logging as follows in your code:


assert log("foo = " + bar);

This is using assertions for a side effect: the assertion will always be true
(since the logging call unconditionally returns true) so no AssertionError will be thrown.
However, the logging method parameters get evaluated and the logging method called.

This solution gets the best of both worlds: When turned off it has nearly zero
overhead (the assert statement compiles to a check of a static flag that is set when
the class is initialized, so with assertions disabled the log call and its arguments are
skipped, and the JIT compiler can optimize the check away), but it can still be turned on
in those rare scenarios where you need a customer to provide you with logging output.
(It does, however, increase the size of your class files and String pools, so there is a small
cost.)
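A self-contained sketch that puts both techniques side by side (the in-memory LOG list is a stand-in for a real logging library, and the class name is made up). Run normally, it prints "assertions enabled: false, log: []"; run with java -ea AssertLogging, it prints "assertions enabled: true, log: [assert: bar = 21]". The statically guarded call never shows up, whichever way you run it.

```java
import java.util.ArrayList;
import java.util.List;

public class AssertLogging {

    // Stand-in for a real logging library: messages are simply collected here.
    static final List<String> LOG = new ArrayList<>();

    // Compile-time constant: javac drops the block it guards entirely.
    private static final boolean LOGGING_ENABLED = false;

    /** Records the message and always returns true, so the assert never fails. */
    static boolean log(String message) {
        LOG.add(message);
        return true;
    }

    static int compute(int bar) {
        if (LOGGING_ENABLED) {
            log("static flag: bar = " + bar); // never compiled into the class file
        }
        assert log("assert: bar = " + bar); // evaluated only when run with -ea
        return bar * 2;
    }

    public static void main(String[] args) {
        compute(21);
        System.out.println("assertions enabled: "
                + AssertLogging.class.desiredAssertionStatus() + ", log: " + LOG);
    }
}
```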

But again:

- In practice this scenario comes up very rarely.
- Even when it does, your existing logging calls probably aren't sufficient anyway.
- You can get the data you need by providing the customer with a special enhanced logging build.
- Logging calls make your code less readable.
- Logging calls make your source files, and class files, larger.

Without any significant pros, and a significant con, the choice should be easy!

Don't forget - logging is appropriate for logging errors! Don't start writing
empty catch blocks and blaming me! For example, in an application built on top of
the NetBeans RCP (see a tutorial of the new RCP support), a catch block that doesn't
do anything else should at least call


} catch (WhateverException ex) {
    ErrorManager.getDefault().notify(ex);
}

Code Advice "Column"


I've discussed coding style a couple of times before in this blog, and it invariably generates a lot of interest and discussion.
It seems I'm not the only one who cares a lot about these issues.

I recently tried to explain the coding style I've used in the Creator designer source base, to somebody else on the team.
When I tried to provide references I came up short.
There are several coding style guidelines - and I tend to follow
the JDK one - but the problem with the official coding
style document is that it leaves out a lot! For example, it does not address the Tabs versus Spaces issue (cowards!).

I found some other good ones, but I disagree vehemently
with some of their rules (like open braces on a separate line,
underscore suffixes on fields, and an indentation level of 2).

So, I intend to start blogging my opinions on how to write good code. Some will be controversial - especially my first "rule"
regarding logging!
Note that no rule is absolute - there are always exceptions, and you need to consider the tradeoffs and apply good judgement.

I also shouldn't take "full credit". I have not invented most of these practices. Some I have learned from developers whose
style I admire. Others have come about as the result of (sometimes heated) discussions with other programmers.
And finally, some I have learned from my own past mistakes.

Take everything with a grain of salt - but I hope you will find these entries, at least some of them, helpful.

Update 10/15/06: Here are the entries as of today: