Monday, September 12, 2005

Performance Tuning: Efficient CSS Handling

WARNING: This blog entry was imported from my old blog on (which used different blogging software), so formatting and links may not be correct.

We have an update to Creator 2 Early Access which should go out any day now. One of the big things we fixed in the update is closing various memory leaks which caused Creator to slowly run out of juice. There are lots of other important improvements too - for example, Creator will finally run on JDK 5.0. But one big thing is still missing before we can ship it: Speed tuning.

I finally got done with my other 2.0 tasks last week, and have now started the performance tuning work in earnest.

It's really fun! I perform various tasks, look for hotspots, study the code and decide if (a) the code is doing something stupid, and (b) if the code is getting called too often. I've found several easy things to fix, so I've integrated a couple of things, sent e-mail to some other people when the bottlenecks are in their code. And in one case filed a NetBeans issue with my analysis.

The right way to performance tune is to measure the code looking for hotspots, rather than using your intuition about code you've written which is probably slow. So that's what I did - and I measured an enormous bottleneck in the CSS handling code.

This bottleneck has been there since 1.0. However, the bottleneck is proportional to the number of rules in your stylesheets as well as the number of elements in your document, so it doesn't show up for most pages created with Creator 1.0. However, in Creator 2, with our new nice themes, we have really large stylesheets which trigger the slowdown. And if your document has a lot of elements, which is pretty easy now with some of our new complex components, the slowdown is significant. As pointed out by the profiler.

This performance bug was really fun to analyze and fix, so I'll explain the gory details to you.

The problem is this: For each element in the document, we need to figure out which rules apply to that element.
This was done by iterating over the rules, checking the selectors in each rule to see if they apply to the current
element, and if so, "apply" the rule to the element.

The problem is that in Creator, the new Theme stylesheet contains nearly 800 rules! And of course, most of those rules
contain multiple selectors, each of which must be checked! Then there's the possibility of additional stylesheets.
There's the builtin user-agent stylesheet (which assigns HTML semantics for the element) as well as the project default
stylesheet, and any stylesheets you've added yourself. Finally, some of the selector checks involved iterating up
node hiearchies (for example, the :link pseudo tag means we need to see if the element is enclosed within an <a> tag.)
To cut to the chase: some of the node iteration and selector check code would get called over a million times for a page render!

Clearly, this needs to be optimized. There is a straightforward technique to handle this which apparently
is used by Mozilla's layout engine. I learned about it from reading David Hyatt's blog.
That's right boys and girls - spending time reading blogs CAN benefit your daytime job!
Give yourself a pat on the shoulder and resolve to return to my blog regularly :)

The technique is as follows: preprocess the stylesheet when it's first loaded, by setting up some
extra datastructures which will make the later (frequently called) "find-rules-for-element" perform faster.

Consider the following sample rules:

.Tab1Div table {border-collapse:collapse}
.WizMst {background:#C6D6DF}

The first rule can obviously only match Elements where the tagname is table, and the
second rule can obviously only match an element that has a class attribute which includes WizMst.

Therefore, when I preprocess the stylesheet I split up the set of rules into four categories.
First, for any rules where I can identify a "tag name requirement", I add the rule to a hashmap
indexed by the tag name. Then, for any rules that have a "class attribute requirement",
I place these in a hashmap indexed by the class attribute. Similarly for the element id condition.
Then I have a list of all the "remaining" rules, which don't have obvious tag/class/id requirements.
These will need to be checked the old way against all elements.
But amazingly, out of the 800 rules in the theme stylesheet, only 5 fit into this last category!

Now, whenever I need to compute matching styles for an element, I first look up its tagname,
check for applicable rules in the hashmap, then do a full selector check on each one of
these rules. Then I check the styleclasses in the class attribute with the class hashmap, and
so on. Finally, I do the old-style iteration of the "remaining" rules. This effectively partitions
the rules up into small buckets, one for each of the tag names, and one for each of the attribute names,
so for each element I get to do a lot less work.

This does however add one complication: cascade order. I cannot simply process all tag-related rules
followed by all class-attribute rules. Later rules in the stylesheet should "overwrite" earlier
rules (modulo various cascade rules etc.). So after matching rules from my hashmaps and
the remainder-list, I need to sort the rules into the right cascade order again.

The first thing to do was to verify that the optimization works - that pages still render correctly.
They appear to. Check. Then I measured the performance differences to make sure I actually get an
improvement and remove the bottleneck. To make it easy to check,
I put the new code path under a global flag I could turn on and off. Then I ran style attribution
with the optimization on, then with the optimization off, and compared the results.
And the effort definitely paid off.
For smallish pages, style attribution got a speedup of 12x. For large pages, I got a speedup of 8x!

Note that these speed improvements are not in EA Update - which has been out of our hands for a while now.


  1. A couple comments:
    1) I'm very thankful that the update will soon be available, and that the performance issues are being addressed.
    2) As important as desgin time performance is, run time performance improvements are of even greater value. So, to go along with your desing time improvements, how about looking at some run time improvements as well?
    For example, I would really like to see some sort of caching mechanism added for images.
    Here is a scenario: Say a page has two sets of images, yet only one set needs to be updated based on a selection of a list box that corresponds to one of the image sets. The way it works now with only URL values assigned to buttons to specify thier image, when the URL gets set (even if it is the same as the old one) the code calls to the server to refresh itself. If you also had the ability to set the image to a cached object (another button's image for instance) then this call wouldn't need to take place.

  2. Hi Tor,
    This might explain the example I'm trying to convey better:
    In short, is there a way to do image caching with what Creator provides?
    Maybe there is a way to do it, but I cannot see how given that the images are set in the buttons as a URL and do not give you a way to set them from another (in memory) image.