Friday, September 17, 2010

New blog home

I've migrated my blog from the old Sun Microsystems blogging site to this new home on Blogger. It turned out to be a bit trickier than I had hoped, and as a consequence there are probably some broken links and some bad formatting here and there. The old blogging platform was based on Apache Roller, and most other blogging platforms don't support importing Roller archives. I first tried to simply set up an account on, but they don't allow new users. In case anyone else is trying to make a similar migration, here are some quick notes on what I did:

  1. I exported the blog from Roller in MovableType format.
  2. I used the project to convert the MovableType file into a file suitable for Blogger import.
  3. However, all the dates were broken; every post showed up with today's date. I tracked this down to the fact that the converter assumes the times in the archive are in 12-hour format (with AM/PM at the end), and mine were not. So I fixed this by replacing the following line in google-blog-converters-r89/src/movabletype2blogger/ :
    return time.strptime(mt_time, "%m/%d/%Y %I:%M:%S %p")
     with:
    return time.strptime(mt_time, "%m/%d/%Y %H:%M:%S")
  4. The biggest challenge was dealing with the images: the blog converted okay, but it still referenced all my image resources on the old site. To fix this, I first uploaded all my images to Picasaweb and made them world-readable (public). Next, I opened the Picasaweb page that shows thumbnails for all the images in the album and saved it; the saved file contains links to every image. Unfortunately, the images end up under many different URL prefixes, so it's not as simple as replacing the old image prefix with a new one. I wrote a short Java program to extract all the image links and build a map from file base name to full URL.
  5. I then wrote a simple Java program to rewrite the Blogger import file, replacing each old image URL with the corresponding new Picasaweb URL from the above map. For the new URLs, I took the thumbnail image URLs and removed the /s128/ portion, which gets you the original image rather than the thumbnail. (It also turns out that Picasaweb insists on converting all .png and .gif images to .jpg, so the URLs have to be adjusted for those cases.)
  6. I also did a little post-processing on the results; for example, I collapsed some repeated <br> tags that were no longer necessary, and I inserted a warning ("This blog entry was imported and URLs might be wrong") at the top of each imported post.
  7. One final tip: I discovered that a number of my images were missing. It turns out that Picasa by default hides small images (and I had many), so these were never uploaded! Picasa has settings to include them, and you can also add .gif to the list of included file extensions, so this is easy to handle once you're aware of the problem.
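For step 3, a more forgiving alternative to swapping one format string for the other is to try both. Here is a minimal Python sketch (the converter itself is Python); `parse_mt_time` and `MT_TIME_FORMATS` are hypothetical names, not part of google-blog-converters, and both formats assume the month/day/year layout from step 3.

```python
import time

# Hypothetical helper, not part of google-blog-converters.
# Try the converter's original 12-hour pattern first, then the
# 24-hour pattern that matched my Roller export.
MT_TIME_FORMATS = ("%m/%d/%Y %I:%M:%S %p", "%m/%d/%Y %H:%M:%S")


def parse_mt_time(mt_time):
    """Parse a MovableType date in either 12-hour or 24-hour form."""
    for fmt in MT_TIME_FORMATS:
        try:
            return time.strptime(mt_time, fmt)
        except ValueError:
            continue  # this format didn't match; try the next one
    raise ValueError("unrecognized MovableType date: %r" % mt_time)


if __name__ == "__main__":
    for stamp in ("09/17/2010 14:05:00", "09/17/2010 02:05:00 PM"):
        parsed = parse_mt_time(stamp)
        print(stamp, "->", time.strftime("%Y-%m-%d %H:%M:%S", parsed))
```

Both sample timestamps come out as 2010-09-17 14:05:00, and a date in neither format raises ValueError, which makes the problem visible instead of hiding it.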
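The link extraction in step 4 was done in Java and isn't shown here; the following is a rough Python equivalent, not that program. It assumes the saved thumbnail page contains plain thumbnail URLs with an `/s128/` path segment, and `build_image_map` is a hypothetical name (the example hosts are made up). It keys the map on the file name without its extension, which is one way to cope with Picasaweb turning .png and .gif files into .jpg.

```python
import os
import re
from urllib.parse import urlparse

# Assumption: thumbnails in the saved Picasaweb page appear as plain
# URLs containing an /s128/ size segment.
THUMB_URL = re.compile(r'https?://[^\s"\'<>]+/s128/[^\s"\'<>]+')


def build_image_map(saved_html):
    """Map each image's base name (no extension) to its full-size URL."""
    image_map = {}
    for thumb_url in THUMB_URL.findall(saved_html):
        # Removing the /s128/ segment yields the original image, not the thumbnail.
        full_url = thumb_url.replace("/s128/", "/", 1)
        filename = os.path.basename(urlparse(full_url).path)
        # Key on the name without extension: Picasaweb converts .png/.gif to .jpg,
        # so "diagram.png" on the old blog can still find "diagram.jpg".
        base, _ext = os.path.splitext(filename)
        image_map.setdefault(base, full_url)  # keep the first URL seen per name
    return image_map


if __name__ == "__main__":
    page = ('<img src="https://photos.example.com/a1/s128/diagram.jpg">'
            '<img src="https://photos.example.com/b2/c3/s128/screenshot.jpg">')
    for name, url in sorted(build_image_map(page).items()):
        print(name, url)
```

If two albums contain images with the same base name, this keeps only the first URL, so such collisions would need manual attention.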
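The rewrite in step 5 can be sketched the same way. This Python version is hypothetical, not the author's Java program: `rewrite_image_urls` takes the old site's image URL prefix as a parameter (supply your own), looks up each matching URL's base name in a name-to-URL map like the one described in step 4, and leaves unknown images untouched. It also assumes the post HTML inside the Blogger import file is entity-escaped Atom content, so each URL is taken to end at a quote, whitespace, angle bracket, or `&`.

```python
import os
import re


def rewrite_image_urls(text, old_prefix, image_map):
    """Replace image URLs that start with old_prefix by their new URLs.

    old_prefix is the old blog's image URL prefix (an assumption: supply
    your own); image_map maps base names without extension to new URLs.
    """
    # Stop a URL at quotes, whitespace, angle brackets or '&': in the
    # Atom import file, post HTML is typically entity-escaped (&quot;).
    pattern = re.compile(re.escape(old_prefix) + r'[^\s"\'<>&]+')

    def replace(match):
        old_url = match.group(0)
        base, _ext = os.path.splitext(os.path.basename(old_url))
        # Look up by name only, so .png/.gif images map to Picasaweb's .jpg;
        # images missing from the map are left pointing at the old site.
        return image_map.get(base, old_url)

    return pattern.sub(replace, text)


if __name__ == "__main__":
    new_urls = {"diagram": "https://photos.example.com/a1/diagram.jpg"}
    post = ('&lt;img src=&quot;http://old.example.com/images/diagram.png&quot;&gt;'
            '&lt;img src=&quot;http://old.example.com/images/logo.gif&quot;&gt;')
    print(rewrite_image_urls(post, "http://old.example.com/images/", new_urls))
```

In the sample, diagram.png is rewritten to its .jpg URL, while logo.gif, which has no entry in the map, is left unchanged.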
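The post-processing in step 6 might look something like this in Python, run on a single post's HTML before it goes into the import file. The rule used here (collapse any run of two or more `<br>` tags into one) and the exact warning markup are assumptions, and `post_process` is a hypothetical name.

```python
import re

# Assumption: "collapsing" means turning any run of two or more <br>
# tags (<br>, <br/>, <br />, any case) into a single <br />.
REPEATED_BR = re.compile(r'(?:<br\s*/?>\s*){2,}', re.IGNORECASE)

# Hypothetical markup for the warning banner.
IMPORT_WARNING = ('<p><i>This blog entry was imported and URLs might be '
                  'wrong.</i></p>\n')


def post_process(post_html):
    """Collapse runs of <br> tags and prepend the import warning."""
    return IMPORT_WARNING + REPEATED_BR.sub('<br />\n', post_html)


if __name__ == "__main__":
    print(post_process('First line<br><br/> <BR />Second line<br>Third line'))
```

A lone `<br>` is left alone; only runs of two or more are collapsed.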
Hopefully this helps anyone else wanting to make a similar migration. I have the Java code for the image URL manipulation if anyone wants it (but it's not generic, so you'll need to massage it a little for your own needs).


  1. Nice work. Too bad blog portability is in such sad shape.

  2. Hi Tor,

    Great writeup. I have been putting off migrating my old JRoller-based blog to Blogger because of the image issues. Would you be so kind as to release the sources to your little helper app?


  3. I also moved, but just the subscription; the move was a lot easier for me than for you. ;-)

  4. Man, man, man... you would think that in 2010 these 'formats' would be a bit easier to handle, no?
    In my experience it's very painful. Each time I needed to take over a WordPress blog and move it to our platform, it took us more than 7 steps.
    Maybe (I know... very low probability) in the next 3-4 years we will see wider use of formats that make our lives easier.