Lessons learned from the big rewrite

I rewrote a system that powers outdoor sports websites. Before the rewrite, the system was translated into 24 languages, offered 6 different widgets that could be embedded in other websites, and had a codebase that was 7 years old. It had white labeling capabilities and powered five websites, the biggest of which had about a million users a month. Here is my story:

Our old system was written in Perl in a custom web framework called WebTek created by our lead developer. The decision was made that the whole site should be rewritten in something that more than one person can develop against. So they hired me to rewrite the site in Django.

All in all, we switched our framework from Perl/WebTek to Python/Django and our database from MySQL to Postgres. We normalized the database, removed inconsistent or dead data, revamped the user interface (and introduced Sass), and made it responsive.

The key learnings were:

1. The rewrite takes forever

Max, our lead developer, estimated that he could rewrite the whole system in about 4 months. I was new to the team and had never touched the system before rewriting it. It took me 11 months to relaunch the system.

This was a really difficult time for me. Each time I thought the rewrite was almost done, new problems (undocumented features, unknown widgets or problems with data) came up and moved the finish line out of sight.

A rewrite is a marathon, not a sprint. You need a lot of endurance and patience to finish the rewrite before going crazy or burning out.

You have to walk a long path when you are doing a rewrite

2. Migrating the database is hard

Some tables in our database had millions of rows. At first I thought I could hack the migration script together in a few hours, but I quickly realised that it is not that easy. There were encoding issues and also problems with non-normalized data.

One full database migration run took a few days to finish. Often there was an exception three days into the migration causing the migration to fail.

If you write a data migration, make sure you have a creation and modification date everywhere in the old database. If those do not exist, add them to the old system. With these dates you can do incremental migrations or resume migrations in case of errors.

Write the migration scripts as well as you can. This is not some helper script, this is the heart of the rewrite! Make it so that you can see the progress of the migration. Add the ability to resume the migration where the previous run stopped or failed.
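Here is a minimal sketch of what "resumable with visible progress" could look like. It is not the script we actually used. The names `migrate_incrementally`, `fetch_batch` and `write_row` are made up for illustration, and I assume that every old row has an `id` and a `modified` value that can be stored as JSON (an ISO date string, for example). Because a batch that fails halfway is replayed on the next run, `write_row` should be safe to call twice for the same row (an upsert).

```python
import json
from pathlib import Path


def migrate_incrementally(fetch_batch, write_row, checkpoint_path,
                          batch_size=1000, report=print):
    """Copy rows in (modified, id) order and save a checkpoint after each batch.

    fetch_batch(after, limit) is a hypothetical callback. It must return a
    list of dicts with "id" and "modified" keys, sorted by (modified, id),
    containing only rows strictly after the `after` cursor. `after` is None
    on a fresh start, otherwise a [modified, id] pair.
    """
    checkpoint = Path(checkpoint_path)
    state = {"cursor": None, "done": 0}
    if checkpoint.exists():
        # A previous run left a checkpoint: continue from there.
        state = json.loads(checkpoint.read_text())

    while True:
        rows = fetch_batch(state["cursor"], batch_size)
        if not rows:
            break
        for row in rows:
            write_row(row)  # should be an upsert, see note above
        last = rows[-1]
        state = {
            "cursor": [last["modified"], last["id"]],
            "done": state["done"] + len(rows),
        }
        # Persist the cursor only after the whole batch was written.
        checkpoint.write_text(json.dumps(state))
        report(f"migrated {state['done']} rows, up to {last['modified']}")
    return state["done"]
```

If the run dies three days in, you start the same command again and it picks up at the last finished batch. Running it again after it has finished only copies rows modified since the last run, which gives you the incremental migration for free.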

3. Test the new site for performance

Start this on day one. Add a hidden iframe to the live site that makes the same request to your development system. (e.g. www.example.com/some/url/123/ -> dev.example.com/some/url/123/)

With this little trick you can measure the performance of your new system and you will see if it can handle all the traffic. You may also discover requests you did not know about (maybe some hidden URLs for a special customer, or widgets that were custom made for a user).
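To make the URL mapping concrete, here is a small sketch of a helper that builds such an iframe. Our old system was Perl, so this is not what we ran. The function names and the `dev.example.com` default are assumptions for illustration, and the helper expects a full URL including the scheme.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def mirror_url(live_url, dev_host="dev.example.com"):
    """Return the same path and query string, but on the development host."""
    parts = urlsplit(live_url)
    # Keep scheme, path and query; swap the host and drop the fragment.
    return urlunsplit((parts.scheme, dev_host, parts.path, parts.query, ""))


def hidden_iframe(live_url, dev_host="dev.example.com"):
    """Build markup for an invisible iframe that replays the request on dev."""
    src = escape(mirror_url(live_url, dev_host), quote=True)
    return (f'<iframe src="{src}" width="0" height="0" '
            f'style="display:none"></iframe>')


print(hidden_iframe("https://www.example.com/some/url/123/"))
```

Keep in mind that the iframe only replays page loads made by a browser. Form submissions and requests from other servers will not show up on the development system this way.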

4. Tell your users about the relaunch

This is something we had to learn the hard way. If you change the user interface significantly (which we did), do not forget to prepare your user base for the relaunch.

We just flipped the switch and from one moment to the other the new user interface was live. The system is managing bike and hiking routes. A lot of users could not find their routes in the new interface and thought all their data was gone! We also moved the button for uploading routes so some of our users thought it was impossible to upload routes with the new site.

The result was a big shit storm in the user forums and on Facebook. So right after the relaunch we had to deal with stability problems and bugs in the new system, make changes to the user interface, and calm the angry mob our user base had become on Facebook and in the user forums. And all of this at the same time. The four days right after the relaunch were very intense. :-)

What I would do differently

The first thing I will do next time is take more time to analyze the old system thoroughly and write a technical spec of what the new system should do. A simple "what the old system is doing now" does not cut it as a spec for the rewrite. No one knows exactly what the old system is doing, so it is necessary to take the time to write a spec.

We did a "big bang" relaunch. One minute the old system was running, then we flipped the switch and the new system was running. From then on, all new data was stored in the new database. There was no plan for switching back to the old system without losing data.

Next time I will add a layer to the systems that allows the old system to write to the new database and the new system to write to the old database. This layer might be a lot of work, but it allows running the old and the new system side by side. (A developer friend of mine said no one in charge will approve writing this layer. That may be true, but a rewrite is not a cheap thing to do anyway. It is very, very, very expensive.) Being able to run both systems side by side is a tremendous advantage. You can give your users the ability to test drive the new system before the relaunch. It also makes reverting the relaunch easy: if the new system does not work as expected, you can switch back to the old system, fix the problems with the new system and then switch to the new system again without losing any data.
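I have not built this layer, so take the following as a rough sketch of the idea and not as a proven design. The class `DualWriteStore` is hypothetical, and the two stores are plain dicts standing in for the old and the new database. A real layer would also have to translate between the two schemas and deal with a write that succeeds on one side and fails on the other.

```python
class DualWriteStore:
    """Write every change to both stores, read from the primary one."""

    def __init__(self, old_store, new_store, primary="old"):
        self._stores = {"old": old_store, "new": new_store}
        self.primary = None
        self.switch_to(primary)

    def switch_to(self, primary):
        """Choose which system serves reads ("old" or "new")."""
        if primary not in self._stores:
            raise ValueError("primary must be 'old' or 'new'")
        self.primary = primary

    def save(self, key, record):
        # Both stores get a copy, so either system can take over later.
        for store in self._stores.values():
            store[key] = dict(record)

    def load(self, key):
        return self._stores[self.primary][key]


old_db, new_db = {}, {}
store = DualWriteStore(old_db, new_db, primary="old")
store.save("route-1", {"name": "Lake loop"})
store.switch_to("new")   # relaunch
store.save("route-2", {"name": "Ridge trail"})
store.switch_to("old")   # revert: route-2 is still there
print(store.load("route-2"))
```

The point of the sketch is the last three lines: data written after the relaunch is still available after switching back, which is exactly what we could not do.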

Read this!

If you are planning to rewrite your project, there is a lot of information out there:

Chad Fowler did a really great series of blog posts about The Big Rewrite.

Joel Spolsky also wrote a blog post explaining why you should never do a rewrite. It is called Things You Should Never Do, Part I.

Dharmesh Shah wrote an excellent blog post called How To Survive a Ground-Up Rewrite Without Losing Your Sanity. (Thanks to Daniel Farina who pointed me to the post in the comments!)

I cannot stress enough how tremendously useful these blog posts are! Please take the time to read everything you can get on the topic before deciding whether to rewrite your system or gradually improve your existing one.

If you are planning a big rewrite just send me an email, I am happy to chat: anton@ignaz.at