Wens’ blog

An expensive bug

October 12th, 2007 by Wenslauw

Some software bugs are easy to find, while others take days and days to figure out. Especially when you are working on a chain of connected applications, finding them can be a pain in the arse. I spent most of this week hunting down a bug in a very big customer relationship management (CRM) system. The bug appeared when customer orders were fired at the system through its web services. As a result, customers subscribing to a new kind of product lose some of the numbers on their ISDN line.

Problems like this had occurred before with these web services. Usually it was a mapping issue, one of the weak points of their implementation, that messed things up: if one letter changes in the mapping, the whole thing goes wrong. So I checked the mappings carefully and made sure they were all correct. There did not seem to be a problem there.

The maintenance guys sent me an example order that was not handled properly, and I followed it through the whole system. I checked the incoming XML message for the web services several times and did not see a problem, so I looked for it in this huge CRM system instead. Nothing turned up. Two nights in a row I went home frustrated, wondering how long it would take to find this bug that was costing our customer a lot of money and damaging their reputation.

Yesterday I checked the XML message again and looked at the phone numbers in it. A back-end system supplies these phone numbers without a dash, and they identify part of the customer’s installed products. A front-end application tries to work out the right position for the dash and puts it back into the number. One can wonder why the system works like this in the first place…

And what do I see? A telephone number with an area code of 029, which does not exist in the Netherlands. How do I know? Well, I used to date a woman from the area where the customer with that telephone number lives. If it were not for her, I would not have noticed that the dash was in the wrong place. So now I know what goes wrong. Next I have to find out where exactly it goes wrong.

This case really shows that good design and test-driven development help produce better software. I find the code that puts the dash into the number in a class that fetches the customer data through a call to web service code. Code that decides how a telephone number is represented does not belong there. It should live in a separate class with its own unit test.
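To make that concrete, here is a minimal Python sketch of what such a separate class and its unit test could look like. It is not the code from the system described here, and everything in it is an assumption: the class and method names are made up, it assumes a Dutch landline number is ten digits with an area code of either three or four digits, it uses a deliberately incomplete table of three-digit codes, and 0299 serves only as an example of a four-digit code.

```python
import unittest

# Hypothetical and deliberately incomplete: a few Dutch three-digit area codes.
# Any other code is assumed to have four digits. A real formatter should load
# the complete numbering plan instead of this hard-coded sample.
THREE_DIGIT_AREA_CODES = frozenset({"010", "020", "030", "040", "050", "070"})


class PhoneNumberFormatter:
    """Puts the dash back into a Dutch landline number that arrives without one."""

    def __init__(self, three_digit_codes=THREE_DIGIT_AREA_CODES):
        self.three_digit_codes = three_digit_codes

    def add_dash(self, digits: str) -> str:
        # Assumption: a national landline number is exactly ten digits, leading 0 included.
        if not (len(digits) == 10 and digits.isdigit() and digits.startswith("0")):
            raise ValueError(f"not a ten-digit Dutch number: {digits!r}")
        # Dash after a known three-digit area code, otherwise after four digits.
        split = 3 if digits[:3] in self.three_digit_codes else 4
        return f"{digits[:split]}-{digits[split:]}"


class PhoneNumberFormatterTest(unittest.TestCase):
    def setUp(self):
        self.formatter = PhoneNumberFormatter()

    def test_three_digit_area_code(self):
        self.assertEqual(self.formatter.add_dash("0201234567"), "020-1234567")

    def test_four_digit_area_code_is_not_cut_after_three_digits(self):
        # "029-9123456" would show an area code (029) that does not exist.
        self.assertEqual(self.formatter.add_dash("0299123456"), "0299-123456")

    def test_rejects_number_of_wrong_length(self):
        with self.assertRaises(ValueError):
            self.formatter.add_dash("029912345")
```

Saved under a name such as `test_phone_number.py` (made up for this example), it runs with `python -m unittest test_phone_number`. Because the formatting rule sits in its own small class, the second test is the kind of check that would flag a number coming out with a 029 area code, without needing the web services or the CRM system at all.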

The front-end application that has the bug was a quick and dirty job to get rid of a pile of orders on the customer’s side. The pile kept growing, and they did not feel like staffing yet another call center just to keep it from growing further. So we quickly wrote some software to do it for them. Unfortunately, that meant quality was thrown out the window. No unit tests. A test team did test it, but apparently they missed the bug. A simple unit test would have found it…

Posted in Programming