“Insidious bug” is a term that I use to refer to software defects that are invisible to the end user, but are quietly causing problems behind the scenes. These errors often occur when a normal bug was either trapped and ignored, or fixed by treating the symptom and not the root cause. This is also known as bug masking. I refer to them as insidious when the masked bug is additionally corrupting data.
In my opinion this is the most severe type of software bug – far worse than a fatal crash. Users may put in countless hours, weeks, or months of data entry before a problem with the data is discovered. Discovering the cause of insidious bugs can be extremely difficult because there are no indications that the code is failing. In some cases data may not be recoverable because it was never saved correctly. Corrupt data can directly cause any number of problems including financial loss, lawsuits, etc.
Keep reading for my personal thoughts on causes and solutions…
Causes of Insidious Bugs
Of course there are various reasons why bugs occur, such as inexperience, sloppy coding, rushed work, etc. However, there are two primary causes of insidious bugs. The first is an attempt to make the software appear solid by hiding error messages. The second is solving the symptom of a bug without understanding the root cause of the problem.
Hiding Error Messages
Good programmers do not want their software to crash. However, over-confident programmers sometimes take the attitude that “my software will never, ever crash under any circumstance.” It is a noble goal to make your software as solid as possible and no programmer ever wants their users to experience a programming error message. However, suppressing errors actually has the reverse effect because it makes software more difficult to test and serious errors are likely to slip out the door.
One technique that is prone to abuse is the use of global error catching. In most languages it is possible to surround the main entry point of an application with a try/catch, which gives you a chance to handle every error and allow the application to keep running. I am not discouraging anyone from doing this; however, it’s important to understand that when you trap large blocks of code, even the simplest typo can escalate into a major bug. If nobody sees an error message then a bug is likely to slip through the cracks. Oftentimes the global try/catch will include some type of logging instead of showing an error to the user. However, it does no good if the log is never checked. Even if the log is monitored, the customer may be losing data every minute that they continue to work.
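To make the difference concrete, here is a rough Java sketch of a catch-all handler wrapped around each unit of work. Everything in it is an assumption made for illustration: the class GlobalCatchSketch, the saveDebit step with its simulated defect on negative amounts, and the in-memory lists standing in for the datastore and the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: all names and the simulated defect are invented for illustration.
class GlobalCatchSketch {
    // Stands in for the application's datastore.
    static final List<Integer> savedDebits = new ArrayList<>();
    // Stands in for a log file that may or may not ever be read.
    static final List<String> log = new ArrayList<>();

    // Simulated defect: negative amounts trigger an unexpected failure.
    static void saveDebit(int amount) {
        if (amount < 0) {
            throw new IllegalStateException("unexpected negative amount: " + amount);
        }
        savedDebits.add(amount);
    }

    // Catch-all that swallows the error: the user sees nothing
    // and the failed record is silently dropped.
    static void runSwallowing(int[] amounts) {
        for (int amount : amounts) {
            try {
                saveDebit(amount);
            } catch (RuntimeException e) {
                log.add(e.getMessage()); // logged, but nobody is told
            }
        }
    }

    // Catch-all that logs and then rethrows: the failure stays visible.
    static void runLoggingAndRethrowing(int[] amounts) {
        for (int amount : amounts) {
            try {
                saveDebit(amount);
            } catch (RuntimeException e) {
                log.add(e.getMessage());
                throw e; // stop before more data is lost
            }
        }
    }
}
```

Given the amounts 10, -5 and 20, runSwallowing stores two of the three records and appears to succeed, while runLoggingAndRethrowing stops at -5 with an error that somebody has to look at.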
Treating the Symptom
The second cause is treating the symptom of a bug without solving the root cause. This frequently happens when a fatal crash is patched quickly in response to an emergency and the QA process is rushed. Everybody from programmer to QA to user sees that the symptom (fatal crash) has been solved. The root cause of the problem not only still exists, but is now hidden. The software appears to be fixed, but in reality has a much more sinister problem that will not be discovered until somebody notices the bad data.
A typical scenario is when an unknown condition happens in a program. For example, the application crashes because it encounters a NULL value where one was not expected. Instead of discovering why the NULL value happened, the programmer simply adds a conditional statement to check for NULL and then set the value to a default. This solves the problem that the NULL value creates, but meanwhile there is some bug deeper in the code that is creating that situation.
Once a poor patch like this is checked into the code base, it can become extremely difficult to locate the true source of the problem because the software appears to operate normally.
Solutions for Insidious Bugs
The best solution for the insidious bug is to adjust your way of thinking so that you do not allow them to occur. This is easier said than done; however, there are three practices that I recommend you follow to prevent them. The first is to stop suppressing error messages, the second is to use unit testing correctly and the third is to use exceptions to discover root problems.
Stop Suppressing Error Messages
The first is to let go of the fear of showing users an error message and allow your application to terminate when it encounters an unknown state. To the novice programmer this may sound like a lazy, sloppy attitude; however, I can assure you that it is the very opposite. Consider which of the following phone calls you would prefer to receive:
“Hey fumbly, every time I enter a negative value for the amount the program crashes”
or this message:
“Hey fumbly, we went to run our month-end reports and the numbers are totally screwed up. We looked at the data and it appears that all of the debits are zero. What happened to the data, do you have backups?”
The first call definitely indicates a rookie mistake, however unless you are trying to switch careers, you’d probably prefer it to the second call. In a case like this, though, you probably would never even have received call number one because such an obvious problem would have been easily caught during QA.
Use Unit Testing Correctly
The second solution is to use unit testing correctly. This can be a great help at discovering problems; however, the key word here is “correctly”. Your unit tests are only as good as you make them. It’s easy to get your unit testing code coverage up high by simply calling every function without carefully checking the output. That type of coverage will help you discover situations that cause your application to crash, but it will not reveal insidious bugs.
To use unit testing correctly you need to first call a function and then check the data directly to verify that everything is as expected. If a function is supposed to calculate the interest on a loan, you need to supply it with parameters where you know what the correct output should be. Then you must check that the function returned the correct value. If the function persists data to a datastore, you should directly query the datastore in your unit test to verify that the values have been persisted correctly. Furthermore you need to do this with every condition within the function and make sure the values are always correct in every instance.
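As a rough illustration, here is a Java sketch that contrasts a coverage-only test with one that checks values. It deliberately avoids any test framework so that it stands alone. The class LoanSketch, the simple-interest formula, and the in-memory map standing in for a datastore are all assumptions of mine, not code from a real project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: LoanSketch and its methods are invented for illustration.
class LoanSketch {
    // An in-memory map stands in for a real datastore.
    static final Map<String, Long> store = new HashMap<>();

    // Simple interest in cents: principal * rate% * years / 100.
    static long simpleInterestCents(long principalCents, long ratePercent, long years) {
        return principalCents * ratePercent * years / 100;
    }

    // Calculates the interest and persists it under the loan id.
    static void saveInterest(String loanId, long principalCents, long ratePercent, long years) {
        store.put(loanId, simpleInterestCents(principalCents, ratePercent, years));
    }

    // Coverage-only "test": executes the code but verifies nothing,
    // so a wrong value would pass unnoticed.
    static void weakTest() {
        saveInterest("loan-1", 100_000, 5, 2);
    }

    // Value-checking test: known inputs, known expected output,
    // verified on the return value and again in the datastore.
    static void strongTest() {
        long expected = 10_000; // $1,000.00 at 5% for 2 years is $100.00
        check(simpleInterestCents(100_000, 5, 2) == expected, "returned value");
        saveInterest("loan-1", 100_000, 5, 2);
        check(Long.valueOf(expected).equals(store.get("loan-1")), "persisted value");
    }

    // Minimal assertion helper so the sketch needs no test framework.
    static void check(boolean condition, String what) {
        if (!condition) {
            throw new AssertionError("wrong " + what);
        }
    }
}
```

Both tests execute the same lines, so they produce the same coverage number. Only strongTest would fail if the formula or the persisted value were wrong.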
If you have excellent unit tests, it is more difficult for a programmer on the team to create an insidious bug. Though they may try to cover up an error message, the unit tests should reveal that the data is not persisted correctly. When you discover a programmer on your team routinely patching up code this way, they must be publicly humiliated! My personal preference is to implement fines for breaking builds and unit tests. The fines go into a jar and are used to buy alcoholic beverages for the team after a successful release.
Use Exceptions to Discover Root Problems
Learning to discover the root problem of a bug instead of treating the symptom is the only guaranteed way to prevent insidious bugs. This is easier said than done and should be a life-long goal. However, one simple practice of using Exceptions to handle unknown situations can help to reveal underlying bugs.
A common cause of insidious bugs is overriding illegal values without understanding why the situation occurred. Earlier I mentioned the example of a function that failed because it encountered a NULL value. Perhaps the programmer solved the problem with a line of code like this:
// val cannot be null. i wonder why this happens? oh well!
if (val == null) val = 1;
It may be the case that this function should be able to handle null values. However, if you do not know exactly why val == null, this line of code can be very dangerous. val might be null because another programmer forgot to set a value in an earlier location. In which case setting val = 1 solves the problem within this function, but 1 is probably not the right value. So this particular solution will mask the earlier bug and there is a very good chance the program will go out the door with an insidious bug.
This looks more severe, but is actually a safer solution:
// val cannot be null. let's make sure somebody notices the problem
if (val == null) throw new Exception("val cannot be null ever!");
Depending on your code, the application may crash when this line of code is hit. At first you may think this is a bad thing. However, consider that the reason your program encountered this state to begin with is because there is a bug somewhere deeper in your software. If you simply ignore and override, you are allowing the deeper bug to remain undetected. Once you add this Exception, there is an excellent chance that the real bug will be discovered either a) as soon as you run the application, b) in your unit tests or c) during QA. In this case even poor unit testing will detect the exception.
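Here is a slightly fuller Java sketch of the two approaches side by side. The class InvoiceSketch, the loadQuantity helper with its simulated upstream bug, and the choice of the unchecked IllegalStateException (instead of the plain Exception used above) are my own assumptions for the sake of a self-contained example.

```java
// Hypothetical sketch: InvoiceSketch and its methods are invented for illustration.
class InvoiceSketch {
    // Simulated upstream bug: the quantity is never set for some orders.
    static Integer loadQuantity(String orderId) {
        if (orderId.startsWith("legacy-")) {
            return null;
        }
        return 3;
    }

    // Masking version: quietly substitutes a default and returns a plausible total.
    static int totalMasked(String orderId, int unitPriceCents) {
        Integer quantity = loadQuantity(orderId);
        if (quantity == null) {
            quantity = 1; // hides the upstream bug
        }
        return quantity * unitPriceCents;
    }

    // Fail-fast version: refuses to continue in an unknown state.
    static int totalFailFast(String orderId, int unitPriceCents) {
        Integer quantity = loadQuantity(orderId);
        if (quantity == null) {
            throw new IllegalStateException("quantity is null for order " + orderId);
        }
        return quantity * unitPriceCents;
    }
}
```

A test that does nothing more than call totalFailFast("legacy-7", 500) fails immediately with the exception. The same careless test against totalMasked passes, because the call quietly returns 500 as if the order had a quantity of one.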
The absolute worst case scenario is that despite everything, your application goes out the door and a user experiences an error message or fatal crash. As much as we want to avoid this, it’s still better than allowing the user to continue working when their data is not being saved correctly.
If your customers are the ones finding these errors regularly, it’s an indication of a poor QA process more so than poor development.
In Summary
Just to wrap it all up, I think a lot of insidious bugs are created by programmers who actually are striving for solid software. I encourage you to consider what “solid” really means. We never want our users to see error messages and crashes, however simply hiding them is not the solution. Sometimes making an error more visible will allow the root cause to be solved more easily before it is released to your users.
As always, comments and feedback are appreciated.
Daniel Burge
July 28, 2009 at 4:09 pm
Great article. The times that I’ve seen these types of bugs are in areas which deal with data type conversions. For instance decimals being rounded to the wrong place. After summing up this can be a big problem that might not be noticed on an item by item basis.