You've got a production bug, and the pressure is on. Your users are upset, there's a group of grumpy managers huffing nearby, and you're somehow expected to keep your cool and get to the bottom of what's gone wrong.
Fixing production bugs can be scary and overwhelming. The trick is to create the room you need to get your job done, and then to work methodically. Here are some tips to help you stay calm and fix even the most terrifying production bug.
Give yourself the room you need to debug properly
Some people may thrive under pressure, but most developers work much better when they aren't stressed out. The first thing to do when faced with a scary production bug is to try to create room for yourself to work without pressure. Here are some ways to do that.
1. Acknowledge the issue publicly
Unless the issue you've found is security-related, tell your users that something is wrong. As a user, nothing is worse than not knowing what's happening. Tell them, so they're in the know.
GitHub is a good example of a website with a status page that lets its users know when it's having issues. If you work in some ChatOps magic, you could send a bot a message to give your users status updates.
2. Find a quick fix
If you're lucky, you’ll get an idea for something you can do relatively quickly to patch up the issue. If this is the case, don't worry about finding the perfect solution—just ship that fix! Once it's live, you’ll have fewer affected users, and the grumbling managers can go grumble elsewhere while you dig a bit deeper to find out what's caused the problem—and fix it properly.
3. Find a workaround
Sometimes the functionality that's broken isn’t the only way to achieve a task. If that’s the case, use whatever communication channels you have available to let users know alternative ways of getting what they need so they can keep working. The workaround will probably be clumsy and slow, and it may involve doing things offline (e.g., over the phone or in person), but at least you've given them an option.
4. Switch off broken features
If the payment feature on your website is broken, remove the "Pay" button. If the search feature is broken, remove the search bar. Sure, users still won't be able to achieve what they want to, but at least the functionality that remains is something they can have confidence in, and you won't be inundated with new logs and alerts.
If the bug is continually causing damage, such as creating bad data, this is a great way to stop things from getting worse. In the worst-case scenario, you can put a big "we're down for maintenance" message up instead of your home page, and/or redirect to your company's Facebook or Twitter page.
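One common way to make features easy to switch off is a feature flag that the UI checks before rendering a control. The following is a minimal sketch, not a prescribed implementation: the flag names, the in-memory `FEATURE_FLAGS` store, and the helper functions are all hypothetical, and a real system would more likely read flags from configuration or a flag service so they can be changed without a deployment.

```python
# Hypothetical in-memory flag store. In practice these values might come
# from a config file, an environment variable, or a flag service.
FEATURE_FLAGS = {"payments": True, "search": True}


def is_enabled(feature):
    """Return True only if the feature is known and switched on."""
    return FEATURE_FLAGS.get(feature, False)


def disable(feature):
    """Kill switch: mark a feature as off."""
    FEATURE_FLAGS[feature] = False


def visible_controls():
    """List the UI controls to render, skipping disabled features."""
    controls = []
    if is_enabled("payments"):
        controls.append("Pay button")
    if is_enabled("search"):
        controls.append("Search bar")
    return controls


# Payments are broken, so hide the "Pay" button until the fix ships.
disable("payments")
print(visible_controls())  # ['Search bar']
```

Defaulting unknown flags to off is a deliberate choice in this sketch: a typo in a flag name hides a feature rather than exposing a broken one.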
Work methodically
Now that you've cut through a bit of the chaos, you should have more time to focus and get the job done. I find that having a list of things to work through helps me feel more in control. Here's a little list of steps to take when you’re starting to dig into the cause of a bug.
1. Get a second set of eyes on the problem
Two pairs of eyes are better than one, so find someone to come work with you. This will help keep you on track and make it less likely that you'll follow dead ends.
If you're lucky, your partner will know the system as well as you do. But if not, he or she will still help you stay on track—and will be learning about the system, which means there might be less pressure on you next time around. Even if all the second person does is shoo away nosy managers, you'll be grateful!
2. Reproduce the issue
The best way to set yourself up for success is to reproduce the issue, ideally locally. Bugs become much less mysterious once you can make them occur at will. It also means you'll know for sure when you've fixed them, because you'll be able to demonstrate the bug occurring—and then not occurring—before and after you apply the patch.
3. Eliminate the obvious
Taking five minutes to list the most obvious potential causes can be a huge timesaver, because the culprit is often something we subconsciously dismissed with, "Nah, that can't be it."
Developers tend to have very good intuition when it comes to what is causing issues. But they often dismiss their own suspicions without sharing them. Write them down to force yourself to consider them.
4. List all the layers and players
It can be easy to forget a part of a system that you don't often change or think about, such as a load balancer or a piece of middleware that validates a user's cookie. Making a list of the stack that a user request passes through can help you broaden your thinking and find potential culprits you might otherwise miss.
5. Find what's changed
Bugs that have existed for a long time without having a negative impact will likely have come to your attention because something changed. Look at the source control and deployment histories to see what has changed recently.
Are there any suspicious commit messages or deployments? I often find bugs faster this way.
If you don't spot anything obvious, remember to look at your list of layers and players, because the change might not have been made in the code you look after. Ask around to find out if other systems have recently had changes deployed.
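If you can gather commits and deployments from your own systems and your neighbors' into one list, narrowing it down to the period just before the bug appeared is easy to automate. Here is a rough sketch under stated assumptions: the record format, the system names, the timestamps, and the 24-hour window are all made up for illustration, and in practice the records would come from your source control and deployment tools.

```python
from datetime import datetime, timedelta


def changes_before_incident(changes, incident_start, window_hours=24):
    """Return changes made in the window before the incident, newest first."""
    earliest = incident_start - timedelta(hours=window_hours)
    recent = [c for c in changes if earliest <= c["when"] <= incident_start]
    return sorted(recent, key=lambda c: c["when"], reverse=True)


# Hypothetical change records covering several layers of the stack.
changes = [
    {"system": "checkout-service", "what": "deploy: new tax rounding",
     "when": datetime(2024, 1, 10, 9, 30)},
    {"system": "load balancer", "what": "config: timeout lowered",
     "when": datetime(2024, 1, 10, 13, 0)},
    {"system": "checkout-service", "what": "deploy: logging cleanup",
     "when": datetime(2024, 1, 3, 16, 0)},
]

incident_start = datetime(2024, 1, 10, 14, 0)
suspects = changes_before_incident(changes, incident_start)

for change in suspects:
    print(f"{change['when']:%Y-%m-%d %H:%M}  {change['system']}: {change['what']}")
```

Sorting newest first puts the changes closest to the incident at the top, which is usually where you want to start looking.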
6. Think about next time
Once you've fixed a bug, take a bit of time to make changes that will improve life for your future self. Write a test that will catch the bug, should it crop up again. Add the logging or documentation you wish you'd had. You'll thank yourself later!
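As an illustration, suppose the bug was a crash whenever a quantity field arrived blank. The sketch below shows the kind of regression test and logging you might leave behind. The `parse_quantity` function, the logger name, and the bug itself are invented for this example; the tests are written as plain functions that a runner such as pytest could discover.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def parse_quantity(raw):
    """Parse a quantity form field (hypothetical function, shown post-fix).

    The imagined bug: blank input reached int() and raised ValueError.
    """
    cleaned = raw.strip()
    if not cleaned:
        # The logging you wish you'd had while debugging.
        logger.warning("Blank quantity received; defaulting to 0")
        return 0
    return int(cleaned)


def test_blank_quantity_defaults_to_zero():
    # Regression test: this input crashed before the fix.
    assert parse_quantity("   ") == 0


def test_normal_quantity_still_parses():
    # Guard against the fix breaking the common case.
    assert parse_quantity("3") == 3
```

Naming the test after the failure it guards against makes it obvious, months later, why the test exists.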
Bugs don't have to be scary
Fixing production bugs is often daunting because of the way we react to them. It doesn't have to be that way. Remember to start by focusing on doing things that remove pressure from the situation. Then, armed with a list of steps to help you stay calm and focused, you'll fix bugs faster and be less stressed while doing it.