An aspect of software engineering that is often under-appreciated is the task of debugging an issue.
A lot of online material discusses how to build things from scratch, but very little of it covers addressing a bug report in an existing system. In this edition, I'll dive into my approach to debugging and some tips & tricks that have helped me squash many different bugs.
Before fixing any bug, the problem needs to be understood. The first mistake people often make when debugging is assuming too early that the root cause has been identified. It's easy to put little patches on top of the issues you have, but do this a lot, and it'll blow up in your face, figuratively speaking.
Let's take an example: if you're working on a piece of UI, and there's some weird text overlap between a title and a body of text below -- how do you fix this bug? The initial solution may be to hard-code the height of the title UI element so that it doesn't overlap. Bug fixed! Not so fast, though -- what happens when large fonts are enabled for accessibility purposes? The text now gets cut off within the title UI element. You can't just hard-code a bigger height, because the font size is variable, but removing the fixed height will bring back the overlap you just fixed. So you now need to re-fix the original bug differently to fix this newer issue.
In the above example, it was easy to think: "Oh, there's not enough height here; let me add a height constraint to help." Without thinking a step further about other details such as accessibility, though, the identified solution will not work in all use cases, and you'll have to fix this issue multiple times. However, you can fix a bug just once by taking an extra step and thinking through potential edge cases.
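To make the trade-off concrete, here is a minimal sketch of the title example as plain arithmetic rather than real UI code. Everything in it is an assumption made for illustration: the names, the 24-unit fixed height, and the 1.2 line-height factor do not come from any particular UI framework.

```python
# Hypothetical hard-coded height, tuned by eye for the default font size.
FIXED_TITLE_HEIGHT = 24.0


def text_height(font_size, lines=1, line_height_factor=1.2):
    """Approximate the height needed to render `lines` lines of text."""
    return font_size * line_height_factor * lines


def is_clipped(container_height, font_size):
    """True when the text needs more height than its container provides."""
    return text_height(font_size) > container_height


def fitted_title_height(font_size):
    """Derive the container height from the text instead of hard-coding it."""
    return text_height(font_size)


if __name__ == "__main__":
    default_font, large_font = 17.0, 28.0  # assumed font sizes
    print(is_clipped(FIXED_TITLE_HEIGHT, default_font))  # fixed height, default font
    print(is_clipped(FIXED_TITLE_HEIGHT, large_font))    # fixed height, accessibility font
    print(is_clipped(fitted_title_height(large_font), large_font))  # derived height
```

The fixed height only holds for the font size it was tuned against, while the derived height follows the font, which is the edge case worth thinking through before calling the bug fixed.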
What about when the problem is not so easily identified? Overlapping text is easy to spot, but what about notifications not being received by your users, where the notification pipeline involves more than three distributed services owned by other teams? Well, you start by gathering information to understand the problem better.
In a complex distributed environment, debugging becomes akin to solving a complex puzzle, where you know there has to be a solution -- the system worked last week! -- but you can't quite see it yet. You have to poke around the pieces, shift them around a little, to make sense of it all. The same is true in debugging.
Especially for systems without a UI, console logs are your best friend. Having a log record and being able to navigate through it quickly to search for clues is essential to rapid debugging. That only works, of course, if your code writes logs in the first place! When first assembling a new service, it's hard to know what debug information you may need later, so one of the best tricks when debugging is to add logs that help with the particular issue you are chasing. Then, once that issue is fixed, remove the logs that are clearly specific to it, but leave anything that could be useful down the road.
Some systems produce a lot of logs, though. So how can you find the specific lines you're looking for in all of that mess? Add logging with narrow prefixes so that you can search by them efficiently. My favorite technique is to prefix logs with the class and function name they are in, as this helps me dig through logs as if I were digging through the code itself by searching for specific functions.
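As a rough illustration of that prefixing habit, here is a small sketch using Python's standard `logging` module. The `NotificationSender` class, its `send` method, the `_log` helper, and the `find_lines` filter are all hypothetical names invented for this example; a real service would more likely search its logs with grep or a log viewer.

```python
import io
import logging


def make_logger(stream):
    """Build a logger that writes bare messages to the given stream."""
    logger = logging.getLogger("notifications_demo")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


class NotificationSender:  # hypothetical class, for illustration only
    def __init__(self, logger):
        self.logger = logger

    def _log(self, func_name, message):
        # Prefix every line with "[ClassName.function_name]" so it is easy to search for.
        self.logger.debug("[%s.%s] %s", type(self).__name__, func_name, message)

    def send(self, user_id):
        self._log("send", f"start user_id={user_id}")
        delivered = user_id > 0  # stand-in for the real delivery logic
        self._log("send", f"done delivered={delivered}")
        return delivered


def find_lines(log_text, prefix):
    """Return only the log lines that start with the given prefix."""
    return [line for line in log_text.splitlines() if line.startswith(prefix)]


if __name__ == "__main__":
    stream = io.StringIO()
    NotificationSender(make_logger(stream)).send(42)
    for line in find_lines(stream.getvalue(), "[NotificationSender.send]"):
        print(line)
```

Because the prefix mirrors the code structure, searching the logs for `[NotificationSender.send]` narrows them down the same way jumping to that function would in an editor.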
Sometimes, though, debugging is going to get frustrating. I've often been unable to see any way a bug could be happening, with my judgment clouded more and more by my anger. If this happens to you too, what's worked for me is taking a break. I'm not just talking about a 5-10 minute break either; I'm talking about taking the rest of the day, not explicitly thinking about the issue, and coming back to it fresh the following day. As a morning person, this helps me re-focus on a problem when I'm at my absolute best, which is typically a different mindset than after debugging for a few hours without making much progress!
One challenging aspect of debugging that's hard to quantify is visualizing potential reasons for an issue. This requires a deep understanding of both the software you have built and the hardware that the software is running on. Some bugs may only be reproducible on specific types of hardware! For example, the high-end machine you built the software on has enough memory to handle your code, but a cheaper device consistently runs into out-of-memory issues. Without understanding the hardware memory model, it would be hard to think of this possibility while testing only on your own high-end machine! In these situations, you have to ask: how could this issue be happening, and what about my setup is making it not reproducible for me?
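One way to start answering that question for memory specifically is to measure how much a piece of code allocates and compare it against what a smaller device could offer. The sketch below uses Python's standard `tracemalloc` module; the `build_table` workload and the byte budgets are made-up assumptions, and `tracemalloc` only sees allocations made through Python's allocator, so treat the number as a rough signal rather than a full picture of process memory.

```python
import tracemalloc


def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated by Python while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak


def fits_budget(peak_bytes, budget_bytes):
    """True when the measured peak stays within the assumed device budget."""
    return peak_bytes <= budget_bytes


def build_table(n):  # hypothetical workload, for illustration only
    return [i * i for i in range(n)]


if __name__ == "__main__":
    _, peak = peak_memory_bytes(build_table, 100_000)
    low_end_budget = 1 * 1024 * 1024  # assumed budget for a cheaper device
    print(f"peak={peak} bytes, fits low-end budget: {fits_budget(peak, low_end_budget)}")
```

A measurement like this will not reproduce the bug on your machine, but it can turn "it works for me" into a concrete number to compare against the device where it fails.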
A final area to cover is debugging tools, of which there are many. These can be very helpful but don't replace all of the thinking necessary to get to the root of complex issues. A debugger that allows you to step through the code line by line and print out values of various variables along the way is an invaluable tool, as long as it's stable -- you don't want to have to debug your debugger as you're debugging a production issue!
Overall, debugging simply requires some creative thinking and perseverance, and any bug can be fixed -- it just may take some time.