Users of custom software systems inevitably run into exceptions: anomalous conditions that require special processing. Even the most thorough testing process can’t catch every scenario a user might encounter.
Our testing process ensures that as many issues as possible are caught before deployment. But we’re not so conceited as to think that exceptions won’t still happen. That’s why every application we develop does more than just log exceptions (also known as errors or bugs) to a text file, and why we continue to invest in and iteratively improve our error handling process.
Why Error Logging is Important
It’s never our intention to put bugs out into the world, but we’re not infallible—unforeseen issues happen, and sometimes they aren’t caught in testing, no matter how thorough we’ve tried to be. In these cases, our goal is to shorten the life of the issue by catching it early and deploying a fix as quickly as possible.
To catch issues that make it to production, we have to have error logging in place. We can’t expect users to report every little issue they experience. Many users just refresh, undo, or close a system and try again, in which case we’d have no insight into exceptions they experience without a monitoring tool like error logging.
We want to know when errors happen because if they happen for one user, they’re likely happening for others. Errors are frustrating for users and can lead to a distrust in the system. We want users to have high confidence in every system we build, and part of that is making the experience as bug-free as possible.
Error logging also gives us an objective metric that, along with other tools, points to the health of a system. When we build a custom software system, we implement error logging right away and track how exceptions decrease over time. When we take over the support of a system, we add error logging as soon as we can (if it’s not already there), review and prioritize issues that arise, and work toward zero exceptions.
The monitoring process is an ongoing effort. Many of our clients are with us for the long term, continuing to make enhancements to their systems year after year. As the code continues to change and the code base continues to grow, exceptions may be introduced, and we want to identify and resolve them as soon as we can. We see that as our responsibility as an outsourced development partner.
Monitoring and reviewing the error logs allows us to proactively stay on top of what’s happening in each of our systems. We appreciate it when clients (or their users) report issues, but we prefer to catch exceptions before they’re even noticed by a user. Thanks to our proactive monitoring, we’re often aware of issues before a client is, so when we inform them of the bug, we’re also able to communicate their options for addressing the problem, giving them the information they need to make the best decision as efficiently as possible.
By integrating error logging into every environment (development, test, UAT, and production), we decrease the chances that issues will make their way into production environments. Before production releases, we review error logs in the UAT (user acceptance testing) environment and fix any issues logged before release.
In the end, some exceptions are out of our control. Browser plugins, dependencies, and other factors outside of the system can sometimes cause exceptions for users. In these cases, it’s vital to have error logging in place because, again, without it, we’d have no insight into these situations. When the error logs give us detailed information about an exception, we can handle it in a way that best meets the users’ needs.
Until it’s possible to generate perfect code, exceptions are going to happen. We’ve refined our development process and integrated practices like test-driven development, automated testing, and code analysis tools to catch as many bugs as we can during development. Handling and monitoring error logs removes hubris from the picture and acknowledges that issues will happen. When they do, we’ll know about them and be able to fix them quickly.
Continuously Refining Our Process
We approach exception handling like anything else in custom software: start somewhere and continuously improve.
Here are the different iterations we’ve gone through.
1. Start Somewhere
We started with logging exceptions to a text file or event log. Due to bots and common ASP.NET errors, the logs were overwhelming and nearly impossible to review manually. At this stage, we only reviewed the logs once someone brought an issue to our attention.
2. Push to Proactive
Our next iteration involved making the log review process more streamlined so we could proactively review issues. We updated all the applications we supported to log errors to a centralized exception database. In this central hub, we could query exceptions to pull out the ones we wanted to review, which we did weekly.
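As a rough illustration of what a centralized exception store and a weekly review query could look like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the table layout, column names, and the `create_store`, `log_exception`, and `weekly_review` helpers are all assumptions made for this example, not the schema or code we actually used.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def create_store(conn):
    """Create a hypothetical central exceptions table (illustrative schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS exceptions (
               id INTEGER PRIMARY KEY,
               application TEXT NOT NULL,
               environment TEXT NOT NULL,
               exception_type TEXT NOT NULL,
               message TEXT NOT NULL,
               occurred_at TEXT NOT NULL
           )"""
    )


def log_exception(conn, application, environment, exc, occurred_at=None):
    """Record one exception from any application in the shared table."""
    occurred_at = occurred_at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO exceptions "
        "(application, environment, exception_type, message, occurred_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            application,
            environment,
            type(exc).__name__,
            str(exc),
            # Fixed-width UTC timestamps compare correctly as plain strings.
            occurred_at.isoformat(timespec="seconds"),
        ),
    )
    conn.commit()


def weekly_review(conn, now=None):
    """Return (application, exception_type, count) for the last 7 days."""
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(days=7)).isoformat(timespec="seconds")
    rows = conn.execute(
        """SELECT application, exception_type, COUNT(*) AS occurrences
           FROM exceptions
           WHERE occurred_at >= ?
           GROUP BY application, exception_type
           ORDER BY occurrences DESC, application, exception_type""",
        (since,),
    )
    return rows.fetchall()
```

Grouping by application and exception type puts the most frequent problems at the top of the weekly list, which is the kind of prioritized view a central hub makes possible.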
3. No More Noise
Next, we created a SQL Server Reporting Services Report that was emailed to support team members each week. This is a seemingly small step, but it was a very helpful one—having a report sent to your inbox instead of having to set a reminder to go out and check the log made a big difference.
We also used this opportunity to exclude certain inconsequential exceptions that were clogging the logs and burying higher-priority exceptions. We started by filtering out 404 (“page not found”) errors, default ASP.NET errors, and other exceptions that are often triggered by bots.
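The filtering idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `LoggedError` shape, the ignored status codes, and the user-agent markers are hypothetical, and real exclusion rules would be tuned to whatever is actually clogging a given system's logs.

```python
from dataclasses import dataclass

# Hypothetical rules for this sketch; real filters would be tuned per system.
IGNORED_STATUS_CODES = {404}
BOT_MARKERS = ("bot", "crawler", "spider")


@dataclass(frozen=True)
class LoggedError:
    status_code: int
    message: str
    user_agent: str = ""


def is_noise(error):
    """Return True for errors we do not want in the weekly report."""
    if error.status_code in IGNORED_STATUS_CODES:
        return True
    agent = error.user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def actionable(errors):
    """Keep only the errors that deserve a human review."""
    return [error for error in errors if not is_noise(error)]
```

Keeping the rules in plain data (a set and a tuple) means new noise patterns can be excluded without touching the filtering logic.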
4. More Access, Manual Workflows
In this iteration, our goal was to make exception logs easier to access and to give access to more team members. To do this, we built a simple interface that allowed developers to review and exclude exceptions. We continued reviewing the logs on a weekly basis and would manually create user stories when there was an exception that needed attention.
5. Steps to Automation
We use TargetProcess (an agile project management tool) to manage our work—backlogs, user stories, workflows, reports, and more. In keeping with the Agile/Scrum methodology, every piece of work we do has a user story in TargetProcess associated with it. While we were still creating exception handling stories manually, we built an integration from our custom interface to TargetProcess to make this process more streamlined within our existing workflows.
6. Elevate to Elmah.io
Seeing the increasing value of our exception handling process, we were eager to continue building our capabilities and processes. We thought about building out a homegrown tool but instead did what we recommend to many of our clients: We stepped back and did a buy vs. build analysis.
After looking at the options, we determined it wasn’t cost effective to continue building our own custom tool when there were full-featured software-as-a-service (SaaS) platforms available. One of the tools we had evaluated, Elmah.io, is a SaaS tool for managing software system exception logs. Through our analysis, we determined it met our needs with its features, data security, integration capabilities, and cost.
We bought a subscription and integrated Elmah.io with Microsoft Teams to automatically send new errors to a “Support” channel, but we were still creating TargetProcess stories manually.
7. Across-the-Board Automation
Having found a tool we were happy with and moving client systems onto the Elmah.io exception handling platform, we were ready to bridge the gap and build out more automations.
To this end, we created a rule in Elmah.io to send every new error—after filters for common and bot-created exceptions are applied—to an Azure function. That function then automatically creates a story in TargetProcess under the appropriate project and in the current sprint. The team can then investigate and fix the issue and mark it as fixed in Elmah.io to make sure that if the error is reintroduced into the system, we are notified.
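The shape of that automation can be sketched as a small handler that turns an incoming error notification into a story. This is a simplified illustration, not our Azure function: the payload fields (`application`, `type`, `message`), the `PROJECT_BY_APPLICATION` mapping, and the `create_story` callback are all hypothetical, and the sketch deliberately does not reproduce the Elmah.io payload format or the TargetProcess API.

```python
import json

# Hypothetical mapping from the reporting application to its project.
PROJECT_BY_APPLICATION = {"customer-portal": "Customer Portal"}


def build_story(error, current_sprint):
    """Translate an error payload (assumed shape) into a story dict."""
    project = PROJECT_BY_APPLICATION.get(error["application"])
    if project is None:
        raise KeyError(f"No project configured for {error['application']!r}")
    return {
        "project": project,
        "sprint": current_sprint,
        # Truncate long messages so the story title stays readable.
        "title": f"[Exception] {error['type']}: {error['message'][:80]}",
        "description": json.dumps(error, indent=2, sort_keys=True),
    }


def handle_new_error(raw_body, current_sprint, create_story):
    """Parse the notification body and hand the story to a creator callback.

    `create_story` stands in for whatever client actually talks to the
    project management tool; it is injected so the logic stays testable.
    """
    error = json.loads(raw_body)
    story = build_story(error, current_sprint)
    return create_story(story)
```

Passing the story creator in as a callback keeps the mapping logic independent of any particular tool, so the same handler could be exercised in tests without network access.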
8. Find Exceptions Before They Happen
Now that we have our Elmah.io tooling in a good place—with filters for common issues and workflow automations—we’re using the tool earlier in our development process. Previously, error log monitoring automation was only used in production environments. Now, we have the automation set up on UAT environments as well, so we are alerted about exceptions during the internal and client testing process. This provides an additional layer of protection to prevent bugs from making it to production.
9. Continue Iterating
Our exception handling process is in a good spot, and we’re really happy with the results and the path we took to get there. But we continue to reflect on the process and make iterative changes as opportunities arise.
Benefits of Elmah.io
We started our exception handling workflow by building our own tools and queries. But as our needs grew and changed, we evaluated whether it made more sense to keep building and maintaining our own tools or to use an off-the-shelf tool like Elmah.io. We ultimately decided to go with Elmah.io for many reasons:
- It’s a specialized, trusted tool that’s good at what it does
- The tool and advanced features already existed
- A third party is responsible for maintaining the tool
- Its security practices met our standards
- We can easily manage all systems and environments in one place
- Filters and rules allow us to easily focus in on items needing our attention
- It offers integration capabilities with other tools we use
- The cost is reasonable
We’re fans of using integrations in custom software when they make sense, and Elmah.io made sense for us. Iterating on our exception handling workflows has helped us better serve our clients in several different ways.
We’re able to identify and fix issues quickly: Without exception monitoring, we’d be mostly blind to errors unless users or clients reported them. With Elmah.io, we can see issues right when they happen and initiate the process of resolving them immediately.
We’ve improved user experience and system performance: Fewer exceptions and more reliable performance inevitably result in a better user experience. By monitoring systems for errors, we can reassure clients that we’ll know about issues impacting users, likely even before the clients do.
We can be more proactive with maintenance: Custom software is a long-term investment that requires, at minimum, annual maintenance. We recommend clients set aside maintenance budgets for upgrades and patches, and, thanks to our exception handling process, we can use those hours wisely.
Always On
In addition to our annual review process with clients—where we look at third-party integrations, backlog priorities, recommended updates, and more—continual exception monitoring is vital to maintaining healthy custom software systems.
We want to be as informed as possible about our systems—both the good and the bad. This information helps us make sure clients’ systems are secure, in working order, and providing a good user experience.
Do you have an idea for a custom software platform? Or do you need a legacy platform updated?
Reach out.