Application Logging for Developers: 3 Common Pitfalls
January 04, 2017
The Un-Ignorable Importance of Logging
Regardless of what you call it—event logging, application logging, or just plain logging—logging is one of the most important sources of information for the healthy operation of any IT system.
Everything from user-visible errors to application crashes to intermittent 500 responses from a third-party API is recorded there. Without logs, your entire IT staff is flying blind when issues arise in production.
Early on in a startup’s life, logging doesn’t receive a lot of attention. Your application is small and simple, and doesn’t generate lots of logging data. Your log doesn’t have to be end-user facing, either.
However, as time goes on, you’ll notice a change: You’ll move from two servers to two dozen. You’ll expand from one log entry per minute to a thousand log entries per minute. You’ll rely on your logs to help your customer-experience team and your users track down issues.
All these new conditions present challenges to scale, but they’re not insurmountable if you pay attention to the shifts and watch out for three common pitfalls:
1. Non-centralized logging
When you have one or two servers to start, storing your application logs on the individual servers isn’t so bad. When something funky happens, you log into both servers one at a time, read through the logs to see what’s going on, and call it a day.
But this process quickly falls apart as your system grows from a couple of servers to a dozen (or a hundred).
Here at Zoomph, we use Graylog to house all our logging data in one central place. That makes it possible to sift through vast amounts of log data, and it provides other useful features such as search and automated alerting.
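As a rough illustration of what shipping logs to a central server can look like, here is a minimal Python sketch that turns standard log records into GELF-style JSON (the format Graylog accepts) and sends each one as a UDP datagram. The class name `GelfUdpHandler`, the helper `to_gelf`, and the `_logger` field are made up for this example, and it assumes a GELF UDP input is listening on the conventional port 12201. It is not our production setup, and in practice a maintained GELF library is likely a better choice.

```python
import json
import logging
import socket

# GELF uses syslog severity numbers, so map Python's levels onto them.
SYSLOG_LEVELS = {
    logging.DEBUG: 7,
    logging.INFO: 6,
    logging.WARNING: 4,
    logging.ERROR: 3,
    logging.CRITICAL: 2,
}


def to_gelf(record: logging.LogRecord, host: str) -> dict:
    """Convert a standard LogRecord into a GELF 1.1 style dictionary."""
    return {
        "version": "1.1",
        "host": host,
        "short_message": record.getMessage(),
        "timestamp": record.created,  # seconds since the epoch
        # Unrecognized (custom) levels fall back to 6, i.e. informational.
        "level": SYSLOG_LEVELS.get(record.levelno, 6),
        "_logger": record.name,  # custom fields carry a leading underscore
    }


class GelfUdpHandler(logging.Handler):
    """Hypothetical handler: one uncompressed UDP datagram per log record."""

    def __init__(self, server: str, port: int = 12201):
        super().__init__()
        self.address = (server, port)
        self.host = socket.gethostname()
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            payload = json.dumps(to_gelf(record, self.host)).encode("utf-8")
            self.sock.sendto(payload, self.address)
        except Exception:
            # Never let a logging failure crash the application.
            self.handleError(record)
```

Attaching the handler once at startup, for example `logging.getLogger().addHandler(GelfUdpHandler("graylog.example.internal"))` (a placeholder hostname), means every server reports to the same place without any change to the individual `logger.error(...)` calls.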
2. Having no taxonomy
When the system is generating a few dozen log entries per hour and there are only a few dozen types of information being logged, sifting through the logs manually is not a huge deal.
As things scale up, though, a clear and consistent structure for your log entries makes filtering and parsing the data far easier.
Good taxonomy ideas might include tagging severity levels (info, warning, error, etc.), and identifying which users and systems were affected.
Our team uses a taxonomy that includes the following:
- Severity (info, debug, warning, error, critical)
- Affected system/subsystem
- When the error occurred
- Where the error occurred (what server was the code running on)
- Exception data (most of our code base is C#)
  - Stack trace
  - Other metadata specific to the exception type, like HTTP status codes for a REST API call error
- If applicable, which user/client saw the error
  - This includes remote IP, user agent, etc.
- Freeform data specific to what is being logged
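To make the taxonomy above concrete, here is one way it could be modeled as a structured log entry. This is a Python sketch rather than our actual C# code, and every name in it (`LogEntry`, `from_exception`, the individual field names) is an assumption chosen for illustration.

```python
import json
import socket
import traceback
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional

SEVERITIES = ("info", "debug", "warning", "error", "critical")


@dataclass
class LogEntry:
    severity: str                      # one of SEVERITIES
    system: str                        # affected system/subsystem
    message: str
    # When the event occurred (UTC, ISO 8601).
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Where it occurred: the server the code was running on.
    server: str = field(default_factory=socket.gethostname)
    # Exception data.
    exception_type: Optional[str] = None
    stack_trace: Optional[str] = None
    http_status: Optional[int] = None
    # Which user/client saw the error, if applicable.
    user_id: Optional[str] = None
    remote_ip: Optional[str] = None
    user_agent: Optional[str] = None
    # Freeform data specific to what is being logged (must be JSON-serializable).
    extra: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    @classmethod
    def from_exception(
        cls, exc: BaseException, system: str, **context: Any
    ) -> "LogEntry":
        """Build an error-level entry from a caught exception."""
        return cls(
            severity="error",
            system=system,
            message=str(exc),
            exception_type=type(exc).__name__,
            stack_trace="".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
            **context,
        )

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```

Inside an `except` block, a call such as `LogEntry.from_exception(exc, system="billing-api", http_status=504)` (with a made-up system name) produces an entry whose fields can each be filtered on individually once the JSON reaches the central log store.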
3. No automated alerting
When you’re just getting things up and running, you’ll likely find yourself regularly perusing logs to look for anything suspicious.
Sorry to say, but this fails to scale. It becomes impossible for an individual to read through hundreds of log entries per minute, and catch every important signal amid the huge bank of data.
To get around this, you’ll want to set up a system that notifies you when critical events occur. These events generally fall into two types: one-offs, and events that become critical only when they happen at a high rate.
One-offs are occurrences like an application crash, which (hopefully) happens once in a blue moon but requires your immediate attention.
High-rate events are a little bit trickier to handle. If your database is processing millions of transactions per day and one of them times out, this is generally not a big deal. But if a hundred of them time out over a one-minute window, then something is likely wrong.
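The distinction between one-offs and high-rate events can be sketched with a sliding-window counter. The Python below is a minimal illustration, not how Graylog implements alerting; `RateAlert` and `should_alert` are hypothetical names, and the threshold of 100 events in 60 seconds simply mirrors the database-timeout example above.

```python
import time
from collections import deque
from typing import Deque, Optional


class RateAlert:
    """Fire when `threshold` events land within a sliding `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: Deque[float] = deque()

    def record(self, now: Optional[float] = None) -> bool:
        """Record one event; return True if the rate threshold is reached."""
        now = time.monotonic() if now is None else now
        self._events.append(now)
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


def should_alert(
    severity: str, rate_alert: RateAlert, now: Optional[float] = None
) -> bool:
    """One-offs (critical events) alert immediately; others only on rate."""
    if severity == "critical":
        return True
    return rate_alert.record(now)


# Example: a single timeout stays quiet, a hundred within a minute does not.
timeouts = RateAlert(threshold=100, window_seconds=60)
```

Note that once the threshold is crossed, this sketch returns True for every further event in the window, so a real setup would probably add a cooldown to avoid sending the same alert repeatedly.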
Alerting can be one of the toughest things to set up properly. On one hand, you don’t want to flood your IT team with events that aren’t really issues (your team will start tuning those alerts out). On the other hand, you need your system to notify you when things go awry so you can act quickly. Finding the right balance is crucial, though it might take tinkering and continual optimization.
Consider experimenting with Graylog’s alerting capabilities to start—this has helped our team determine our current alerting process and automation.
In summary, logging, while an art for developers, deserves a good bit of thought to ensure that your team has what they need to keep your system in tip-top shape, now and into the future.