Reducing noisy alerts in an organization is crucial for improving productivity and focus. Excessive, irrelevant notifications can overwhelm employees, leading to alert fatigue, where important messages are missed or ignored. Through ServiceNow’s Event Management (EM) module, you can minimize unnecessary alerts so teams can concentrate on critical issues, improving response times and decision-making. A streamlined alert system leads to a more organized and efficient work environment, enhancing overall operational effectiveness. The key EM features discussed here all work toward tackling noisy alerts.
This is the first configuration point, after data is received from monitoring sources, where we can apply noise-reduction logic.
Drafting of the Alert message key (Compression):
- Duplication of alerts:
The message key is the value used to determine whether an event coming from a source already has a matching alert or should create a new one. If a matching alert exists, we simply update it; the message key should therefore accurately identify the condition the event describes, so that duplicate alerts are not created for the same issue. A too-generic message key will compress many distinct alerts together and cause issues to go unhandled, while a too-granular message key will cause duplicate alerts. A good standard is to compose the message key from the name of the host/entity of the alert plus the type of alert being received.
The example below shows several alerts that each have thousands of events mapped to them.
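As a rough sketch of that standard, the helper below builds a message key from the host and the alert type. The field names (node, type, metric_name) mirror common event fields, but treat this as an illustration rather than an exact event-rule script.

```javascript
// Sketch: composing a message key from host/entity + alert type.
// Field names here mirror common em_event columns; adjust to your source.
function composeMessageKey(event) {
    // Host or entity the event applies to, e.g. "web-prod-01"
    var host = event.node || 'unknown-host';
    // Condition being reported, e.g. "CPU Utilization"
    var type = event.type || event.metric_name || 'unknown-type';
    // "web-prod-01::CPU Utilization" - every repeat event for this
    // host/condition pair compresses into the same alert
    return host + '::' + type;
}
```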
- Auto Closing of Alerts:
With a correctly drafted message key, an alert is updated by incoming events from its source. If a monitoring source is capable of sending closing events, the incoming event will map to the existing alert in the system and close it, which drastically reduces the overall number of alerts: there is no reason for a human to check on alerts that are no longer issues. You might think some issues could go unnoticed if an alert constantly flaps between closing and opening, but in that case the alert stays open in a flapping state for diagnosis and triage. A correctly drafted message key is therefore a must to get value out of closing events from the monitoring source.
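To make this concrete, here is a hedged sketch of two event payloads a monitoring source might send: the first creates the alert and the second, carrying the same message key with a Clear severity, closes it. The source name and field values are illustrative only.

```javascript
// Sketch: the same message key lets a "clear" event resolve the alert the
// original "critical" event created (EM severities: 1 = Critical, 0 = Clear).
var openingEvent = {
    source: 'Prometheus',            // illustrative monitoring source
    node: 'web-prod-01',
    type: 'CPU Utilization',
    severity: '1',                   // Critical - creates or updates an alert
    message_key: 'web-prod-01::CPU Utilization'
};
var closingEvent = {
    source: 'Prometheus',
    node: 'web-prod-01',
    type: 'CPU Utilization',
    severity: '0',                   // Clear - matches the same alert and closes it
    message_key: 'web-prod-01::CPU Utilization'
};
```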
- Ignoring Informational noisy data:
Through the use of event rules, we can draft rules to ignore any event types that we deem unhelpful or purely informational. We can also choose to enrich the data, parse it, and extract meaningful information from the source format into EM alerts. In the example below, Pod status events that come from the ITOM agent are ignored and not processed, by choosing to ignore events that match this filter in the event rule.
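For illustration only, the predicate below mirrors the kind of filter such a rule applies. The source and type strings are assumptions, and in practice the filter is expressed as a condition on the event rule rather than in script.

```javascript
// Sketch: the shape of an event-rule filter used with the Ignore action.
// The source/type strings are hypothetical examples.
function shouldIgnore(event) {
    var isAgentSource = event.source === 'ServiceNow Agent';   // assumed ITOM agent source name
    var isPodStatus = event.type === 'Pod Status';             // informational-only event type
    return isAgentSource && isPodStatus;                       // matching events never become alerts
}
```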
ServiceNow’s correlation engine offers several grouping methods to streamline incident management. By consolidating similar, repeated alerts, the engine reduces noise and helps identify root causes more quickly. The grouping method is recorded in the alert’s group field, along with a description of the reasoning. Below, we will quickly discuss the various methods and their use cases.
- CMDB:
CMDB-based grouping of alerts uses the CMDB to organize and correlate incoming alerts based on their associated Configuration Items (CIs). When an alert occurs, EM links the alert to its relevant CI in the CMDB. Alerts linked to the same CI or related CIs are grouped together, providing a clear view of how specific infrastructure elements are affected.
This grouping helps identify the root cause of issues by analyzing the relationship between the affected CIs, including dependencies and service hierarchies. Additionally, by grouping alerts based on the CMDB, IT teams can prioritize responses more effectively by focusing on critical services and understanding the broader impact of the alerts across the business. This approach enhances efficiency in managing incidents, reduces alert noise, and accelerates incident resolution.
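As a quick illustration of why CI binding matters, the sketch below counts open alerts per CI using standard em_alert fields; the "Closed" state value is an assumption, so verify it against your instance.

```javascript
// Sketch: counting open alerts per CI to see which infrastructure
// elements generate the most noise.
var ga = new GlideAggregate('em_alert');
ga.addQuery('state', '!=', 'Closed');    // state value is an assumption
ga.addAggregate('COUNT');
ga.groupBy('cmdb_ci');
ga.query();
while (ga.next()) {
    gs.info(ga.cmdb_ci.getDisplayValue() + ': ' + ga.getAggregate('COUNT') + ' open alerts');
}
```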
- Pattern-Based (Automated):
EM constantly looks for patterns in incoming alerts based on which alerts have fired during the same timeframe. By going to “Learned Patterns” you can view the patterns EM has detected. Using these patterns, EM attempts to group new incoming alerts that match a historically detected pattern. The example below shows different patterns identified; each pattern has an associated score based on how many times the pattern occurred and how many CIs were part of it. This method helps us identify relationships between alerts that might seem to have nothing in common but are, in reality, related through some dependency that neither ServiceNow nor the monitoring data has direct visibility into. This is an excellent feature of EM’s correlation engine that improves with time and human feedback. While it can take some time to produce great value, it can be extremely efficient in reducing noise and identifying root causes.
- Tag-Based:
If your organization has a robust tagging strategy for its cloud infrastructure but has not yet implemented tag-based Service Mapping or a mature CMDB, this method is a great tool for correlating alerts based on the alert and CI tags. By defining rules that look for specific tags within the alerts, you can group alerts from the same application, the same environment, or any other shared tag. The most common rules are created OOB so they can be used from day one.
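To sketch the idea, the helper below derives a grouping key from tags carried in an alert’s additional_info payload. The tag names (application, environment) are hypothetical examples; real tag-based rules use the tag definitions configured in EM.

```javascript
// Sketch: grouping key built from cloud tags in an alert's additional_info.
// Tag names are hypothetical; real tag-based rules are configured in EM.
function tagGroupKey(alertGr) {
    var info = {};
    try {
        info = JSON.parse(alertGr.getValue('additional_info') || '{}');
    } catch (e) {
        return null;   // malformed payload - no grouping
    }
    if (!info.application || !info.environment)
        return null;
    // e.g. "checkout-service::prod" - alerts sharing this key get grouped
    return info.application + '::' + info.environment;
}
```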
- Correlation Rule-Based:
In addition to the OOB methods of grouping, you can also define your own logic using the correlation rules. These are manually defined rules that use filter conditions or scripting to create groupings of the alerts.
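As a loose sketch of the scripted variant, the function below groups a database alert under an application alert on the same host. The primaryAlert/secondaryAlert inputs, the alert types, and the exact signature are all assumptions for illustration; consult the rule’s script template on your instance for the real inputs it exposes.

```javascript
// Sketch: scripted grouping logic - variable names and signature are assumptions.
function shouldGroup(primaryAlert, secondaryAlert) {
    var sameHost = primaryAlert.getValue('node') === secondaryAlert.getValue('node');
    var primaryIsApp = primaryAlert.getValue('type') === 'Application Error';      // hypothetical type
    var secondaryIsDb = secondaryAlert.getValue('type') === 'Database Connection'; // hypothetical type
    // Group the database alert under the application alert for the same host
    return sameHost && primaryIsApp && secondaryIsDb;
}
```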
In some instances, EM users need to identify specific alerts that shouldn’t be correlated with other alerts due to their importance. Filtering these alerts out of grouping is crucial for immediate triage and escalation. This is where an Automatic group filter is utilized; it is simply a condition identifying alerts that shouldn’t be considered by the correlation engine.
Maintenance rules are a good method for suppressing alerts that occur because there is ongoing maintenance on the host. They work by listing all Configuration Items that are currently in maintenance. If an alert from any monitoring source is then received for a CI that is in maintenance, the alert gets flagged as such, allowing the operator to ignore it and treat it as noise.
- Default Maintenance Rules: OOB, there are two defined maintenance rules.
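As a rough illustration of how flagged alerts can be set aside, the query below pulls alerts marked as in maintenance. The boolean maintenance column and the "Closed" state value are assumptions to verify on your instance.

```javascript
// Sketch: listing alerts EM has flagged as occurring during CI maintenance
// so they can be excluded from triage views.
var gr = new GlideRecord('em_alert');
gr.addQuery('maintenance', true);       // assumed "In maintenance" flag on em_alert
gr.addQuery('state', '!=', 'Closed');   // state value is an assumption
gr.query();
while (gr.next()) {
    gs.info('In maintenance, safe to ignore: ' + gr.getValue('number'));
}
```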
Think of these rules as actions you can take on alerts to remediate the underlying issue. They can take the form of Flow Designer subflows and can be set to execute automatically when an alert matching a certain condition occurs. With these rules and subflows, automation can be created to handle alerts without any human intervention. Fully utilized, this can eliminate a huge percentage of alerts in your environment, leading to a noise-free alert management system.
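Purely as a sketch of the decision such a rule encodes, the function below maps a critical disk-space alert to a hypothetical cleanup subflow. In practice the condition lives on the rule and the action is the Flow Designer subflow itself; the alert type, severity value, and subflow name here are all assumptions.

```javascript
// Sketch: the decision an automated remediation rule encodes.
// Field values and the subflow name are hypothetical.
function pickRemediation(alertGr) {
    var type = alertGr.getValue('type');
    var severity = alertGr.getValue('severity');
    if (type === 'Disk Space' && severity === '1') {   // 1 = Critical
        // e.g. a subflow that clears temp files on the affected host,
        // then lets the next healthy metric close the alert
        return 'Cleanup Disk Space';                   // hypothetical subflow name
    }
    return null;   // no automation defined - leave for a human
}
```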
Interested in learning more or need help troubleshooting Event Management? Reach out to us at chat@rapdev.io