Accedian is now part of Cisco

By Jay Stewart

Going beyond basic KPIs: using active testing and monitoring to get more from your performance data

In today’s networks, everything produces data, so there is certainly an abundance of it, but that does not mean all of it is valuable. The first problem is that it is like drinking from a fire hose: data is coming from everywhere. The second is that data is often produced in silos. Even as providers move toward a data lake architecture, the data may sit in a central location, but it is still siloed because no relationship is maintained between the different data sets.

To reduce operational costs and become more proactive, providers are moving to automation to roll out services faster. With this as the end goal, data becomes more valuable, enabling the decision making and automation needed to meet those targets. So, as we move forward, it’s not about data or basic KPIs, but about how to use that data to create actionable insights. Another key point to keep in mind is that the same data can be used in many different ways, depending on the function the provider needs it for.

Generating quality basic KPIs with active testing and monitoring is a start

Producing basic performance data is achievable through active testing, which is usually done from point A to point B. This type of testing can produce basic KPIs such as packet loss, packet delay, and inter-packet delay variation. These KPIs are important and form the foundation for monitoring the performance of a network or service. However, each one measures only a small slice of a larger network, and it is unrealistic to create an actionable event from a single KPI between two points. Doing so would generate far too many events, many of which would reflect the symptom of a problem rather than the problem itself.
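As a rough illustration of how these three basic KPIs fall out of an active test, the sketch below computes packet loss, average delay, and inter-packet delay variation from hypothetical probe results (the sequence numbers, delays, and thresholds are made up for the example, not taken from any product):

```python
# Illustrative computation of the three basic KPIs from an active test
# between points A and B: each probe packet carries a sequence number,
# and the receiver records its one-way delay. All values are made up.
sent = 5                                             # probes transmitted from A
arrivals_ms = {0: 10.0, 1: 10.4, 3: 10.1, 4: 10.9}   # seq -> one-way delay; seq 2 was lost

# Packet loss: probes that never arrived, as a percentage of those sent.
packet_loss_pct = 100.0 * (sent - len(arrivals_ms)) / sent

# Packet delay: average one-way delay of the probes that did arrive.
delays = [arrivals_ms[s] for s in sorted(arrivals_ms)]
avg_delay_ms = round(sum(delays) / len(delays), 2)

# Inter-packet delay variation: difference between consecutive delays.
ipdv_ms = [round(abs(b - a), 2) for a, b in zip(delays, delays[1:])]

print(packet_loss_pct)  # 20.0
print(avg_delay_ms)     # 10.35
print(ipdv_ms)          # [0.4, 0.3, 0.8]
```

Each of these numbers is a single point-to-point measurement, which is exactly why, as noted above, it is not actionable on its own.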

Another thing to consider is that the same basic KPI can serve different functions in different ways. For example, packet loss could be used by the NOC for troubleshooting, while engineering and planning could look at packet loss trends to understand the overall quality of the network.

Different teams have different needs in terms of how they view and use network and service performance data; however, they should not require a separate tool to do this.

A key requirement for data analytics and visualization solutions is the ability to take performance data and enrich it to provide more valuable insight for multiple user groups: the NOC (how can I quickly determine the origin of network and service issues?), engineering and planning (how can I understand network performance over time? Is it improving or degrading?), and product management (is there data that allows me to market services?). All of these groups can use the basic KPIs, but those KPIs need to be enriched and presented in a way that is meaningful to each function to generate insights that are actionable.

So how can you take basic KPIs, make them useful with actionable events, and satisfy multiple groups? This can be done by enriching data.

What does enriching data actually mean? It means combining the intelligence from multiple sets of data to create a new piece of data.

One way of looking at the outcome of this can be explained with the old saying that “one plus one equals three”. 

Generating true value by enriching the data

  1. Take different KPIs and combine them to get better insight into performance. Take packet loss, for example: you might have a single link with high packet loss, but what does that mean? We know that packet loss can cause service issues, but on its own it will not tell you whether packet loss is the issue or the symptom. One classic example is looking at packet loss together with utilization. Packet loss is often the symptom of a link that is over-utilized. Combining these two KPIs can tell you that there is in fact a problem, that utilization is causing it, and that packet loss is just the symptom.
  2. Metadata allows you to represent a KPI with additional intelligence. If we go back to the active test between points A and B, the test produces packet loss, packet delay, and inter-packet delay variation. I now have these three KPIs associated with points A and B; if I use metadata, I can enrich those KPIs to understand the relationship between region, customer, market segment, location (latitude and longitude), and the path between points A and B. This allows other KPIs to be grouped with the A-to-B KPIs, which is important for quickly spotting a potential cluster of performance issues and understanding how performance relates across the network, not just at a single point. So if we take the example above, where utilization is causing packet loss, and enrich it with metadata, you can start to understand how many points are impacted by high utilization. Instead of focusing on each individual point, you can see the bigger issue that might be upstream, look at a cluster of events, and manage it as a single event.
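The two enrichment steps above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern: the field names, thresholds, and metadata table are assumptions for the example, not any product’s API. It combines packet loss with utilization to separate cause from symptom, then uses metadata to collapse related point-to-point events into a single regional cluster:

```python
from dataclasses import dataclass

# Hypothetical KPI sample from an active test between two points.
@dataclass
class TestResult:
    point_a: str
    point_b: str
    packet_loss_pct: float
    utilization_pct: float

# Illustrative metadata keyed by endpoint pair (region, customer, etc.).
METADATA = {
    ("A1", "B1"): {"region": "east", "customer": "acme"},
    ("A2", "B2"): {"region": "east", "customer": "acme"},
    ("A3", "B3"): {"region": "west", "customer": "globex"},
}

def enrich(result: TestResult) -> dict:
    """Combine two KPIs to classify the event, then attach metadata."""
    if result.packet_loss_pct > 1.0 and result.utilization_pct > 90.0:
        cause = "congestion"   # loss is likely a symptom of over-utilization
    elif result.packet_loss_pct > 1.0:
        cause = "loss"         # loss without high utilization: inspect the link
    else:
        cause = None           # nothing actionable at this point
    meta = METADATA.get((result.point_a, result.point_b), {})
    return {"pair": (result.point_a, result.point_b), "cause": cause, **meta}

def cluster_by_region(events: list[dict]) -> dict:
    """Group enriched events so a regional issue surfaces as one cluster."""
    clusters: dict = {}
    for e in events:
        if e["cause"]:
            clusters.setdefault((e.get("region"), e["cause"]), []).append(e["pair"])
    return clusters

results = [
    TestResult("A1", "B1", 2.5, 95.0),
    TestResult("A2", "B2", 3.1, 97.0),
    TestResult("A3", "B3", 0.1, 40.0),
]
clusters = cluster_by_region([enrich(r) for r in results])
print(clusters)  # both "east" pairs collapse into one congestion cluster
```

Note how the two lossy links in the east region become a single actionable event, rather than two independent alarms pointing at symptoms.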

Bringing this back to different job functions, enrichment also helps different user groups gain much more value from the data. For the NOC, it allows you to pinpoint the cause of an issue. For planning, it allows a broader view of macro network performance based on criteria like region or a particular customer. Product marketing, for instance, could use this information to target market segments.

Operationalizing your enriched insights

Enriching data also allows you to use best-of-breed tools. Enriched data can be viewed like an onion: you can keep layering on different enrichments that provide different, more precise insights and actionable events. It is important to be able to build on lessons learned and have a flexible way to enrich the data based on previous takeaways.

Now that we have talked about the importance of enriching data, it is also important to keep in mind how to operationalize that enriched data. This requires an open system that allows for the ingestion and distribution of data. The industry has come a long way in terms of openness, as evidenced by the ability to leverage RESTful APIs and, more importantly, OpenAPI. Another important factor is the use of data buses, or the ability to publish and subscribe to topics in an open environment, which makes the use of enriched data much more automated and operational.
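The publish/subscribe pattern mentioned above can be sketched with a minimal in-memory bus. This is an illustration of the pattern only; in a real deployment this role is played by a message broker, and the topic name and event fields here are assumptions for the example:

```python
from collections import defaultdict
from typing import Callable

# Minimal in-memory sketch of a data bus: consumers subscribe to topics,
# producers publish events, and the bus fans each event out to handlers.
class DataBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)

bus = DataBus()
received = []

# Downstream automation (e.g. in the NOC) subscribes to enriched,
# actionable events rather than to raw point-to-point KPIs.
bus.subscribe("kpi.enriched", received.append)

# An enrichment stage publishes one clustered, actionable event.
bus.publish("kpi.enriched", {"region": "east", "cause": "congestion", "links": 2})

print(received)
```

Because producers and consumers only agree on topics and event shapes, new enrichment stages or consumers can be added without rewiring the rest of the pipeline, which is what makes the enriched data operational.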

Skylight performance analytics provides a flexible, easy way to enrich data, and it is open, fitting easily into provider ecosystems and platforms.