Swim, don’t sink, in the sea of performance data

Drowning in a flood of uncorrelated network performance data? Here’s a life buoy.

Mobile phones have become smart devices that do pretty much everything, from streaming video and taking pictures to mobile banking and payments. All that data being generated is information overload for teams that are trying to manage mobile networks. Maintaining sanity is hard.

Extracting useful information from real-time data is challenging for service providers that are managing millions of devices and hundreds of thousands of network elements, generating billions of performance measurements per day.

With that kind of volume, it’s not realistic to store everything and analyze it later. In order to make decisions, it’s necessary to create some abstractions and reduce the dataset. But reducing the dataset requires knowing what we want to achieve; otherwise we risk losing precision and the very essence of the insight we seek. And knowing what we want to achieve requires knowledge of the data. This is the classic chicken-and-egg scenario.

How can we segment the data so that we can cut through the noise and build meaningful reports from it in a timely manner?

Finding and using the gold

True, service providers are sitting on goldmines of network data. The challenge is identifying the right places to dig and then finding the gold and processing it.

Data is the new gold, but the sheer volume can be overwhelming, and it is easy to miss the chance to make valuable business decisions from it. In this age of Big Data, we tend to believe that by storing more and more information in data lakes and running “algorithms” on it, we can uncover unprecedented insights.

There is, however, no magic to this. The quality of the insights depends entirely on the quality of the underlying data and the ability to correlate that data with other relevant data. In most cases, adding more data to the lake acts more as an anchor than a propeller.

Here are my top three tips on cutting through the noise in order to get valuable information about your network by combining network metadata and active test metrics.

1. Don’t boil the ocean: Keep it simple, stay flexible

Rather than trying to represent everything, consider simple data models built on composition instead of strict inheritance hierarchies that express hard relationships. One model can be designed to represent time series data around monitored objects. Another, called the metadata model, allows us to loosely represent relationships between monitored objects.

For the purpose of this blog, a monitored object is a thing that generates data over time—a router port, a microwave antenna, a Skylight control sensor, or a weather station, for example. If it generates data, it’s a monitored object.
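To make the two models concrete, here is a minimal sketch in Python. The class names, field names, and sample values are illustrative assumptions, not an actual product schema. The point is only that the time series model and the metadata model share an identifier and nothing else.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    """One measurement reported by a monitored object (time series model)."""
    object_id: str    # which monitored object produced the value
    timestamp: float  # seconds since the epoch
    metric: str       # hypothetical metric name, e.g. "delay_ms"
    value: float


@dataclass
class MonitoredObjectMetadata:
    """Loose, free-form description of a monitored object (metadata model)."""
    object_id: str
    object_type: str
    tags: dict[str, str] = field(default_factory=dict)


# The two models are composed through object_id; neither inherits from the other.
samples = [
    Sample("router-1/port-3", 1700000000.0, "delay_ms", 12.5),
    Sample("weather-yul", 1700000000.0, "temperature_c", -4.0),
]
metadata = {
    "router-1/port-3": MonitoredObjectMetadata(
        "router-1/port-3", "router port", {"region": "east"}
    ),
    "weather-yul": MonitoredObjectMetadata(
        "weather-yul", "weather station", {"city": "Montreal"}
    ),
}


def describe(sample: Sample) -> str:
    """Look up the metadata for a sample and render a one-line summary."""
    meta = metadata[sample.object_id]
    return f"{meta.object_type} {sample.object_id}: {sample.metric}={sample.value}"
```

Because a router port and a weather station are described by the same two structures, adding a new kind of monitored object should not require a new class hierarchy.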

2. Do (almost) all correlation at query time

In pretty much all data analysis cases, how data should be correlated and aggregated to generate insights is not self-evident at ingestion time. Instead of pre-computing what we think we will need, it’s preferable to minimize the work done during ingestion, applying only simple normalization and data cleaning. That way, we can stay flexible when it is time to aggregate the data.

This, among other things, means that we need to be able to defer operations such as binning, threshold-crossing calculation, or multi-series aggregation from ingestion time (when we get the data) to query time (when the user needs it). The cost and complexity of doing this is often quite high, but the return on investment is well worth the effort. Accedian didn’t invent this concept, by the way. Other solutions such as Prometheus and Graphite operate on the same principles.
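As a rough illustration of deferring work to query time, the sketch below stores raw samples untouched and only bins them and checks a threshold when a query asks for it. The function names, sample values, and threshold are assumptions made up for this example, and the threshold check is deliberately simplified to "which bins average above the limit".

```python
from collections import defaultdict
from statistics import mean

# Raw samples stored as-is at ingestion: (timestamp in seconds, delay in ms).
raw = [(0, 10.0), (20, 14.0), (65, 30.0), (90, 50.0), (130, 12.0)]


def bin_average(samples, bin_seconds):
    """Group samples into fixed-width time bins and average each bin."""
    bins = defaultdict(list)
    for ts, value in samples:
        bin_start = ts // bin_seconds * bin_seconds
        bins[bin_start].append(value)
    return {start: mean(values) for start, values in sorted(bins.items())}


def bins_over_threshold(binned, threshold):
    """Return the start times of bins whose average is above the threshold."""
    return [start for start, avg in binned.items() if avg > threshold]


# Bin width and threshold are chosen by the caller at query time,
# so the same raw data can answer both questions.
per_minute = bin_average(raw, 60)
per_two_minutes = bin_average(raw, 120)
```

With a 60-second bin, only the bin starting at 60 is above a 25 ms threshold; with a 120-second bin, it is the bin starting at 0. Had the binning been fixed at ingestion, one of those two views would no longer be available.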

3. Scale horizontally now, sail to better horizons later

In the world of data processing, scaling is to a product what cold is to snow: a major pain, but a required ingredient. I am often surprised that, in this post-iPhone era, many technologies available on the market treat horizontal scaling as a second-order priority. In my humble opinion, and based on some painful experiences, horizontal scale is one of the few things that you can’t bolt on to your design after the fact.

You either bake it into your foundation, or you live with regret until your next project.

Metadata: Seeing your data like you see your network

A pretty neat result of “Doing (almost) everything at query time” is the ability to use any combination of expert information to segment and aggregate results in real-time dashboards, network monitoring or asynchronous reports.

Metadata in this case represents the “extra” information that can be added to qualify a monitored object. Remembering that monitored objects are things that report metrics, you can think of metadata as stuff that brings additional, non-metric information about a monitored object.

If you had a monitored object named “Montreal-Tokyo” of type “TWAMP” that tests a link between Montreal and Tokyo for a given service, you could add metadata to qualify this session, such as: source and destination geographic coordinates, the type and name of the edge routers and core router in the path, the name of the customer being served by this network link, and whether this link has microwave sections.

Another neat result from the “Don’t boil the ocean, keep it simple” principle is that metadata is not baked into the time series data (typically on the order of 100s of TB for medium to large networks) but stored separately in a smaller datastore of 100s of GB. Metadata can therefore be added, augmented or modified after the fact, at reasonably low cost.

This metadata could be derived from inventory systems, an Excel spreadsheet, or any other source of information. Inventory systems can typically dump a CSV file with this information, and more sophisticated ones can be integrated through APIs so that we can use this information in a more streamlined fashion.
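For the CSV route, a minimal sketch using Python’s standard csv module might look like the following. The column names and rows are hypothetical; a real inventory export will have its own layout, and an in-memory string stands in for the exported file here.

```python
import csv
import io

# Stand-in for a file exported by an inventory system (hypothetical columns).
inventory_csv = io.StringIO(
    "object_id,customer,edge_router,has_microwave\n"
    "Montreal-Tokyo,acme,edge-yul-1,true\n"
    "Montreal-Paris,acme,edge-yul-2,false\n"
)


def load_metadata(handle):
    """Read one row per monitored object into a dict keyed by object ID."""
    metadata = {}
    for row in csv.DictReader(handle):
        object_id = row.pop("object_id")  # remaining columns become metadata
        metadata[object_id] = row
    return metadata


metadata = load_metadata(inventory_csv)
```

Since the metadata lives apart from the time series, re-running an import like this after the inventory changes should only touch the small metadata store.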

It is said that a picture is worth a thousand words, so I prepared a video to show how you can use metadata to your advantage to understand what’s really going on in your network. At 30 frames per second, this 8-minute, 49-second video just saved me from writing about 15.9 million words. (And thus saved you from having to read that many words. You’re welcome.)