A recurring takeaway from my recent conversations with network operators and vendor partners is that model-driven telemetry is a useful innovation, but it is not enough on its own to source performance management data for virtualized multi-access edge computing (MEC) and 5G network infrastructure. These environments have to deliver on quality of service (QoS) and quality of experience (QoE) for use cases well beyond consumer broadband.
So what is the best way to implement QoS and QoE in these environments, and how do you effectively monitor performance and implement closed-loop automation? Let’s start with the fact that the monitoring and automation around QoS and QoE are generally expected to come from “telemetry,” yet little time seems to be spent on the telemetry data itself.
In 2019, the word “telemetry” typically refers to YANG model-driven streaming telemetry—specified by OpenConfig, pioneered commercially by Cisco, and now widely adopted by other manufacturers including Juniper, Nokia, and Ciena.
Previously, performance data would be polled via SNMP. But this approach was:
- Not very granular, accurate, or timely; polling typically ran only every 15 minutes to avoid overloading the router
- Heavy on router performance, especially once you started sending data to more than one destination
- Limited in scope, typically delivering only a subset of the performance data that you really wanted to collect from the router
Model-driven telemetry overcomes those issues with the ability to expose the entire YANG model of the router over a streaming interface at shorter intervals and typically with less impact on router performance.
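To make the granularity point concrete, here is a minimal sketch of how the sampling interval changes what you can see. Everything in it is hypothetical: a 1 Gbit/s link, a steady 100 Mbit/s load, a single 10-second burst at 950 Mbit/s, and a cumulative octet counter read either once per 15 minutes (SNMP-style polling) or every 10 seconds (streaming-style). It does not call any real SNMP or gRPC library.

```python
LINK_BPS = 1_000_000_000  # hypothetical 1 Gbit/s link


def per_second_rates(duration_s=900, base_bps=100_000_000,
                     burst_bps=950_000_000, burst_start=300, burst_len=10):
    """Synthetic traffic: steady load with one short burst (assumed values)."""
    return [
        burst_bps if burst_start <= t < burst_start + burst_len else base_bps
        for t in range(duration_s)
    ]


def counter_samples(rates_bps, interval_s):
    """Read a cumulative octet counter once every interval_s seconds."""
    samples = [0]
    total_octets = 0
    for t, rate in enumerate(rates_bps, start=1):
        total_octets += rate // 8
        if t % interval_s == 0:
            samples.append(total_octets)
    return samples


def peak_utilization(samples, interval_s, link_bps=LINK_BPS):
    """Highest link utilization visible from consecutive counter deltas."""
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    return max(delta * 8 / (interval_s * link_bps) for delta in deltas)


rates = per_second_rates()
polled = peak_utilization(counter_samples(rates, 900), 900)
streamed = peak_utilization(counter_samples(rates, 10), 10)

print(f"15-minute polling, peak utilization:   {polled:.1%}")    # 10.9%
print(f"10-second streaming, peak utilization: {streamed:.1%}")  # 95.0%
```

With these assumed numbers, the 15-minute sample averages the burst away and reports a link that looks about 11% utilized, while the 10-second samples show it running at 95%.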
At Accedian, we think model-driven telemetry is a great source of performance data, and a long overdue step. We support it on our Skylight performance analytics platform, and we have done implementations for our customers where telemetry data is one of the sources of performance data that we analyze to help determine the root cause of performance issues.
But model-driven telemetry also has its limitations:
- While it is more efficient than polling SNMP, especially at higher frequency and with multiple destinations, it is still not free; implementing it has a cost on the router that is not negligible
- The underlying performance data collection mechanism has not changed; while gRPC streaming might be more efficient than SNMP polling, the quality of the data itself frequently still lacks accuracy and granularity
- You’re still collecting performance monitoring (PM) data at a single point in the network only; tools like Accedian’s Skylight performance analytics can help string those points together to form a more coherent view of network performance, but you’re making a guess on QoE based on those points
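The last limitation can be illustrated with a small, entirely synthetic sketch. It assumes three hops that each export only an average delay per interval, and 100 packets of which two hit congestion at every hop. The hop count, delay values, and 10 ms threshold are all made up for illustration and do not come from any real device or from Skylight.

```python
from statistics import mean

NORMAL_MS, SPIKE_MS = 1.0, 20.0  # assumed per-hop delays
PACKETS, HOPS = 100, 3
SPIKED = {0, 1}  # hypothetical packets that are delayed at every hop

# per_hop[h][p] is the delay packet p experienced at hop h
per_hop = [
    [SPIKE_MS if p in SPIKED else NORMAL_MS for p in range(PACKETS)]
    for _ in range(HOPS)
]

# Per-point view: each hop reports only its interval average,
# and the averages are added up to estimate the path delay.
estimate_ms = sum(mean(hop) for hop in per_hop)

# End-to-end view: the delay each packet actually saw across the path.
end_to_end = [sum(hop[p] for hop in per_hop) for p in range(PACKETS)]
worst_ms = max(end_to_end)
slow_share = sum(delay > 10.0 for delay in end_to_end) / PACKETS

print(f"Estimate from per-hop averages: {estimate_ms:.2f} ms")  # 4.14 ms
print(f"Worst end-to-end delay:         {worst_ms:.1f} ms")     # 60.0 ms
print(f"Packets slower than 10 ms:      {slow_share:.0%}")      # 2%
```

In this toy case the stitched-together averages suggest a healthy path at about 4 ms, while 2% of packets actually took 60 ms end to end. That gap is the guess you are making when QoE is inferred from individual points.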
So, looking at model-driven telemetry data gives you one piece of the picture. But to really get a handle on QoE you need more: the ability to actually test how your customer traffic flows through the network (not just at individual points), and to capture higher-quality, vendor-agnostic data, the accuracy of which becomes ever more important in 5G environments. This is where tools like Accedian’s Skylight platform come in, giving you the ability to do end-to-end testing, manage QoS targets, understand QoE for customers, and pinpoint the origin of issues when QoS and QoE go wrong.
So, as you move toward SD-WAN, 5G MEC, and 5G network slicing, make sure you’re using a performance monitoring toolset that ensures the QoS and QoE of the underlying service that you are selling. Model-driven telemetry is an important part of this, but it is just one tool in the performance management toolbox.