
5G and Closed Loop Automation: Beyond Model-Driven Telemetry

I spent last week at MPLS+SDN+NFV World Congress 2019 in Paris. It was a great opportunity to participate in discussions with vendors and network operators on a variety of topics including artificial intelligence (AI), machine learning, SD-WAN, 5G MEC, and 5G network slicing.

A recurring message was that virtualized multi-access edge computing (MEC), and 5G network infrastructure in general, needs to deliver on quality of service (QoS) and quality of experience (QoE) to enable use cases beyond consumer broadband. There was a lot of healthy discussion around how to implement QoS and QoE in these environments, including segment routing, and how to monitor performance and implement closed loop automation. In general, the monitoring and automation around QoS and QoE is expected to come from “telemetry,” but little time seems to be spent on the telemetry data itself.

In 2019, the word “telemetry” typically refers to YANG model-driven streaming telemetry—specified by OpenConfig, pioneered commercially by Cisco, and now widely adopted by other manufacturers including Juniper, Nokia, and Ciena.

Previously, performance data was polled via SNMP. But this approach had drawbacks:

  • The data was not very granular, accurate, or timely; routers were typically polled only every 15 minutes to avoid overloading them
  • Polling had a heavy impact on router performance, especially once you started sending data to more than one destination
  • It typically delivered only a subset of the performance data that you really wanted to collect from the router

Model-driven telemetry overcomes those issues with the ability to expose the entire YANG model of the router over a streaming interface at shorter intervals and typically with less impact on router performance.
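
To make the granularity point concrete, here is a minimal Python sketch. The link speed, traffic pattern, and intervals are invented for illustration; they are not measurements from any router, and the function names are hypothetical. It averages the same per-second traffic over one 15-minute window and over 10-second windows to show how a short burst disappears in the coarse view.

```python
LINK_BPS = 1_000_000_000  # hypothetical 1 Gbit/s link


def per_second_bits(duration_s=900, burst_start=300, burst_len=20):
    """Synthetic traffic: 10% load with a single 20-second burst at 95%."""
    base = LINK_BPS // 10
    burst = LINK_BPS * 95 // 100
    return [
        burst if burst_start <= t < burst_start + burst_len else base
        for t in range(duration_s)
    ]


def utilization(bits, interval_s):
    """Average utilization (0..1) for each consecutive window of interval_s seconds."""
    return [
        sum(bits[i:i + interval_s]) / (interval_s * LINK_BPS)
        for i in range(0, len(bits), interval_s)
    ]


traffic = per_second_bits()
coarse = utilization(traffic, 900)  # one 15-minute sample
fine = utilization(traffic, 10)     # ninety 10-second samples

print(f"15-minute view: peak utilization {max(coarse):.1%}")
print(f"10-second view: peak utilization {max(fine):.1%}")
```

With these made-up numbers, the 15-minute average comes out at roughly 12%, while the 10-second windows show the link running at 95% during the burst. The shorter interval does not make the underlying counters any more accurate; it only stops the averaging from hiding what happened.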

At Accedian, we think model-driven telemetry is a great source of performance data, and a long overdue step. We support it on our SkyLIGHT DataHUB IQ platform, and we have done implementations for our customers where telemetry data is one of the sources of performance data that we analyze to help determine the root cause of performance issues.

But model-driven telemetry also has its limitations:

  • While it is more efficient than polling SNMP, especially at higher frequency and with multiple destinations, it is still not free; implementing it has a cost on the router that is not negligible
  • The underlying performance data collection mechanism has not changed; while gRPC streaming might be more efficient than SNMP polling, the quality of the data itself frequently still lacks accuracy and granularity
  • You’re still collecting performance monitoring (PM) data at a single point in the network only; tools like Accedian’s SkyLIGHT DataHUB IQ can help string those points together to form a more coherent view of network performance, but you’re making a guess on QoE based on those points
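
The “guess” in the last point can be illustrated with a small Python sketch. The per-hop figures, the probe results, the independence assumption, and the function names are all hypothetical. The aim is only to show that combining single-point counters produces an estimate of the path, while an end-to-end test measures the path directly.

```python
import math
from statistics import mean


def estimate_path_from_hops(hops):
    """Combine per-hop PM data into a path estimate.

    Assumes loss at each hop is independent and that mean delays
    simply add. Both are simplifications, so this is only an estimate.
    """
    delivered = math.prod(1.0 - h["loss"] for h in hops)
    return {
        "loss": 1.0 - delivered,
        "mean_delay_ms": sum(h["mean_delay_ms"] for h in hops),
    }


def summarize_probes(delays_ms):
    """Summarize end-to-end test packets; None marks a lost packet.

    Assumes at least one packet was received.
    """
    received = sorted(d for d in delays_ms if d is not None)
    # Nearest-rank 95th percentile of the received packets
    rank = max(1, math.ceil(0.95 * len(received)))
    return {
        "loss": 1.0 - len(received) / len(delays_ms),
        "mean_delay_ms": mean(received),
        "p95_delay_ms": received[rank - 1],
    }


# Hypothetical per-hop counters, as a telemetry collector might hold them
hops = [
    {"loss": 0.001, "mean_delay_ms": 2.0},
    {"loss": 0.002, "mean_delay_ms": 3.5},
    {"loss": 0.0, "mean_delay_ms": 1.5},
]
# Hypothetical end-to-end test results over the same path
probes = [7.1, 6.9, 7.3, None, 7.0, 25.4, 7.2, 6.8, 7.0, 7.1]

estimated = estimate_path_from_hops(hops)
measured = summarize_probes(probes)
print("estimated from hops:", estimated)
print("measured end to end:", measured)
```

In this invented data set the per-hop estimate suggests a 7 ms path with about 0.3% loss, while the end-to-end samples contain a lost packet and a 25.4 ms outlier that the per-hop averages never reveal. Real networks will differ; the sketch only shows why the two views can disagree.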

So, looking at model-driven telemetry data gives you one piece of the picture. But to really get a handle on QoE you need more: the ability to actually test how your customer traffic flows through the network end to end (not just at individual points), and to capture higher-quality, vendor-agnostic data, the accuracy of which becomes ever more important in 5G environments. This is where tools like Accedian’s SkyLIGHT platform come in, giving you the ability to do end-to-end testing, manage QoS targets, understand QoE for customers, and pinpoint the origin of issues when QoS and QoE go wrong.

So, as you move toward SD-WAN, 5G MEC, and 5G network slicing, make sure you’re using a performance monitoring toolset that ensures the QoS and QoE of the underlying service that you are selling. Model-driven telemetry is an important part of this, but it is just one tool in the performance management toolbox.