How to leverage a network flow matrix for performance monitoring?

A network flow matrix, also known as a network traffic matrix, is a map of the IP traffic exchanged between sources and destinations. It can be used to troubleshoot performance, monitor performance, and optimize network infrastructure. Let’s take a closer look at the use cases for the network flow matrix!

What is a network flow matrix?

Here is a general definition of a network traffic matrix: “It is an abstract representation of the traffic volume flowing between sets of source and destination pairs. Each element in the matrix denotes the amount of traffic between a source and destination pair. There are many variants: depending on the network layer under study, sources and destinations could be routers or even whole networks. And ‘Amount’ is generally measured in the number of bytes or packets, but could refer to other quantities such as connections.”

The network flow matrix is used to display the geography of network traffic between host groups: the most common network flow matrix shows the quantity of traffic sent from one IP subnet to another.

Nevertheless, a network flow matrix can use other grouping criteria (e.g., VLAN) and metrics other than traffic volume (number of packets, sessions, performance metrics, etc.).
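To make the idea concrete, here is a minimal Python sketch of the most common variant: flow records are grouped by IP subnet and the bytes are summed for each source/destination pair. The group names, the subnets, and the flow records are invented for illustration; a real monitoring solution would supply its own flow data and grouping rules.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical host groups: name -> subnet (illustrative values only).
GROUPS = {
    "users-paris": ip_network("10.1.0.0/16"),
    "users-london": ip_network("10.2.0.0/16"),
    "datacenter": ip_network("10.100.0.0/16"),
}

def group_of(ip):
    """Return the name of the first group containing ip, or 'other'."""
    addr = ip_address(ip)
    for name, net in GROUPS.items():
        if addr in net:
            return name
    return "other"

def build_matrix(flows):
    """Sum bytes per (source group, destination group) pair."""
    matrix = defaultdict(int)
    for src, dst, nbytes in flows:
        matrix[(group_of(src), group_of(dst))] += nbytes
    return dict(matrix)

# Made-up flow records: (source IP, destination IP, bytes).
flows = [
    ("10.1.0.5", "10.100.0.10", 1200),
    ("10.1.3.7", "10.100.0.10", 800),
    ("10.2.0.9", "10.100.0.20", 500),
    ("10.100.0.10", "10.1.0.5", 9000),
]
matrix = build_matrix(flows)
for (src, dst), nbytes in sorted(matrix.items()):
    print(f"{src:>13} -> {dst:<13} {nbytes} bytes")
```

Swapping the grouping function (for example, to group by VLAN) or the summed quantity (packets or sessions instead of bytes) gives the other variants mentioned above.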

Although network flow matrices are mostly used by network engineering teams to drive network optimization, design, and anomaly detection, they serve use cases that correspond to very distinct situations.

A network flow matrix

The different use cases for a network flow matrix

Common uses of flow mapping fall into three categories:

  • Traffic volume and geography analysis

Network engineers and architects need data to drive their design and optimization decisions. In complex and highly distributed networks, keeping track of every type of network usage is difficult. A network flow matrix condenses this complexity into a single, readable view.

  • Performance monitoring

A network map can also be extremely useful to troubleshoot performance degradations, provided it can display metrics other than volumes; for example, a network flow matrix showing network and application performance indicators such as:

  • Packet loss, retransmissions, expired TTLs
  • Network latency
  • End-user response times (EURT)

Displayed for each source/destination pair, these indicators can greatly accelerate the resolution of slowdowns.

  • Security monitoring

Finally, network traffic matrices are very helpful when it comes to monitoring threats through the network traffic.

1. Capacity planning

To make sure the network infrastructure offers capacity in line with demand, network teams need a constant view of who requires how much capacity or bandwidth for which application or usage.

Capacity planning with a network flow matrix
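As a rough illustration, the sketch below turns a site-to-site byte matrix into an average utilization figure for each site link. It assumes one full-duplex WAN link per site, a matrix measured over a known interval, and invented capacities; none of these values come from a real network.

```python
from collections import defaultdict

def link_utilization(matrix, interval_s, capacity_mbps):
    """Average utilization of each site's link over the interval.

    matrix: {(src_site, dst_site): bytes}. Traffic that stays inside a
    site is ignored because it does not cross that site's link.
    """
    sent = defaultdict(int)
    received = defaultdict(int)
    for (src, dst), nbytes in matrix.items():
        if src == dst:
            continue
        sent[src] += nbytes
        received[dst] += nbytes
    result = {}
    for site, cap in capacity_mbps.items():
        # Full-duplex assumption: the busier direction limits the link.
        busier = max(sent[site], received[site])
        mbps = busier * 8 / interval_s / 1_000_000
        result[site] = mbps / cap
    return result

# Made-up 5-minute matrix and link capacities (Mbps).
matrix = {
    ("paris", "dc"): 3_750_000_000,
    ("dc", "paris"): 1_000_000_000,
    ("paris", "paris"): 9_000_000_000,
}
usage = link_utilization(matrix, interval_s=300,
                         capacity_mbps={"paris": 200, "dc": 1000})
for site, ratio in sorted(usage.items()):
    print(f"{site}: {ratio:.0%} of capacity")
```

Because this is an average over the interval, short bursts can still saturate a link that looks comfortable here; shorter intervals give a more conservative picture.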

2. Detecting anomalies

In the event of bandwidth hogs, misuse, or unplanned bandwidth requirements, the network team needs to locate the source of the excessive demand quickly in order to mitigate its impact on the other network applications (e.g., stop, delay, compress, or optimize the offending traffic).
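One simple way to locate excessive demand is to rank the cells of the matrix by volume and look at each cell's share of the total. The sketch below assumes a matrix stored as a dictionary keyed by (source group, destination group); the figures are made up.

```python
def top_pairs(matrix, n=3):
    """Return the n busiest cells as (pair, bytes, share of total)."""
    total = sum(matrix.values())
    if total == 0:
        return []
    ranked = sorted(matrix.items(), key=lambda kv: kv[1], reverse=True)
    return [(pair, vol, vol / total) for pair, vol in ranked[:n]]

# Made-up matrix: bytes per (source group, destination group).
matrix = {
    ("paris", "dc"): 500,
    ("london", "dc"): 4000,
    ("dc", "paris"): 500,
}
for (src, dst), vol, share in top_pairs(matrix, n=2):
    print(f"{src} -> {dst}: {vol} bytes ({share:.0%} of all traffic)")
```

A cell that holds most of the traffic is not necessarily a problem, but it is the first place to look.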

3. Migrating infrastructure and change management

When planning the migration of important infrastructure (e.g., data center move, change in key network devices like routers and firewalls), network teams require complete visibility of:

  • Who is communicating with whom?
  • Who is using common services (e.g., DHCP, DNS, …), and who is not?
  • What are the dependencies between servers taking part in an application chain?

This data is essential to ensure that the new equipment is configured appropriately and that the migration does not cause any outages or performance degradation.

In addition, a network flow matrix can be used to identify any configuration that still requires an update. Here are two examples of patterns that can be recognized easily with a network flow matrix:

  • Systems trying to communicate with hosts in deprecated IP subnets
  • Flows that end in error without reaching their destination (revealed, e.g., by one-way flows or by mapping ICMP error message data)
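Both patterns are easy to look for once the flow records are available. The following sketch is only an illustration: the deprecated subnet and the flow records are hypothetical, and a one-way pair is a hint rather than proof of an error, since some protocols legitimately send traffic in one direction only.

```python
from ipaddress import ip_address, ip_network

# Hypothetical subnet that should no longer receive traffic.
DEPRECATED = [ip_network("10.50.0.0/16")]

def flows_to_deprecated(flows):
    """Return (src, dst) pairs whose destination is in a deprecated subnet."""
    return [
        (src, dst)
        for src, dst, _ in flows
        if any(ip_address(dst) in net for net in DEPRECATED)
    ]

def one_way_pairs(flows):
    """Return (src, dst) pairs for which no reverse traffic was seen."""
    seen = {(src, dst) for src, dst, _ in flows}
    return sorted(pair for pair in seen if (pair[1], pair[0]) not in seen)

# Made-up flow records: (source IP, destination IP, bytes).
flows = [
    ("10.1.0.5", "10.50.0.9", 100),
    ("10.1.0.5", "10.100.0.10", 200),
    ("10.100.0.10", "10.1.0.5", 900),
]
print("To deprecated subnets:", flows_to_deprecated(flows))
print("One-way pairs:", one_way_pairs(flows))
```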

For more on this topic, see the article “How to mitigate the performance risk of data center migrations”.

4. Spotting performance gaps

A data center-level network flow matrix that shows which traffic flows are impacted by a packet loss increase or a network slowdown can save hours of troubleshooting. Instantly pointing out the impacted source/destination pair(s) enables network administrators to focus their attention on the right network paths and sets of devices. For more on this topic, see the article “How to handle IT performance complaints”.

Performance gaps with a network flow matrix
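A performance-oriented matrix can be sketched in the same way as a volume matrix, by storing a rate instead of a byte count in each cell. The example below computes a retransmission rate per source/destination pair and lists the pairs above a threshold. The record layout, the sample values, and the 2% threshold are assumptions chosen for illustration.

```python
from collections import defaultdict

def retransmission_rates(records):
    """Return {(src, dst): retransmitted packets / total packets}."""
    totals = defaultdict(lambda: [0, 0])
    for src, dst, packets, retrans in records:
        totals[(src, dst)][0] += packets
        totals[(src, dst)][1] += retrans
    return {pair: r / p for pair, (p, r) in totals.items() if p > 0}

def degraded_pairs(rates, threshold=0.02):
    """Return the pairs whose rate exceeds the threshold, sorted by name."""
    return sorted(pair for pair, rate in rates.items() if rate > threshold)

# Made-up records: (source group, destination group, packets, retransmissions).
records = [
    ("paris", "dc", 1000, 5),
    ("london", "dc", 1000, 50),
    ("london", "dc", 1000, 30),
]
rates = retransmission_rates(records)
print("Degraded pairs:", degraded_pairs(rates))
```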

5. End-user experience mapping

Monitoring where users have a bad experience when accessing applications, and comparing their performance with that of all the other user groups and data centers, helps IT operations teams focus on performance gaps and pinpoint the root cause of application delivery failures. To learn more about this, read our section on Real User Monitoring.

End-user experience mapping with a network flow matrix

6. Identifying changes in the network traffic pattern

Having a baseline of the geography of the network traffic makes it possible to pinpoint immediately where the traffic pattern has changed in a complex environment.
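A baseline comparison can be as simple as a cell-by-cell diff of two matrices. The sketch below labels each cell as new, gone, up, or down; the factor of 2 and the minimum volume used to ignore noise are arbitrary assumptions, and the sample matrices are invented.

```python
def changed_cells(baseline, current, ratio=2.0, min_bytes=1_000_000):
    """Label cells whose volume changed by at least `ratio` versus baseline."""
    changes = {}
    for pair in baseline.keys() | current.keys():
        before = baseline.get(pair, 0)
        after = current.get(pair, 0)
        if max(before, after) < min_bytes:
            continue  # too small to matter in either period
        if before == 0:
            changes[pair] = "new"
        elif after == 0:
            changes[pair] = "gone"
        elif after / before >= ratio:
            changes[pair] = "up"
        elif before / after >= ratio:
            changes[pair] = "down"
    return changes

# Made-up matrices: bytes per (source group, destination group).
baseline = {
    ("paris", "dc"): 5_000_000,
    ("london", "dc"): 4_000_000,
    ("paris", "london"): 10,
}
current = {
    ("paris", "dc"): 12_000_000,
    ("london", "dc"): 4_100_000,
    ("dc", "backup"): 8_000_000,
    ("paris", "london"): 500,
}
changes = changed_cells(baseline, current)
for pair, label in sorted(changes.items()):
    print(pair, label)
```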

7. Tracking viral infections

Keeping track of machines that communicate with non-existent or non-routed subnets can help identify machines infected by viruses and worms. The network flow matrix provides a single view for identifying such patterns.
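Here is a minimal sketch of that check. It assumes a hypothetical list of subnets that actually exist in the network, and it flags sources that contact several private addresses outside those subnets. The threshold of three distinct targets is arbitrary, and a flagged host deserves investigation rather than an immediate verdict.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical list of internal subnets that are actually routed.
ROUTED = [ip_network("10.1.0.0/16"), ip_network("10.100.0.0/16")]

def is_unrouted_internal(ip):
    """True for a private address that belongs to no known routed subnet."""
    addr = ip_address(ip)
    return addr.is_private and not any(addr in net for net in ROUTED)

def suspicious_sources(flows, min_targets=3):
    """Return sources that contacted at least min_targets unrouted addresses."""
    targets = defaultdict(set)
    for src, dst in flows:
        if is_unrouted_internal(dst):
            targets[src].add(dst)
    return sorted(s for s, dsts in targets.items() if len(dsts) >= min_targets)

# Made-up flow records: (source IP, destination IP).
flows = [
    ("10.1.0.66", "10.77.0.1"),
    ("10.1.0.66", "10.78.0.1"),
    ("10.1.0.66", "10.79.0.1"),
    ("10.1.0.5", "10.100.0.10"),
    ("10.1.0.5", "8.8.8.8"),
    ("10.1.0.5", "10.77.0.1"),
]
print("Hosts to investigate:", suspicious_sources(flows))
```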

What are the prerequisites to build a network flow matrix?

To build a usable network flow matrix, you need to make sure your network monitoring solution:

  • Shows a wide-angle view of network traffic from all user locations to all data centers (i.e., no layer 2 or 3 filtering)
  • Represents all data center traffic (including the east-west traffic carried inside virtual and cloud environments; see “Best practices for performance troubleshooting in a virtual / cloud data center” for more information on this)
  • Scales to handle the traffic load and renders the data fast enough
  • Offers sufficient retention times to build a baseline and compare normal and abnormal periods
  • Provides a complete set of metrics (not just traffic volume, but also performance indicators)