Every IT Operations team faces complaints indicating that the network is slowing down or delivering applications poorly. Their very first focus is on verifying network performance factors (i.e., latency, packet loss, etc.).
Nevertheless, the network is not the sole driver of data transfer speed and of the end-user experience.
Network Performance Factors
Many other factors directly impact how fast application queries and responses will flow through the network. If one wants to troubleshoot performance degradations, this checklist of the factors that can badly impact the transfer speed will come in very handy.
Let’s start with what may be network-led:
1. Network Latency
Network latency refers to the time needed to send a packet from the source to the destination. This time varies depending on:
- the physical distance
- the number of network devices that have to be crossed (also referred to as the number of hops)
- and, to a lesser extent, the performance of each of those devices
(See “Measuring network performance: links between latency, throughput and packet loss” to learn more about what drives network latency.)
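The relationship between latency and transfer rates depends on the protocol that carries the data. Focusing on the two most common ones: for a UDP flow, latency may have no impact on the transfer rate, because the protocol itself does not wait for acknowledgments. For TCP, the more widely used of the two, latency has a drastic impact: the sender can only keep a limited amount of unacknowledged data in flight, so the longer the round trip, the lower the achievable throughput.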
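To make the TCP case concrete, here is a minimal sketch of the usual upper bound: a single flow cannot send more than one window of data per round trip. The function name and the sample values are illustrative assumptions; real throughput is usually lower because of loss, slow start and competing traffic.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP flow: at most one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    window = 65_535  # bytes: the largest window without TCP window scaling
    for rtt_ms in (1, 20, 100):
        mbps = max_tcp_throughput_bps(window, rtt_ms / 1000) / 1e6
        print(f"RTT {rtt_ms:>3} ms: at most {mbps:.2f} Mbit/s")
```

With the same window, going from a 1 ms round trip to a 100 ms round trip divides the ceiling by 100, whatever the bandwidth of the link.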
2. Network Congestion
Network congestion refers to the saturation of a path used by packets to flow between the source and the destination. The element on the path can be either an active device (e.g., router or switch) or a physical link (e.g., cable).
When the maximum capacity of the element is reached, packets can no longer be transferred in a timely manner: they are either put in a queue (e.g., in a router) or dropped if no queue is available to hold them. It may even become impossible to set up new sessions.
The consequences then vary, depending on the level of delay generated by the congestion:
- Packets are delayed for a short period of time
  - The latency will increase
  - Some retransmissions will occur (for TCP flows) as the acknowledgment packets are not received fast enough by the sender
  - Duplicate acknowledgment packets will also be received
- Packets are lost or dropped (packet loss)
  - Retransmissions increase significantly: as packets are not acknowledged, they are massively re-sent
- Disconnections: sessions are dropped as too many packets are lost
  - You might see TTL exceeded events and session time-outs
  - TCP sessions are not terminated properly
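The queue-or-drop behaviour described in this section can be sketched with a small tick-by-tick simulation. This is a simplified model under stated assumptions (a single FIFO queue, a fixed forwarding rate, a fixed buffer size); the function name and parameters are illustrative and do not describe any particular device.

```python
def simulate_queue(arrivals, service_rate, buffer_size):
    """Tick-by-tick FIFO queue with a finite buffer.

    arrivals:     packets arriving in each tick
    service_rate: packets the element can forward per tick
    buffer_size:  packets the queue can hold between ticks
    Returns one (queued, dropped) tuple per tick.
    """
    queued = 0
    history = []
    for arriving in arrivals:
        backlog = queued + arriving
        # Forward as much as the element can handle in this tick
        backlog -= min(backlog, service_rate)
        # Whatever does not fit in the buffer is dropped
        dropped = max(0, backlog - buffer_size)
        queued = backlog - dropped
        history.append((queued, dropped))
    return history


if __name__ == "__main__":
    for tick, (queued, dropped) in enumerate(
        simulate_queue([5, 15, 20, 20, 5], service_rate=10, buffer_size=8), 1
    ):
        print(f"tick {tick}: queued={queued} dropped={dropped}")
```

In this model a growing `queued` value corresponds to the added latency of the first stage, and a non-zero `dropped` value to the packet loss of the second.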
3. Infrastructure Parameters (QoS, Filtering, Routing)
Even when the overall network path is free of congestion (no lack of bandwidth or system resources), some devices apply policies that affect data transfer:
- Prioritization: some traffic is either more strategic (critical applications) or more performance sensitive (real-time applications, VoIP, video conferencing) and is allocated a higher priority than the rest of the applications using a given network path. If the maximum capacity of the network path is reached, lower priority flows will start experiencing retransmissions, packet loss or disconnections, depending on how long and how severe the congestion is.
- Filtering/encryption: many kinds of filtering may be in place to scan for viruses, to prevent users from reaching non-recommended sites, to protect web servers from threats, etc. The impact of filtering on data transfer depends on how much processing time it requires: that time adds to the latency between the client and the server and, if it becomes excessive, it can generate retransmissions and packet loss.
- Routing/load balancing: some devices distribute the load across a group of servers/devices or route the traffic to the most suitable path from a performance and/or an economic standpoint. These devices may also be overloaded or misconfigured, which could lead to retransmissions, packet loss or disconnections.
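As an illustration of the prioritization policy above, the following sketch applies strict-priority scheduling to one sending interval. It is a deliberate simplification: real QoS implementations typically combine several queueing disciplines, and the function name, priority values and packet names are assumptions made for the example.

```python
import heapq
from itertools import count


def schedule(packets, capacity):
    """Strict-priority scheduling for a single interval.

    packets:  iterable of (priority, name); a lower number means higher priority
    capacity: number of packets the link can send in this interval
    Returns (sent, held_back), each a list of packet names.
    """
    order = count()  # tie-breaker: keeps arrival order within a priority
    heap = [(priority, next(order), name) for priority, name in packets]
    heapq.heapify(heap)
    sent = [heapq.heappop(heap)[2] for _ in range(min(capacity, len(heap)))]
    held_back = [name for _, _, name in sorted(heap)]
    return sent, held_back


if __name__ == "__main__":
    offered = [(2, "backup-1"), (0, "voip-1"), (1, "erp-1"),
               (2, "backup-2"), (0, "voip-2")]
    sent, held_back = schedule(offered, capacity=3)
    print("sent:", sent)
    print("held back:", held_back)
```

When the offered load exceeds the capacity, only the lowest-priority packets are held back, which is why those flows are the first to show retransmissions and loss.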
While troubleshooting slow transfer rates, it is important to list the devices on the path between clients and servers. You can then identify at which point in time, and for which flows, retransmissions, duplicate acknowledgments, packet loss, TTL expired events, session time-outs or incomplete TCP session setups are observed.
4. Client or Server Health
This is probably the last item you will consider if you are focused on network performance. But these systems also have limited resources, which can lead to congestion and slow down data transfer rates.
If a server lacks hardware resources, such as RAM, CPU or I/O, it will process user queries more slowly.
At a given moment, a client or a server reaching a congestion point will slow down the transfer using standard TCP mechanisms.
Here is how you can identify that situation:
- 0 window (zero window) events: one of the parties is asking the other to reduce the throughput (see this article on how TCP events impact application performance). You can interpret this indicator as a sign of a lack of resources and investigate the host to identify which resource is insufficient.
- RST (reset) events: one of the parties disconnects the session abruptly. Keep in mind that some applications may use RST as a standard way to terminate a session, even if it is not a best practice!
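Both indicators can be read directly from the fixed 20-byte TCP header: the flags byte carries the RST bit and the window field carries the advertised window. The sketch below classifies a raw header accordingly. The function names are illustrative, `build_header` is a hypothetical helper that only exists to produce sample input, and ignoring a zero window on SYN and FIN segments is an assumption of this example.

```python
import struct

FIN_FLAG = 0x01
SYN_FLAG = 0x02
RST_FLAG = 0x04
ACK_FLAG = 0x10

TCP_FIXED_HEADER = "!HHIIBBHHH"  # ports, seq, ack, offset, flags, window, checksum, urgent


def classify_tcp_header(header: bytes) -> str:
    """Return 'rst', 'zero_window' or 'normal' for a raw TCP header."""
    if len(header) < 20:
        raise ValueError("need at least the 20-byte fixed TCP header")
    (_src, _dst, _seq, _ack, _offset, flags,
     window, _checksum, _urgent) = struct.unpack(TCP_FIXED_HEADER, header[:20])
    if flags & RST_FLAG:
        return "rst"
    # Assumption: a zero window on SYN/FIN segments is not counted as a resource signal
    if window == 0 and not flags & (SYN_FLAG | FIN_FLAG):
        return "zero_window"
    return "normal"


def build_header(flags: int, window: int) -> bytes:
    """Hypothetical helper: a sample header with arbitrary ports and sequence numbers."""
    return struct.pack(TCP_FIXED_HEADER, 49152, 443, 1, 1, 5 << 4, flags, window, 0, 0)


if __name__ == "__main__":
    print(classify_tcp_header(build_header(ACK_FLAG, 0)))
    print(classify_tcp_header(build_header(ACK_FLAG | RST_FLAG, 0)))
    print(classify_tcp_header(build_header(ACK_FLAG, 65_535)))
```

Counting these classifications per host over time would show which side of the conversation is running out of resources or resetting sessions.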
It is easy to gather this information from your network traffic and to quickly pinpoint where your data transfer slowdowns are coming from.