This is the fourth article in a series (see the full list at the bottom of the page) covering the core TCP concepts needed to troubleshoot performance problems that affect applications. This article discusses the TCP receive window.
After considering how the TCP retransmission mechanism works, we will now examine TCP receive windows and how they can impact performance.
Why should you care about TCP windowing? Because it drives the speed of data transfers and hence the experience of your users accessing the applications, as described in these two other articles:
- What’s the impact of TCP events on application performance?
- The 5 factors which slow down data transfers and how to identify them
What Is a TCP Receive Window?
Simply put, it is the space available in the TCP receive buffer, which holds incoming data that the application has not yet processed.
The size of the TCP Receive Window is communicated to the connection partner using the window size value field of the TCP header. This field tells the link partner how much data can be sent on the wire before an acknowledgment is received. If the receiver is not able to process the data as fast as it arrives, gradually the receive buffer will fill and the TCP window will be reduced in the acknowledgment packets. This will alert the sender that it needs to reduce the amount of data sent or allow the receiver time to clear the buffer.
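To make the header field concrete, here is a minimal sketch of reading the window size value out of a raw TCP header. The window occupies bytes 14 and 15 of the header in network byte order. The function name `parse_window` and the sample header values are assumptions made up for this illustration and are not taken from a real capture.

```python
import struct


def parse_window(tcp_header: bytes) -> int:
    """Return the raw 16-bit window size value from a TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Bytes 14-15 carry the window size as an unsigned big-endian integer.
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return window


# Hypothetical 20-byte header built only for this example.
sample_header = struct.pack(
    "!HHIIBBHHH",
    49152,    # source port
    80,       # destination port
    1000,     # sequence number
    2000,     # acknowledgment number
    5 << 4,   # data offset: 5 words (20 bytes), no options
    0x10,     # flags: ACK
    65535,    # window size value
    0,        # checksum (not computed in this sketch)
    0,        # urgent pointer
)

print(parse_window(sample_header))  # prints 65535
```

The value returned is the raw field only. If window scaling was negotiated in the handshake, the real window is larger than this number.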
In the above diagram, the client and server are advertising their window size values as they communicate. Each TCP header will display the most recent window value, which can grow or shrink as the connection progresses. In this example, the client has a TCP receive window of 65,535 bytes, and the server has 5,840. For many applications, since clients tend to receive data rather than send it, clients often have a larger allocated window size. After the handshake, the client sends an HTTP GET request to the server, which is quickly processed. Two response packets from the server arrive at the client, which sends an acknowledgment along with an updated window size. The client was able to process the data packets out of the TCP buffer as fast as they came in, so the window size was not reduced. The client still has a full window available for receiving data – 65,535 bytes.
In another example, a client is requesting data from a server and begins to receive the data. However, in this case, the client is not able to quickly process the incoming data. The TCP buffer begins to fill, as indicated by the reduced window value.
The acknowledgments from the client indicate that the window is shrinking. As long as the window value does not fall to zero, this behavior will largely go unnoticed by the end user. Although the number is slightly reduced, there is still plenty of room in the buffer for data transfer to continue. In many cases, the client can catch up and process the data out of the buffer, which frees space and raises the window value again.
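The relationship between the buffer and the advertised window can be sketched with a toy model. This is a simplification under stated assumptions: the class `ReceiveBuffer` and its methods are hypothetical, and real TCP stacks add behavior (such as buffer auto-tuning) that is not modeled here.

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer (not a real stack implementation)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # total buffer size in bytes
        self.queued = 0           # bytes received but not yet read by the app

    def on_segment(self, nbytes: int) -> int:
        """Store incoming data; return how many bytes fit in the buffer."""
        accepted = min(nbytes, self.capacity - self.queued)
        self.queued += accepted
        return accepted

    def app_read(self, nbytes: int) -> int:
        """The application drains data; return how many bytes it took."""
        taken = min(nbytes, self.queued)
        self.queued -= taken
        return taken

    @property
    def advertised_window(self) -> int:
        """Free space, which is what the receiver advertises in its ACKs."""
        return self.capacity - self.queued


buf = ReceiveBuffer(capacity=65535)
buf.on_segment(1460)
buf.on_segment(1460)
print(buf.advertised_window)  # 62615: the application has not read yet
buf.app_read(2920)
print(buf.advertised_window)  # 65535: the application caught up
```

If the application never calls `app_read`, the advertised window keeps falling with every segment until it reaches zero.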
TCP Window Scale
The TCP header field allocated for the window size is two bytes long. This means that the highest possible numeric value for a receive window is 65,535 bytes. In today’s networks, this window size is not enough to provide optimal traffic flow, especially on long, fat networks (links that have high bandwidth and high latency). In its native state, TCP cannot take advantage of these high-performance links, since the sender can have at most 65,535 bytes in flight (sent but not yet acknowledged) at any time.
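A short calculation shows why 65,535 bytes is too small for long, fat networks. With at most one window in flight per round trip, throughput is capped at the window size divided by the round-trip time. The sketch below uses hypothetical figures (a 100 ms round trip and a 1 Gbit/s link) chosen only for illustration, and the function names are our own.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_bps * rtt_seconds / 8


rtt = 0.100  # assumed round-trip time: 100 ms

ceiling = max_throughput_bps(65535, rtt)
print(f"{ceiling / 1e6:.2f} Mbit/s")  # 5.24 Mbit/s with an unscaled window

needed = bandwidth_delay_product_bytes(1e9, rtt)
print(f"{needed / 1e6:.1f} MB")  # 12.5 MB needed to fill a 1 Gbit/s link
```

Under these assumptions, an unscaled window would use only about half a percent of the link's capacity, however fast the link is.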
For this reason, RFC 1323 (since replaced by RFC 7323) introduced a TCP option that allows the receive window to be scaled up by a power of two. The option is called TCP Window Scaling, and it is advertised in the handshake process. When advertising its window, a client or server also advertises the scale factor (multiplier) that will be used for the life of the connection.
In the image above, the sender of this packet is advertising a TCP Window of 63,792 bytes and is using a scaling factor of four. This means that the true window size is 63,792 x 4 (255,168 bytes). Using scaling windows allows endpoints to advertise a window size of up to about 1 GB. To use window scaling, both sides of the connection must advertise this capability in the handshake process. If one side or the other cannot support scaling, then neither will use this function. The scale factor, or multiplier, is only sent in the SYN packets during the handshake and is used for the life of the connection. This is one reason why it is so important to capture the handshake process when performing TCP analysis.
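The arithmetic can be checked with a few lines of code. On the wire, the Window Scale option carries a shift count rather than the multiplier itself, so a multiplier of four corresponds to a shift count of 2, and the largest shift count the RFC allows is 14. The helper name `scaled_window` is an assumption for this sketch.

```python
MAX_SHIFT = 14  # largest shift count permitted for the Window Scale option


def scaled_window(raw_window: int, shift_count: int) -> int:
    """Return the true window: the 16-bit header value shifted left."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    if not 0 <= shift_count <= MAX_SHIFT:
        raise ValueError("the shift count must be between 0 and 14")
    return raw_window << shift_count


print(scaled_window(63792, 2))           # 255168, the example above (x 4)
print(scaled_window(0xFFFF, MAX_SHIFT))  # 1073725440, the largest window
```

Without the shift count from the SYN packets, an analyzer sees only the raw 16-bit value and cannot compute the true window.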
What Is a Zero Window?
When a client (or a server, although it is usually the client) advertises a window size of zero, this indicates that its TCP receive buffer is full and it cannot receive any more data. It may have a stuck processor or be busy with some other task, which can cause the TCP receive buffer to fill. Zero Windows can also be caused by a problem within the application, where the application is not reading data out of the TCP buffer.
A TCP Zero Window from a client will halt the data transmission from the server side, allowing time for the problem station to clear its buffer. When the client begins to digest the data, it will let the server know to resume the data flow by sending a TCP Window Update packet. This will advertise an increased window size and the flow will resume.
How Can We Detect TCP Zero Window?
Window problems are usually observed on applications that move a lot of data such as backups, file transfers, and large downloads. If a performance problem is hampering data transfer, look for window problems on the receiver.
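As a rough sketch of what looking for window problems means in practice, the code below scans a list of window observations and reports each Zero Window episode with its start time and the time the window reopened. The record layout, the names `Observation` and `find_zero_window_episodes`, and the sample data are all hypothetical. A real analysis would track each connection separately rather than each sender.

```python
from typing import Dict, Iterable, List, NamedTuple


class Observation(NamedTuple):
    time: float   # seconds since the start of the capture
    sender: str   # station that advertised the window
    window: int   # advertised window in bytes


class ZeroWindowEpisode(NamedTuple):
    sender: str
    start: float  # first packet advertising a zero window
    end: float    # window update that reopened the window


def find_zero_window_episodes(
    observations: Iterable[Observation],
) -> List[ZeroWindowEpisode]:
    """Pair each zero window with the update that ended it."""
    zero_since: Dict[str, float] = {}
    episodes: List[ZeroWindowEpisode] = []
    for obs in sorted(observations, key=lambda o: o.time):
        if obs.window == 0:
            # Remember only the first zero window of the episode.
            zero_since.setdefault(obs.sender, obs.time)
        elif obs.sender in zero_since:
            start = zero_since.pop(obs.sender)
            episodes.append(ZeroWindowEpisode(obs.sender, start, obs.time))
    return episodes


# Made-up observations for illustration.
sample = [
    Observation(0.0, "client", 65535),
    Observation(0.5, "client", 0),
    Observation(0.6, "client", 0),
    Observation(2.5, "client", 8192),
]

for episode in find_zero_window_episodes(sample):
    print(episode.sender, episode.start, episode.end)  # client 0.5 2.5
```

The gap between `start` and `end` is the time during which the data transfer was stalled, which is the part the end user actually feels.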
SkyLIGHT PVX can monitor for Zero Window conditions and displays statistics about which connections suffered them and when. If these problems are observed in SkyLIGHT PVX, focus on the station that is advertising the Zero Window value. Remember that this indicates the TCP receive buffer has been exhausted and data flow will stop until the buffer is cleared. These are usually caused by stuck processes on the client, under-resourced PCs or an application that is not tuned to receive high rates of data.
As an example, consider an application where we can observe numerous Zero Window events generated by 223 clients.
You can easily drill down to the clients involved in the phenomenon and confirm the impact on the data transfers and End User Response Times:
You could also view the evolution through time to understand if it is a continuous or intermittent issue:
With this level of TCP detail, SkyLIGHT PVX can quickly help to get to the root cause of a stuck TCP connection.