TCP Series #1: How to Diagnose TCP Connection Setup Issues?

This is the first article in a series covering what you need to know to troubleshoot performance issues affecting applications that rely on TCP (see the bottom of this article for the full list). In this article, we consider TCP connection setup.

Let’s have a look at how TCP sessions are established… and what can go wrong!

TCP is a connection-oriented protocol, which means that a connection is established and maintained until the applications at each end have finished exchanging messages. TCP runs on top of the Internet Protocol (IP).

TCP provides reliable, ordered, and error-checked transmission. To do so, it relies on the handshake and on packet flags such as SYN, ACK, PSH (push), FIN, and RST (reset) to open, maintain, and close the connection without losing data.

TCP sits underneath many application protocols, such as HTTP, so it is important to know how to diagnose TCP issues. In this series of articles, we explain the TCP metadata, why it matters for performance troubleshooting, and how to measure it easily with SkyLIGHT PVX.

How Does a Session Start? TCP Handshake & Connection Time

A TCP connection is set up with a 3-way handshake made of SYN, SYN+ACK, and ACK packets. From this handshake, we can extract a performance metric called Connection Time (CT), which summarizes how fast a session can be set up between a client and a server over a network. For more details, see this excellent article on Wikipedia.

Figure 1 – How TCP handshake is analyzed

The three steps of the TCP handshake are:

  1. The ‘SYN’ is the first packet sent from a client to a server; it asks the server to open a connection.
  2. If the server can accept the connection, it responds with a ‘SYN+ACK’, which means “I received your ‘SYN’ and I agree to connect”.
  3. Finally, the client sends an ‘ACK’ to confirm the connection.
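The three steps above can be turned into a Connection Time measurement once the handshake packets have been captured with timestamps. The sketch below is only an illustration: the `Packet` record and the `connection_time` function are hypothetical names, and it assumes CT is the delay between the client's first SYN and its final ACK as seen at the capture point, which may differ from how SkyLIGHT PVX computes the metric.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float   # capture time in seconds
    src: str           # sender IP address
    flags: frozenset   # TCP flags set on the packet, e.g. frozenset({"SYN", "ACK"})


def connection_time(packets, client, server):
    """Return the seconds between the client's first SYN and its handshake ACK.

    Returns None when the handshake never completes in the capture.
    """
    syn_at = None
    synack_seen = False
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        if pkt.src == client and pkt.flags == {"SYN"}:
            if syn_at is None:
                # Keep the first SYN so that retransmissions lengthen the result
                syn_at = pkt.timestamp
        elif pkt.src == server and pkt.flags == {"SYN", "ACK"} and syn_at is not None:
            synack_seen = True
        elif (pkt.src == client and synack_seen
              and "ACK" in pkt.flags and "SYN" not in pkt.flags):
            return pkt.timestamp - syn_at
    return None
```

Measuring from the first SYN rather than the last one is a design choice: a lost SYN that has to be retransmitted then shows up as a longer connection time instead of being hidden.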

How to Diagnose TCP Connection Faults

1 – SYN Without Connections

The first question you can easily answer with SkyLIGHT PVX is: “Can my clients connect to my servers?” In the PVX navigation menu, go to Application → Clients, then choose the TCP theme and set the filter called “Only Unilateral Flow”. The pattern to look for is traffic from the client to the server with no response from the server.

Figure 2 – Filter on Unilateral Flows Only

In other words, the view lists the top client IPs whose flows carry traffic from the client only, with no response.

For Advanced Users of SkyLIGHT PVX

Filtering on unilateral flows mostly surfaces ‘SYN’ issues, but it can also return other types of flows. To query only the ‘SYN’ packets that did not lead to a connection, use a custom filter:

Figure 3 –  SkyLIGHT PVX finds unilateral flows and sorts them.

As the results above show, several IPs try to connect to a server (SYN > 0) but never succeed (Connections = 0).
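The same selection (SYN > 0 and Connections = 0) can be reproduced outside the product on any per-client export of these two counters. This is a minimal sketch, not the PVX filter syntax: the function name and the row keys `client_ip`, `syn`, and `connections` are assumptions made for the example, and the addresses are made-up documentation IPs.

```python
def syn_without_connections(rows):
    """Return (client_ip, syn_count) pairs for clients that sent SYNs but
    established no connection, sorted by SYN count, highest first."""
    failing = [
        (row["client_ip"], row["syn"])
        for row in rows
        if row["syn"] > 0 and row["connections"] == 0
    ]
    return sorted(failing, key=lambda item: item[1], reverse=True)


# Illustrative data only
rows = [
    {"client_ip": "192.0.2.10", "syn": 42, "connections": 0},
    {"client_ip": "192.0.2.11", "syn": 7, "connections": 7},
    {"client_ip": "192.0.2.12", "syn": 120, "connections": 0},
]
print(syn_without_connections(rows))
# [('192.0.2.12', 120), ('192.0.2.10', 42)]
```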

Here are common failure cases:

  • A firewall denies those connections. In this case, you could apply the same query to client zones (in the same menu) to see if the IPs are in the same zone.
  • The server does not exist anymore or is not available. This happens frequently when a server IP is changed, yet some clients continue to query the old one.

2 – Bad Connection Ratio

In a perfect world, you should have 1 ‘SYN’ per TCP connection. SkyLIGHT PVX provides a metric for this connection efficiency: the ‘SYN’ per Connection rate, which is the number of SYN packets divided by the number of TCP sessions set up. This metric is available in the ‘details’ tables when you use the TCP theme. You can also graph its evolution over time in Application → Custom charts.
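As a quick illustration of the ratio, here is a small sketch that computes it from two counters. The function name is hypothetical, and the handling of the zero-connection cases is an assumption of this example rather than documented PVX behavior.

```python
import math


def syn_per_connection(syn_count: int, connection_count: int) -> float:
    """Return the number of SYN packets per established connection.

    1.0 is the ideal value; higher values mean that some SYNs were
    retransmitted or never answered.
    """
    if syn_count < 0 or connection_count < 0:
        raise ValueError("counters must not be negative")
    if connection_count == 0:
        # Assumption: SYNs with no connection at all are reported as infinite,
        # and an idle period (no SYN, no connection) as 0.0
        return math.inf if syn_count > 0 else 0.0
    return syn_count / connection_count


print(syn_per_connection(100, 100))  # 1.0
print(syn_per_connection(150, 100))  # 1.5
```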

Figure 4 –  PVX custom chart SYN/Conn

Poor ‘SYN’ efficiency is sometimes a network issue: in that case, the failed connection attempts are caused by packet loss or congestion. You can check this assumption by looking at the Connection Time. If it remains low and several hosts are affected, the network is the probable cause.

However, if the Connection Time is high, the issue is more likely on the server side: the server is overloaded and cannot answer all clients. Finally, if the ‘SYN’ ratio is very high, you may be facing a security issue, such as a DDoS attack.
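The reasoning above can be summarized as a small decision function. This is only a sketch of the heuristic: the function name, the returned labels, and all three thresholds are arbitrary assumptions chosen for illustration, and they should be tuned against your own baseline.

```python
def triage(syn_ratio: float, connection_time_ms: float, *,
           ratio_warn: float = 1.2,      # assumed: above this, efficiency is "bad"
           ratio_attack: float = 10.0,   # assumed: above this, suspect an attack
           ct_high_ms: float = 100.0) -> str:  # assumed: "high" connection time
    """Suggest where to look first, from SYN efficiency and Connection Time."""
    if syn_ratio < ratio_warn:
        return "healthy"
    if syn_ratio >= ratio_attack:
        return "security"   # e.g. a SYN flood as part of a DDoS attack
    if connection_time_ms >= ct_high_ms:
        return "server"     # overloaded server that cannot answer all clients
    return "network"        # packet loss or congestion on the path


print(triage(1.0, 5.0))     # healthy
print(triage(2.0, 5.0))     # network
print(triage(2.0, 400.0))   # server
print(triage(50.0, 5.0))    # security
```

A "network" result is more convincing when several hosts show the same pattern, as noted above; the function looks at a single pair of values and cannot check that by itself.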

Advanced SkyLIGHT PVX

The network latency, or RTT (Round Trip Time), can give you another indication that the issue is on the network side. SkyLIGHT PVX provides the RTT in the Network Performances metric theme.

Figure 5 – Troubleshoot connections with Connection Times and SYN rates

Conclusion

In this first article, we gave a short presentation of TCP performance metrics and of how TCP sets up connections with SYN, SYN+ACK, and ACK packets. We also saw some common failure cases that can be diagnosed easily with SkyLIGHT PVX.

To troubleshoot these kinds of issues, we used the Top Clients, Top Client Zones, and Custom Charts pages. To go further, we used “Advanced Filter: Unilateral Flows” to filter flows with no responses.

We introduced several metrics: the number of ‘SYN’ packets and of ‘Handshakes’ (connections), the SYN efficiency, and the Connection Time.

In a subsequent article, we will see how to end a connection with Reset and Fin packets.

More from this series