By Boris Rogier November 16, 2016

How to troubleshoot performance in highly distributed environments?

One of our customers recently requested assistance in handling a slowdown pertaining to an application that was accessed through different paths according to the user’s location: with a thick client, through Citrix XenApp, or through a Web interface.


The organization’s users, spread over tens of sites across 5 continents, were experiencing performance problems irrespective of the access mode they used.

The client already owned a couple of SkyLIGHT™ PVX physical appliances, and the first option they considered was to position one of them in the data center hosting the application, located in Europe. This would have been simple, but the data center is outsourced, and deploying a physical appliance on site was unfortunately not an option!

So, the network team was being blamed, despite several obvious facts, such as:

Only this specific application was experiencing a performance degradation
It was a persistent problem
Several user sites were impacted

This situation was becoming a true nightmare for the IT Operations team, as it had been going on for several months by the time we started to work on it.

Troubleshooting on a global scale

How could they find the root cause of this degradation without installing any equipment in the datacenter or at any of their numerous remote sites? When we jumped in, this was just one of the difficulties that our team had to overcome as we worked toward a solution. The client had 2 physical appliances available and their first questions were fairly technical, such as:

How can I analyze all of my sites with just two probes? I cannot ship these 1U appliances to each site, in case of degradation. It is too slow, too complicated. For some of the sites, because of customs reasons, the delivery process can take up to one month.
How can I ensure that the physical appliances I already have are used and get a good ROI?
How can I analyze both:
- The remote site’s local LAN traffic?
- The end user experience on all of the sites (from the datacenter’s traffic)?

Finding a solution

The solution needed to meet a complex set of requirements:

Silos impede visibility
- As in every large organization, different teams will have well-defined scopes of responsibility. Although they do cooperate in practice, their primary focus is on the core of their perimeter.
- Let’s take an example: for the analysis of WAN networks, the organization relies on a WAN prioritization solution that provides an overview of the network performance, but that solution is complex to use and does not provide any application performance visibility (its visibility stops at the TCP layer). It is, therefore, barely enough to get the network team out of trouble, but not enough to find the root cause of the degradation. (To find out more about the differences between traditional NPM solutions and WireData Stream Analysis solutions, you might want to read our paper on 6 reasons to change your approach to network analysis.)
- It was necessary to find a solution that could provide visibility beyond the limits of the silos!
Troubleshooting skills available at remote sites are limited
- The client’s objective was to allow local teams to investigate problems themselves; the solution therefore had to be simple enough for them to adopt as their own.
Degradations are random and intermittent
- Performance issues are intermittent and impossible to reproduce: the client needed to capture performance data 24×7 and be able to go back in time as required.
Data center is outsourced
- There was no opportunity to deploy a physical probe
Visibility is required across 3 application types
- The client required visibility across all 3 types of application access: HTTP/HTTPS, Citrix XenApp, Thick Client
Analysis needed to be holistic
- The analysis needed to provide visibility into the remote LAN, the WAN traffic, and the application tiers in the data center.

Complex problems require a simple solution!

The bottom line for this complex set of challenges is that we were able to efficiently respond through a simple solution deployed in just a few days…

And this is the configuration we used:

Since it was impossible to deploy any physical device in the data center, we simply deployed 3 virtual capture devices (standard virtual appliances) in charge of capturing key application flows on the different clusters within the datacenter itself.
The 10 remote sites reporting performance degradations were equipped with additional virtual capture devices, installed on existing VMware hosts.
All of the analytics were centralized within an existing physical SkyLIGHT PVX appliance in order to provide a single pane of glass.

As all of the analytics are computed in real time by each local capture device, only the results need to be sent to the central appliance, so the centralization of the data consumes little bandwidth.
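To illustrate why centralizing locally computed analytics is cheaper than centralizing raw traffic, here is a minimal Python sketch. It is not how SkyLIGHT PVX is implemented; the `Transaction` record, the `summarize` function, and the 60-second interval are all hypothetical names and values chosen for the example. Each capture device reduces the transactions it observes to one small summary record per application and time window, and only those records would travel over the WAN.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transaction:
    """One observed application transaction (hypothetical record layout)."""
    timestamp: float      # seconds since epoch
    application: str      # e.g. "erp-web", "erp-citrix"
    response_ms: float    # measured response time in milliseconds


def summarize(transactions, interval_s=60):
    """Reduce raw transactions to one summary per (time window, application).

    Only these summaries would be sent to the central appliance, so the
    volume sent depends on the number of windows and applications, not on
    the number of packets or transactions seen locally.
    """
    buckets = defaultdict(list)
    for t in transactions:
        window_start = int(t.timestamp // interval_s) * interval_s
        buckets[(window_start, t.application)].append(t.response_ms)

    return [
        {
            "window_start": window_start,
            "application": application,
            "count": len(values),
            "avg_ms": mean(values),
            "max_ms": max(values),
        }
        for (window_start, application), values in sorted(buckets.items())
    ]
```

In this sketch, thousands of transactions observed within one minute for one application collapse into a single record, which is the property that keeps the central link lightly loaded.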


The deployment to 10 remote sites required only 2 days of work, and the entire integration was performed remotely, requiring no travel and no shipping.
The set-up of the capture (i.e., 3 probes) within the datacenter itself required less than a day.

In the end, it took the client’s IT team less than 3 days to be able to monitor the performance of 10 sites spread over 5 continents and to monitor the different tiers in the application chain.

They were therefore able to get to the root cause in a very short period of time. In the end, the degradation was coming from the database tier. The IT Operations team was then able to diagnose the application issue and quickly fix it:

Users were inputting their data entries, but at given times the database servers were not available and the application was not providing any feedback or error to the user.
Production sites in India, the United States, and Brazil saved many days of rework and reduced their loss of production.
The local IT teams have access to SkyLIGHT PVX and actively use it for their troubleshooting and monitoring.
Based on this data, they achieved rapid resolution from their application vendor by precisely pointing out the flaws that needed to be fixed.
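The kind of tier-by-tier comparison that points to a single tier, here the database, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's logic: the function name `most_degraded_tier`, the tier names, and the response times are all hypothetical. The idea is to compare each tier's current response time with its own baseline and flag the tier whose ratio has grown the most.

```python
def most_degraded_tier(baseline_ms, current_ms):
    """Return (tier, ratio) for the tier whose response time grew the most.

    baseline_ms and current_ms map a tier name to a response time in
    milliseconds. Tiers missing from either mapping, or with a zero
    baseline, are ignored. Returns None if nothing can be compared.
    """
    ratios = {
        tier: current_ms[tier] / baseline_ms[tier]
        for tier in baseline_ms
        if tier in current_ms and baseline_ms[tier] > 0
    }
    if not ratios:
        return None
    worst = max(ratios, key=ratios.get)
    return worst, ratios[worst]


# Illustrative values only; these are not measurements from the case study.
baseline = {"web": 40.0, "middleware": 60.0, "database": 20.0}
current = {"web": 44.0, "middleware": 66.0, "database": 400.0}
print(most_degraded_tier(baseline, current))
```

Comparing each tier with its own baseline, rather than comparing absolute values across tiers, avoids blaming a tier that is simply slower by design.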

The power of agentless analysis provides 360° visibility on application delivery

Even if you do not have direct control over your entire IT environment, agentless performance management solutions can leverage physical and virtual network traffic, thus providing you with in-depth and extensive insight into the existence and origin of slowdowns at all levels, including:

Network
Application tiers
- Thin client architecture
- Front servers (Web or other)
- Middleware
- Back-end (database and files)
On any remote site
In any datacenter

