In search of lost response time

Recently, while on assignment with a major player in the French energy market, I was faced with a performance problem on one of their web portals. The symptom was as follows: as load increased in the production environment, the generation time of some pages rose dramatically (up to 12 seconds). Briefly, the technical components making up the portal are:

  • a cluster of three Apache servers,
  • a cluster of three WebLogic Portal 8.1 SP5 servers,
  • a cluster of three WebLogic Integration (WLI) 8.1 SP5 servers,
  • iWay Connector for SAP 5.5.006.R2, from iWay Software, installed on each WLI instance,
  • SAP,
  • Oracle 9i.

The production environment had no diagnostic tooling, and no suspicious error message appeared in the various log files. First reflexes:

  • Check the processor load of the Solaris servers. The CPU load was very low.
  • Check input/output activity. The amount of data read and written was very low, as was the number of disk and network accesses.
  • Check the garbage collector activity of the various JVMs from the WebLogic administration consoles. Nothing to report.

We could therefore rule out our initial hypotheses: programming errors causing excessive CPU and I/O consumption, infinite loops, and memory leaks. The slowdown seemed instead to be related to contention on a critical resource such as a JDBC pool, a thread pool, or a synchronized block. Second reflexes:

  • check the number of threads and their activity through the WebLogic console,
  • generate a thread dump by sending a SIGQUIT signal ("kill -3" on Unix, or "Control + Break" on Windows) to the JVM processes. As a reminder, this operation does not terminate the JVM. When a JVM receives this signal, it temporarily suspends all its threads, writes a stack trace for each of them, then resumes them.
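When sending a signal to the process is not practical, a similar snapshot can be taken from inside the JVM. The following is only a minimal sketch, assuming a Java 8 or later JVM (the java.lang.management API used here did not exist on the JDK that WebLogic 8.1 ran on); the class name BlockedThreadSummary is made up for this illustration. It dumps all threads and counts how many are BLOCKED on each monitor, which is the kind of summary one looks for in a thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of WebLogic or TDA.
public class BlockedThreadSummary {

    /** Counts BLOCKED threads per contended lock, keyed by the lock name reported by the JVM. */
    public static Map<String, Integer> blockedByLock(ThreadInfo[] threads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : threads) {
            // Entries can be null if a thread died while the dump was being taken.
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // false, false: lock names are enough here, no need for owned monitors/synchronizers.
        ThreadInfo[] threads = bean.dumpAllThreads(false, false);
        Map<String, Integer> counts = blockedByLock(threads);
        System.out.println(threads.length + " threads, " + counts.size() + " contended lock(s)");
        counts.forEach((lock, n) -> System.out.println(n + " thread(s) blocked on " + lock));
    }
}
```

A lock on which many threads are blocked at the same time is a natural first suspect for a point of contention.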

The ideal tool for analyzing a thread dump is, fittingly, Thread Dump Analyzer (TDA). It is a free application available at: http://tda.dev.java.net

First observation: almost half of the threads of the WLI servers were waiting on the same synchronized block. The point of contention was located in the "debug" method of the "LoggerDecorator" class of the iWay connector.

To take the investigation further, we installed Introscope, from CA | Wily Technology, on the WLI servers of the qualification environment, then injected load to reproduce the anomaly. Introscope is a monitoring and diagnostic tool that tracks the activity of all the components involved in a transaction, across multiple servers. Its low resource overhead makes it suitable for deployment in production environments.

Second observation: 92% of the generation time of the portal pages was consumed by the trace (logging) mechanism of the SAP connector! Since we did not have the source code of the iWay connector to fix it ourselves, a support case was opened with BEA Systems.

In conclusion:

  1. TDA allowed us to identify the source of the problem. This small free tool deserves a place in your emergency kit.
  2. Introscope allowed us to identify the methods causing the slowdowns and to measure their impact. It also allowed us to correlate the slowdowns with simultaneous accesses to the trace class, by superimposing the response time curve of the offending method on the curve of the number of calls to that method. It is a product designed to monitor complex applications in production and to quickly diagnose the source of an anomaly.
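To make the symptom concrete, here is a small sketch of how a synchronized trace method serializes otherwise independent threads. It is not the iWay code, which we never had access to: TraceLogger is a hypothetical stand-in, and the sleep merely simulates slow trace output. While one thread holds the monitor, the others show up as BLOCKED, which is what the thread dumps revealed.

```java
import java.util.ArrayList;
import java.util.List;

public class SynchronizedLogContention {

    /** Hypothetical stand-in for a trace class whose debug method is synchronized. */
    static class TraceLogger {
        private final StringBuilder sink = new StringBuilder();

        synchronized void debug(String message, long holdMillis) throws InterruptedException {
            sink.append(message).append('\n');
            Thread.sleep(holdMillis); // simulates slow trace output while the monitor is held
        }
    }

    /** Starts the workers, samples once, and returns how many were BLOCKED on the logger. */
    public static int countBlockedWorkers(int workers, long holdMillis) {
        TraceLogger logger = new TraceLogger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    logger.debug("request handled", holdMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            threads.add(t);
            t.start();
        }
        try {
            // Sample while the first caller is still inside debug().
            Thread.sleep(holdMillis / 2);
            int blocked = 0;
            for (Thread t : threads) {
                if (t.getState() == Thread.State.BLOCKED) {
                    blocked++;
                }
            }
            for (Thread t : threads) {
                t.join();
            }
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sampling", e);
        }
    }

    public static void main(String[] args) {
        int blocked = countBlockedWorkers(8, 200);
        System.out.println(blocked + " of 8 workers were blocked on the logger monitor");
    }
}
```

Because every call goes through the same monitor, total elapsed time grows with the number of concurrent callers, even though the CPU stays almost idle.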
