Comparison and Selection of CPU Profiling Tool(s)

This document describes three profiling tools that were investigated: Gprof, Intel VTune and OProfile. It gives a short description of the features each tool offers and assesses its suitability for profiling the Helix DNA Client.

The study was limited to these three tools because they are widely accepted in industry and academia [1] [2] [3] [4] and their results are known to be accurate [4] [5] [6].


Gprof

Gprof is the standard UNIX profiling tool [7]. It requires all the code to be compiled and linked with the -pg option. With this option the compiler instruments the code by inserting a call to a monitoring routine at the entry of every function. Each function entry increments the call counter for the corresponding (caller, called) pair. When the program exits, it dumps the collected call counts and timing information to a file called gmon.out, which can then be analyzed with gprof.


Example:

gcc -pg -o hello_world hello_world.c

./hello_world

gprof hello_world
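The (caller, called) call counting performed by -pg instrumentation can be illustrated with a minimal Python sketch. It is not how gprof is implemented; it only mimics the bookkeeping, using Python's sys.setprofile hook. The helper count_call_arcs and the functions play, render and decode_frame are hypothetical names chosen for illustration.

```python
import sys
from collections import Counter


def count_call_arcs(func, *args, **kwargs):
    """Run func and count how often each (caller, callee) pair occurs."""
    arcs = Counter()

    def on_event(frame, event, arg):
        # Only Python-level function entries are counted; returns and
        # calls into C built-ins are ignored.
        if event == "call":
            caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
            arcs[(caller, frame.f_code.co_name)] += 1

    sys.setprofile(on_event)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return arcs


# Hypothetical workload standing in for an instrumented program.
def decode_frame():
    return sum(range(100))


def render():
    for _ in range(3):
        decode_frame()


def play():
    render()
    render()


if __name__ == "__main__":
    for (caller, callee), count in sorted(count_call_arcs(play).items()):
        print(f"{caller} -> {callee}: {count}")
```

Each counter is keyed by the pair of function names, so the same function called from two different callers is recorded as two separate arcs. This is the information a call graph report is built from.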


Drawbacks:


1)      Needs a complete recompilation

2)      Results are not always easy to analyze for large software

3)      Since the Helix client is a multi-threaded application, it would require a statically linked binary, which can be challenging to build.


Therefore, Gprof is not an ideal profiling tool for the Helix DNA Client.


OProfile

OProfile does not require modification or instrumentation of the source code.

The tool itself is composed of two primary components: a kernel module and a user-level daemon [4]. The kernel module is loaded into kernel space at system startup and creates a pseudo driver (/dev/oprofile) through which OProfile is configured and its results are retrieved by means of commands. The daemon, in contrast, runs in the background in user space and provides an easy-to-use interface through which OProfile can be configured to monitor program performance.


OProfile Commands:

1) opcontrol: starts profiling, ends a profiling session, dumps profile data, and sets up the profiling parameters.

2) opreport: produces symbol or binary image summaries.


Steps:

1) Start the OProfile daemon (opcontrol -s)

2) Run the code

3) Flush the measurements and stop the daemon (opcontrol -d, then opcontrol -h)

4) Analyze the profiling data with tools such as opreport, which gives a breakdown of CPU time by procedure
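The steps above can be sketched as a small Python driver. This is only an illustration under stated assumptions: it assumes an OProfile release that still ships opcontrol and that the commands are run with sufficient privileges. The helper names oprofile_session and run_session, and the program path ./hello_world, are hypothetical. By default the sketch only prints the commands and executes nothing.

```python
import subprocess


def oprofile_session(program_argv):
    """Return the commands of one opcontrol-based session, in order."""
    return [
        ["opcontrol", "--start"],     # 1) start the daemon and profiling (-s)
        list(program_argv),           # 2) run the code under test
        ["opcontrol", "--dump"],      # 3) flush the collected samples (-d)
        ["opcontrol", "--shutdown"],  #    stop the daemon (-h)
        ["opreport", "--symbols"],    # 4) per-symbol breakdown of samples
    ]


def run_session(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "./hello_world" is a placeholder for the program being profiled.
    run_session(oprofile_session(["./hello_world"]))
```

Keeping the command list separate from its execution makes it possible to review the sequence before running it on a real system.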


Drawbacks:

OProfile is available only on Linux. Although it is very powerful and is the de facto standard for profiling on Linux, its capabilities and features are also present in Intel VTune, which is equally powerful and is available for both Linux and Windows. VTune also provides more profiling options and views than OProfile and can be easily integrated with IDEs such as Visual Studio and Eclipse.


Intel VTune Performance Analyzer


There are three main features of VTune:

1) Counter Monitor:

This is useful in system-level tuning. The Counter Monitor tracks a default set of OS counters over a period of time. Counter Monitor views can locate system-level performance bottlenecks, for example in disk usage and I/O usage.

2) Sampling:

This is useful in application-level tuning. It uses time-based sampling, which collects samples at regular time intervals (1 ms by default). This makes it possible to identify hot spots, and VTune allows drilling down to the code level.

3) Call Graph Analysis:

It is useful in application-level tuning. It displays the calling patterns of each function in the program. It helps in understanding the program’s architecture and in figuring out where its routines are being called from.
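The time-based sampling described under feature 2 can be illustrated with a small deterministic simulation in Python. It is a conceptual sketch, not VTune's implementation: the function sample_timeline, the trace, and the function names in it are made up for illustration, and the interval of 1000 microseconds mirrors the 1 ms default mentioned above.

```python
from collections import Counter


def sample_timeline(timeline, interval_us=1000):
    """Count which function is running at each sampling tick.

    timeline is a list of (function_name, duration_us) pairs executed
    back to back; interval_us is the sampling period in microseconds.
    """
    # Precompute the end time of every entry on the timeline.
    ends = []
    elapsed = 0
    for name, duration_us in timeline:
        elapsed += duration_us
        ends.append((elapsed, name))

    samples = Counter()
    idx = 0
    for tick in range(0, elapsed, interval_us):
        # Advance to the entry that is executing at this tick.
        while ends[idx][0] <= tick:
            idx += 1
        samples[ends[idx][1]] += 1
    return samples


if __name__ == "__main__":
    # Hypothetical 10 ms trace: names and durations are made up.
    trace = [("decode", 7000), ("render", 2500), ("log", 500)]
    for name, hits in sample_timeline(trace).most_common():
        print(f"{name}: {hits} samples")
```

In this trace decode collects 7 of the 10 samples and render collects 3, so decode is reported as the hot spot. The 500 microsecond log call falls between two ticks and is never sampled, which shows that sampling is statistical and can miss short-lived functions.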


Advantages [5]:

1)      Low overhead:

The VTune Performance Analyzer's sampling profiler has less than 5% overhead. This allows one to get accurate performance information.

2)      System-wide analysis:

The VTune Performance Analyzer provides not only application-level tuning but also system-level tuning, which can determine whether the application is processor intensive or I/O intensive.

3)      Sampling does not require instrumentation:

Sampling uses hardware counters to collect profiling information: a set of CPU registers that can count events such as instructions executed or cache misses.


Thus, VTune is ideal for profiling Helix DNA Client.


Summary of Comparison


The table below summarizes the comparison of the tools.


Tools         | Manual | Instrumentation | Sampling | Hardware Counter
System timers | X      |                 |          |
Gprof         |        | X               | X        |
OProfile      |        |                 |          | X
Intel VTune   |        |                 | X        | X



References:

[1]  KernelNewbies & OProfile

http://www.linuxtoday.com/developer/2001110400520NWKN


[2] Profiling and Optimization

http://wiki.secondlife.com/wiki/Profiling_and_Optimization


[3] Helix DNA Server Performance Profiling

https://helix-server.helixcommunity.org/2005/devdocs/serverprofiling.html


[4] An Overview of Software Performance Analysis Tools and Techniques: From GProf to DTrace

http://www.cs.wustl.edu/~jain/cse567-06/ftp/sw_monitors1/index.html


[5] Advantages Of VTune™ Performance Analyzer Over Other Profilers

www.intel.com/cd/software/products/asmo-na/eng/vtune/219271.htm


[6] Implementation of profiling

http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html


[7] gprof: a Call Graph Execution Profiler

http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf