
Perl Source Code Profiler

Here is a nicely done perl profiler that we’ve used in the past to help us with our Perl work.  You may find this useful as well.  Rather than describe the tool, we’ll let the detailed documentation handle that for us.


 # profile code and write database to ./nytprof.out
 perl -d:NYTProf

 # convert database into a set of html files, e.g., ./nytprof/index.html
 nytprofhtml

 # or into comma separated files, e.g., ./nytprof/*.csv
 nytprofcsv


Devel::NYTProf is a powerful feature-rich perl source code profiler.

  • Performs per-line statement profiling for fine detail
  • Performs per-subroutine statement profiling for overview
  • Performs per-block statement profiling (the first profiler to do so)
  • Accounts correctly for time spent after calls return
  • Performs inclusive and exclusive timing of subroutines
  • Subroutine times are per calling location (a powerful feature)
  • Can profile compile-time activity, just run-time, or just END time
  • Uses novel techniques for efficient profiling
  • Sub-microsecond (100ns) resolution on systems with clock_gettime()
  • Very fast – the fastest statement and subroutine profilers for perl
  • Handles applications that fork, with no performance cost
  • Immune from noise caused by profiling overheads and I/O
  • Program being profiled can stop/start the profiler
  • Generates richly annotated and cross-linked html reports
  • Trivial to use with mod_perl – add one line to httpd.conf
  • Includes an extensive test suite
  • Tested on very large codebases

NYTProf is effectively two profilers in one: a statement profiler, and a subroutine profiler.

Statement Profiling

The statement profiler measures the time between entering one perl statement and entering the next. Whenever execution reaches a new statement, the time since entering the previous statement is calculated and added to the time associated with the line of the source file that the previous statement starts on.

By default the statement profiler also determines the first line of the current block and the first line of the current subroutine, and accumulates times associated with those. NYTProf is the only Perl profiler to perform block level profiling.

Another innovation unique to NYTProf is automatic compensation for a problem inherent in simplistic statement-to-statement timing. Consider a statement that calls a subroutine and then performs some other work that doesn’t execute new statements, for example:

  foo(...) + mkdir(...);

In all other statement profilers the time spent in the remainder of the expression (mkdir in the example) will be recorded as having been spent on the last statement executed in foo()! Here’s another example:

  while (<>) {

After the first time around the loop, any further time spent evaluating the condition (waiting for input in this example) would be recorded as having been spent on the last statement executed in the loop!

NYTProf avoids these problems by intercepting the opcodes which indicate that control is returning into some previous statement and adjusting the profile accordingly.
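To make the misattribution concrete, here is a small simulation of the foo(...) + mkdir(...) case. It is only a sketch: the event stream, the line numbers, the timestamps and the attribute() helper are all made up for illustration, and it models the idea of the compensation rather than how NYTProf implements it with opcodes.

```perl
use strict;
use warnings;

# Hypothetical event stream: [event type, line, timestamp in seconds].
#   stmt   = execution enters a new statement on that line
#   resume = control returns into an earlier statement on that line
#   end    = the program finishes
my @events = (
    [ stmt   => 10, 0.0 ],   # line 10: foo(...) + mkdir(...);
    [ stmt   => 20, 0.1 ],   # line 20: the last statement inside foo()
    [ resume => 10, 0.3 ],   # foo() returns; mkdir(...) now runs on line 10
    [ end    =>  0, 1.3 ],
);

# Credit each interval to the line that was current when the interval began.
# When $honour_resume is false the 'resume' events are ignored, which gives
# the simplistic statement-to-statement scheme.
sub attribute {
    my ($honour_resume) = @_;
    my (%time, $line, $since);
    for my $event (@events) {
        my ($type, $event_line, $now) = @$event;
        next if $type eq 'resume' && !$honour_resume;
        $time{$line} += $now - $since if defined $line;
        ($line, $since) = ($event_line, $now);
    }
    return \%time;
}

my $naive       = attribute(0);
my $compensated = attribute(1);

printf "naive:       line 10 = %.1fs, line 20 = %.1fs\n", @{$naive}{10, 20};
printf "compensated: line 10 = %.1fs, line 20 = %.1fs\n", @{$compensated}{10, 20};
```

In the naive run the second spent in mkdir lands on line 20, the last statement of foo(). Once the 'resume' event is honoured, that second is credited to line 10 where it belongs.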

The statement profiler naturally generates a lot of data which is streamed out to a file in a very compact format. NYTProf takes care to not include the measurement and writing overheads in the profile times (some profilers produce ‘noisy’ data due to periodic stdio flushing).

Subroutine Profiling

The subroutine profiler measures the time between entering a subroutine and leaving it. It then increments a call count and accumulates the duration. For each subroutine called, separate counts and durations are stored for each location that called the subroutine.

Subroutine entry is detected by intercepting the entersub opcode. Subroutine exit is detected via perl’s internal save stack. The result is both extremely fast and very robust.
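The per-calling-location bookkeeping described above can be pictured with a plain Perl data structure. The sketch below is an illustration only: profiled_call() and %calls are hypothetical names, and it uses an explicit wrapper where NYTProf intercepts the entersub opcode in C.

```perl
use strict;
use warnings;
use Time::HiRes ();

# $calls{subname}{"file:line"} = { count => N, incl_time => seconds }
my %calls;

# Hypothetical wrapper: times one call and files the result under the
# location that invoked the wrapper.
sub profiled_call {
    my ($name, $code, @args) = @_;
    my (undef, $file, $line) = caller;
    my $start  = Time::HiRes::time();
    my @result = $code->(@args);
    my $slot   = $calls{$name}{"$file:$line"} ||= { count => 0, incl_time => 0 };
    $slot->{count}++;
    $slot->{incl_time} += Time::HiRes::time() - $start;
    return @result;
}

sub square { return $_[0] ** 2 }

profiled_call(square => \&square, $_) for 1 .. 3;   # first call site, 3 calls
profiled_call(square => \&square, 10);              # second call site, 1 call

for my $site (sort keys %{ $calls{square} }) {
    my $slot = $calls{square}{$site};
    printf "square called from %s: %d call(s), %.6fs\n",
        $site, $slot->{count}, $slot->{incl_time};
}
```

Keeping a separate slot per call site is what lets a report say not just that square() was slow, but which caller was responsible for the time.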

Note that subroutines that recurse directly or indirectly, such as Error::try, will show higher subroutine inclusive times because the time spent recursing will be double-counted. That may change in future.
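The double-counting is easy to reproduce with a simulated clock. In this sketch countdown() and the tick counter are hypothetical, and the accounting is the simple enter-to-leave scheme described above rather than NYTProf's own code.

```perl
use strict;
use warnings;

my $clock     = 0;   # simulated clock, in arbitrary ticks
my $inclusive = 0;   # summed inclusive time recorded for countdown()
my $calls     = 0;

sub countdown {
    my ($n) = @_;
    my $entered = $clock;
    $clock++;                          # one tick of work per call
    countdown($n - 1) if $n > 1;       # direct recursion
    $calls++;
    $inclusive += $clock - $entered;   # enter-to-leave duration of this call
    return;
}

countdown(3);
print "calls: $calls, elapsed: $clock ticks, summed inclusive: $inclusive ticks\n";
# prints "calls: 3, elapsed: 3 ticks, summed inclusive: 6 ticks"
```

Only 3 ticks elapse, yet the inclusive total is 6, because each outer call's duration also contains the durations already recorded for the inner calls.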

Application Profiling

NYTProf records extra information in the data file to capture details that may be useful when analysing the performance. It also records the filename and line ranges of all the subroutines.

NYTProf can profile applications that fork, and does so with no loss of performance. There’s (now) no special ‘allowfork’ mode. It just works. NYTProf detects the fork and starts writing a new profile file with the pid appended to the filename.

Fast Profiling

The NYTProf profiler is written almost entirely in C and great care has been taken to ensure it’s very efficient.

Apache Profiling

Just add one line near the start of your httpd.conf file:

        PerlModule Devel::NYTProf::Apache

By default you’ll get a /tmp/nytprof.$$.out file for the parent process and a /tmp/nytprof.$parent.out.$$ file for each worker process.

NYTProf takes care to detect when control is returning back from perl to mod_perl so time spent in mod_perl (such as waiting for the next request) does not get allocated to the last statement executed.

Works with mod_perl 1 and 2. See Devel::NYTProf::Apache for more information.


Usually you’d load Devel::NYTProf on the command line using the perl -d option:

 perl -d:NYTProf

To save typing the ‘:NYTProf’ you could set the PERL5DB env var

 PERL5DB='use Devel::NYTProf'

and then just perl -d would work:

 perl -d

Or you can avoid the need to add the -d option at all by using the PERL5OPT env var:

 PERL5OPT=-d:NYTProf

That’s also very handy when you can’t alter the perl command line being used to run the script you want to profile.


The behavior of Devel::NYTProf may be modified by setting the environment variable NYTPROF. Multiple options can be set at once by separating them with a colon (:). For example:

    export NYTPROF=trace=2:start=init:file=/tmp/nytprof.out
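To spell out the format, the following sketch splits such a string into option and value pairs. NYTProf does its own parsing internally; parse_nytprof_options() is a hypothetical helper written only to show how the colon and equals separators nest.

```perl
use strict;
use warnings;

# Split a NYTPROF-style string into an option => value hash.
sub parse_nytprof_options {
    my ($string) = @_;
    my %option;
    for my $pair (split /:/, $string) {
        my ($name, $value) = split /=/, $pair, 2;
        $option{$name} = $value;
    }
    return \%option;
}

my $opts = parse_nytprof_options('trace=2:start=init:file=/tmp/nytprof.out');
print "$_ = $opts->{$_}\n" for sort keys %$opts;
```

Because the colon is the separator between options, a value in this format cannot itself contain a colon.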


Append the current process id to the end of the filename.

This prevents concurrent or consecutive processes from overwriting the same file.


Use trace=N to set the trace level to N. 0 is off (the default). Higher values cause more detailed trace output.


Use the start option to specify at which phase of program execution the profiler should be enabled:

  start=begin - start immediately (the default)
  start=init  - start at beginning of INIT phase (after compilation)
  start=end   - start at beginning of END phase
  start=no    - don't automatically start

The start=no option is handy if you want to explicitly control profiling by calling DB::enable_profile() and DB::disable_profile() yourself.


Set to 0 to disable the collection of subroutine inclusive timings.


Set to 0 to disable the determination of block and subroutine location per statement. This makes the profiler about 50% faster (as of July 2008) but you lose some valuable information. The extra cost is likely to be reduced in later versions anyway, as little optimization has been done on that part of the code. The profiler is fast enough that you shouldn’t need to do this.


Set to 0 to disable the extra work done to allocate times accurately when returning into the middle of a statement. For example, leaving a subroutine and returning into the middle of a statement, or re-evaluating a loop condition.

This feature also ensures that in embedded environments, such as mod_perl, the last statement executed doesn’t accumulate the time spent ‘outside perl’.

NYTProf is the only line-level profiler to measure these times correctly. The profiler is fast enough that you shouldn’t need to disable this feature.


Set use_db_sub=1 to enable use of the traditional DB::DB() subroutine to perform profiling, instead of the faster ‘opcode redirection’ technique that’s used by default. It also disables some extra mechanisms that help ensure more accurate results for things like the last statements in subroutines.

The default ‘opcode redirection’ technique can’t profile subroutines that were compiled before NYTProf was loaded. So using use_db_sub=1 can be useful in cases where you can’t load the profiler early in the life of the application. If this proves to be useful to you then please let us know, otherwise this vestige of old slower ways is likely to be removed.


Measure user CPU + system CPU time instead of the real elapsed ‘wall clock’ time (which is the default).

Measuring CPU time has the advantage of making the measurements independent of time spent blocked waiting for the CPU or network I/O etc. But it also has the severe disadvantage of typically giving far less accurate timings.

Most systems use a 0.01 second granularity. With modern processors having multi-gigahertz clocks, 0.01 seconds is like a lifetime. The CPU time clock ‘ticks’ happen so rarely relative to the activity of most applications that you’d have to run the code for many hours to have any hope of reasonably useful results.
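The difference between the two clocks can be seen from plain Perl, without the profiler. This sketch uses the built-in times function for user and system CPU time and Time::HiRes for wall clock time. The workload (a short sleep followed by a loop) is arbitrary, and the figures printed will vary from system to system.

```perl
use strict;
use warnings;
use Time::HiRes ();

my $wall_start = Time::HiRes::time();
my ($user_start, $system_start) = times;

select(undef, undef, undef, 0.25);     # sleep: wall clock time, almost no CPU time
my $x = 0;
$x += sqrt($_) for 1 .. 200_000;       # some CPU-bound work

my ($user_end, $system_end) = times;
my $wall = Time::HiRes::time() - $wall_start;
my $cpu  = ($user_end - $user_start) + ($system_end - $system_start);

printf "wall clock:      %.6fs\n", $wall;
printf "user+system CPU: %.6fs\n", $cpu;
```

The sleep shows up only in the wall clock figure, and on many systems the CPU figure moves in coarse steps, which is the granularity problem described above.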


Use the file option to specify the output file to write profile data to (default: ‘./nytprof.out’).


You can profile only parts of an application by calling DB::enable_profile() and DB::disable_profile() at the appropriate moments.

Using the start=no option lets you leave the profiler disabled until the right moment, or circumstances, are reached.
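A minimal sketch of profiling just one region might look like the following. DB::enable_profile() and DB::disable_profile() are the calls named above. The profile_on() and profile_off() wrappers, and the setup() and hot_path() subroutines, are hypothetical. The defined checks are an assumption added here so that the script also runs normally when the profiler isn't loaded.

```perl
use strict;
use warnings;

# Guarded wrappers, so the script still runs when NYTProf is not loaded.
sub profile_on  { DB::enable_profile()  if defined &DB::enable_profile;  return }
sub profile_off { DB::disable_profile() if defined &DB::disable_profile; return }

sub setup    { return [ 1 .. 1000 ] }    # start-up work we don't want profiled
sub hot_path { my $sum = 0; $sum += $_ ** 2 for @{ $_[0] }; return $sum }

my $data = setup();

profile_on();
my $total = hot_path($data);             # only this region is profiled
profile_off();

print "total = $total\n";
```

Run it with perl -d:NYTProf and with start=no set in the NYTPROF environment variable, so that nothing is recorded until profile_on() is reached.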


The Devel::NYTProf::Data module provides a low-level interface for loading the profile data.

The Devel::NYTProf::Reader module provides an interface for generating arbitrary reports. This means that you can implement your own output format in perl. (Though the module is in a state of flux and may be deprecated soon.)

Included in the bin directory of this distribution are two scripts which implement the Devel::NYTProf::Reader interface:

  • nytprofcsv – creates comma delimited profile reports
  • nytprofhtml – creates attractive, richly annotated, and fully cross-linked html reports (including statistics, source code and color highlighting)


Only profiles code loaded after this module

Loading via the perl -d option ensures it’s loaded first.


Devel::NYTProf is not currently thread safe. If you’d be interested in helping us make it thread safe then please get in touch with us.

For perl versions before 5.8.8 it may change what caller() returns

For example, the Readonly module croaks with an “Invalid tie” when profiled with perl versions before 5.8.8. That’s because Readonly explicitly checks for certain values from caller(). We’re not quite sure what the cause is yet.

Calls made via operator overloading

Calls made via operator overloading are not noticed by any subroutine profiler.


The goto &$sub; isn’t recognised as a subroutine call by the subroutine profiler.
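For reference, these are the two kinds of call in question. The Money class and the wrapper() and real_work() subroutines are hypothetical and exist only to show the constructs. The code runs the same with or without the profiler.

```perl
use strict;
use warnings;

package Money;
use overload
    '+'  => \&add,
    '""' => sub { sprintf '%.2f', $_[0]{amount} };

sub new { my ($class, $amount) = @_; return bless { amount => $amount }, $class }
sub add { my ($x, $y) = @_; return Money->new($x->{amount} + $y->{amount}) }

package main;

sub real_work { return "worked on @_" }

sub wrapper {
    my $sub = \&real_work;
    goto &$sub;              # jumps into real_work() with the current @_
}

my $sum = Money->new(1.50) + Money->new(2.25);   # Money::add runs via overloading
print "sum = $sum\n";                            # prints "sum = 3.75"
print wrapper('input'), "\n";                    # prints "worked on input"
```

Neither Money::add nor real_work is invoked by an ordinary subroutine call here, which is why their calls go unrecorded in the subroutine profile.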


Currently there’s no support for Windows. Some work is being done on a port. If you’d be interested in helping us port to Windows then please get in touch with us.

#line directives

The reporting code currently doesn’t handle #line directives, but at least it warns about them. Patches welcome.




Screenshots of nytprofhtml v2.01 reports can be seen at and A writeup of the new features of NYTProf v2 can be found at and the background story, explaining the “why”, can be found at

Mailing list and discussion at

Public SVN Repository and hacking instructions at

nytprofhtml is a script included that produces html reports. nytprofcsv is another script included that produces plain text CSV reports.

Devel::NYTProf::Reader is the module that powers the report scripts. You might want to check this out if you plan to implement a custom report (though it may be deprecated in a future release).


Adam Kaplan, <akaplan at>. Tim Bunce, and Steve Peters, <steve at>.


  Copyright (C) 2008 by Adam Kaplan and The New York Times Company.
  Copyright (C) 2008 by Tim Bunce, Ireland.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.


A bit of history and a shameless plug…

NYTProf stands for ‘New York Times Profiler’. Indeed, this module was initially developed from Devel::FastProf by The New York Times Co. to help our developers quickly identify bottlenecks in large Perl applications. The NY Times loves Perl and we hope the community will benefit from our work as much as we have from theirs.

Please visit, our open source blog to see what we are up to, to see some of our open projects and then check out for the latest news!


Subroutine-level profilers:

  Devel::DProf        | 1995-10-31 | ILYAZ
  Devel::AutoProfiler | 2002-04-07 | GSLONDON
  Devel::Profiler     | 2002-05-20 | SAMTREGAR
  Devel::Profile      | 2003-04-13 | JAW
  Devel::DProfLB      | 2006-05-11 | JAW
  Devel::WxProf       | 2008-04-14 | MKUTTER

Statement-level profilers:

  Devel::SmallProf    | 1997-07-30 | ASHTED
  Devel::FastProf     | 2005-09-20 | SALVA
  Devel::NYTProf      | 2008-03-04 | AKAPLAN
  Devel::Profit       | 2008-05-19 | LBROCARD

Devel::NYTProf is a (now distant) fork of Devel::FastProf, which was itself an evolution of Devel::SmallProf.

Adam Kaplan took Devel::FastProf and added html report generation (based on Devel::Cover) and a test suite – a tricky thing to do for a profiler. Meanwhile Tim Bunce had been extending Devel::FastProf to add novel per-sub and per-block timing, plus subroutine caller tracking.

When Devel::NYTProf was released Tim switched to working on Devel::NYTProf because the html report would be a good way to show the extra profile data, and the test suite made development much easier and safer.

Then he went a little crazy and added a slew of new features, in addition to per-sub and per-block timing and subroutine caller tracking. These included the ‘opcode interception’ method of profiling, ultra-fast and robust inclusive subroutine timing, doubling performance, plus major changes to html reporting to display all the extra profile call and timing data in richly annotated and cross-linked reports.

Steve Peters came on board along the way with patches for portability and to keep NYTProf working with the latest development perl versions.

Adam’s work is sponsored by The New York Times Co. Tim’s work was partly sponsored by Shopzilla.