Perf

Applicable for STM32MP13x lines, STM32MP15x lines, STM32MP21x lines, STM32MP23x lines, STM32MP25x lines

1. Article purpose

This article provides basic information needed to start using the Linux® kernel tool: perf[1].

2. Introduction

The following table provides a brief description of the tool, as well as its availability depending on the software packages:

Yes: this tool is either present (ready to use or to be activated), or can be integrated and activated on the software package.

No: this tool is not present and cannot be integrated, or it is present but cannot be activated on the software package.

Warning: for the STM32MPU Embedded Software distribution for Android™, no Developer Package is presently delivered, and the tool (even if marked as present) has not yet been tested with the current release.

Name | Category | Purpose | STM32MPU Embedded Software distribution (Starter / Developer / Distribution Package) | STM32MPU Embedded Software distribution for Android™ (Starter / Developer / Distribution Package)
perf | Monitoring tools | perf[1] is a Linux user space tool that provides system performance figures | Yes / Yes / Yes | No* / No / No*

* Note: simpleperf[2] is present as an equivalent, but with fewer options.

3. Installing the trace and debug tool on your target board

3.1. Using the STM32MPU Embedded Software distribution

perf is installed by default and ready to be used in all the STM32MPU Embedded Software Packages.

which perf
/usr/bin/perf
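A script that has to run on several images cannot assume which profiler is installed. The following is a minimal sketch of such a check, not something shipped with the STM32MPU packages: first_available is a hypothetical helper name, and it relies only on the POSIX command -v builtin.

```shell
# Hypothetical helper (not part of the distribution): print the first
# command from the argument list that is found on PATH; fail if none is.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Example: prefer perf, fall back to simpleperf (its Android equivalent).
if profiler=$(first_available perf simpleperf); then
    echo "profiler found: $profiler"
else
    echo "no profiler found on PATH" >&2
fi
```

The helper returns a non-zero status when nothing is found, so it can also be used directly in a condition.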

It is integrated in the weston image distribution through the openembedded-core package group recipe openembedded-core/meta/recipes-core/packagegroups/packagegroup-core-tools-profile.bb:

RRECOMMENDS_${PN} = "\
   ${PERF} \
   trace-cmd \
   blktrace \
   ${PROFILE_TOOLS_X} \
   ${PROFILE_TOOLS_SYSTEMD} \
   "
...
PERF = "perf"

3.2. Using the STM32MPU Embedded Software distribution for Android™

simpleperf[2] is equivalent to perf. It is installed by default and is ready to be used with all STM32MPU software packages for Android™.

which simpleperf
/system/bin/simpleperf

It supports fewer options than perf:

simpleperf --help
Usage: simpleperf [common options] subcommand [args_for_subcommand]
common options:
    -h/--help     Print this help information.
    --log <severity> Set the minimum severity of logging. Possible severities
                       include verbose, debug, warning, info, error, fatal.
                       Default is info.
    --version     Print version of simpleperf.
subcommands:
    debug-unwind        Debug/test offline unwinding.
    dump                dump perf record file
    help                print help information for simpleperf
    kmem                collect kernel memory allocation information
    list                list available event types
    record              record sampling info in perf.data
    report              report sampling information in perf.data
    report-sample       report raw sample information in perf.data
    stat                gather performance counter information

4. Getting started

4.1. Perf commands

usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

The most commonly used perf commands are:
  annotate        Reads perf.data (created by perf record) and displays annotated code
  archive         Creates archive with object files with build-ids found in perf.data file
  bench           General framework for benchmark suites
  buildid-cache   Manages build-id cache.
  buildid-list    Lists the buildids in a perf.data file
  c2c             Shared Data C2C/HITM Analyzer.
  config          Gets and sets variables in a configuration file.
  data            Data file related processing
  diff            Reads perf.data files and displays the differential profile
  evlist          Lists the event names in a perf.data file
  ftrace          simple wrapper for kernel's ftrace functionality
  inject          Filters to augment the events stream with additional information
  kallsyms        Searches running kernel for symbols
  kmem            Tool to trace/measure kernel memory properties
  kvm             Tool to trace/measure kvm guest os
  list            Lists all symbolic event types
  lock            Analyzes lock events
  mem             Profiles memory accesses
  record          Runs a command and records its profile into perf.data
  report          Reads perf.data (created by perf record) and displays the profile
  sched           Tool to trace/measure scheduler properties (latencies)
  script          Reads perf.data (created by perf record) and displays trace output
  stat            Runs a command and gathers performance counter statistics
  test            Runs sanity tests.
  timechart       Tool to visualize total system behavior during a workload
  top             System profiling tool.
  probe           Defines new dynamic tracepoints

See 'perf COMMAND -h' for more information on a specific command.

4.2. Most useful commands with simple to use interface

  • perf top (Linux kernel documentation[3]): shows the CPU load by sampling cycles events; by default, symbols are listed in descending order of sample count:
perf top
 40.62%  [kernel]                           [k] v7_dma_inv_range
 18.65%  [kernel]                           [k] _raw_spin_unlock_irqrestore
 17.01%  [kernel]                           [k] arch_cpu_idle
  8.27%  [kernel]                           [k] v7_dma_clean_range
  5.00%  [kernel]                           [k] rcu_idle_exit
  1.70%  [kernel]                           [k] cpu_startup_entry
  0.52%  [kernel]                           [k] trace_graph_return
  0.48%  [kernel]                           [k] finish_task_switch
  0.48%  libc-2.18.so                       [.] memcpy
  0.47%  [kernel]                           [k] trace_graph_entry
This output means that about 40.62% of the sampled cycles are spent in the function v7_dma_inv_range, and 18.65% in _raw_spin_unlock_irqrestore.
More information and examples are available in perf.wiki.kernel.org[4].
It is also possible to sort the result by specific keys:
Usage: perf top [<options>]
   -s, --sort <key[,key2...]>
              sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer to the man page for the complete list.
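In the perf top listing, the [k] marker identifies kernel symbols and [.] identifies user space symbols. As a rough sketch, the share of samples in each category can be totalled from a saved text copy of the listing. The helper name top_split is hypothetical, and the script assumes the column layout shown above (percentage in the first column, marker in the third).

```shell
# Hypothetical helper: read a perf top text listing on standard input and
# print the summed percentage of kernel ([k]) and user space ([.]) symbols.
top_split() {
    awk '
        # Only consider lines whose first column looks like "40.62%".
        $1 ~ /^[0-9.]+%$/ && $3 == "[k]" { kernel += $1 }
        $1 ~ /^[0-9.]+%$/ && $3 == "[.]" { user += $1 }
        END { printf "kernel %.2f%% user %.2f%%\n", kernel, user }
    '
}
```

Applied to the ten lines of the listing above, this gives 92.72% for the kernel and 0.48% for user space. On the target, a text listing can typically be captured with perf top --stdio, assuming the perf build in use supports that option.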


  • perf stat (Linux kernel documentation[5]): obtains event counts
perf stat hello_world_example
User space example: hello world from STMicroelectronics
10 9 8 7 6 5 4 3 2 1 0 
User space example: goodbye from STMicroelectronics

Performance counter stats for 'hello_world_example':

         4.328249      task-clock (msec)         #    0.000 CPUs utilized          
               11      context-switches          #    0.003 M/sec                  
                0      cpu-migrations            #    0.000 K/sec                  
               38      page-faults               #    0.009 M/sec                  
          2710036      cycles                    #    0.626 GHz                    
           640856      instructions              #    0.24  insn per cycle         
            75644      branches                  #   17.477 M/sec                  
            21764      branch-misses             #   28.77% of all branches        

     11.109859338 seconds time elapsed
More information and examples are available in perf.wiki.kernel.org[6].
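The comments in the right-hand column of the perf stat output are derived from the raw counters: 640856 instructions over 2710036 cycles gives 0.24 instructions per cycle, and 21764 branch misses over 75644 branches gives 28.77%. The sketch below recomputes both ratios from a saved copy of the output. The helper name stat_ratios is hypothetical, and the script assumes the event names appear in the second column exactly as in the listing above, which may differ between perf versions.

```shell
# Hypothetical helper: read perf stat text output on standard input and
# recompute instructions per cycle and the branch miss rate.
stat_ratios() {
    awk '
        { gsub(/,/, "", $1) }   # drop thousands separators, if any
        $2 == "cycles"        { cycles   = $1 + 0 }
        $2 == "instructions"  { insns    = $1 + 0 }
        $2 == "branches"      { branches = $1 + 0 }
        $2 == "branch-misses" { misses   = $1 + 0 }
        END {
            if (cycles > 0)   printf "ipc %.2f\n", insns / cycles
            if (branches > 0) printf "branch-miss %.2f%%\n", 100 * misses / branches
        }
    '
}
```

Because perf stat normally prints its counters on the standard error stream, the output usually has to be redirected (for example with 2> stat.txt) before it can be fed to such a helper.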


  • perf list (Linux kernel documentation[7]): lists the supported symbolic event types
perf list
 branch-instructions OR branches                    [Hardware event]
 branch-misses                                      [Hardware event]
 bus-cycles                                         [Hardware event]
 cache-misses                                       [Hardware event]
 cache-references                                   [Hardware event]
 cpu-cycles OR cycles                               [Hardware event]
 instructions                                       [Hardware event]
 alignment-faults                                   [Software event]
 bpf-output                                         [Software event]
 context-switches OR cs                             [Software event]
 cpu-clock                                          [Software event]
 cpu-migrations OR migrations                       [Software event]
 dummy                                              [Software event]
 emulation-faults                                   [Software event]
 major-faults                                       [Software event]
 minor-faults                                       [Software event]
 page-faults OR faults                              [Software event]
 task-clock                                         [Software event]
 L1-dcache-load-misses                              [Hardware cache event]
 L1-dcache-loads                                    [Hardware cache event]
 L1-dcache-store-misses                             [Hardware cache event]
 L1-dcache-stores                                   [Hardware cache event]
 L1-icache-load-misses                              [Hardware cache event]
 L1-icache-loads                                    [Hardware cache event]
 LLC-load-misses                                    [Hardware cache event]
 LLC-loads                                          [Hardware cache event]
 LLC-store-misses                                   [Hardware cache event]
 LLC-stores                                         [Hardware cache event]
 branch-load-misses                                 [Hardware cache event]
 branch-loads                                       [Hardware cache event]
 dTLB-load-misses                                   [Hardware cache event]
 dTLB-store-misses                                  [Hardware cache event]
 iTLB-load-misses                                   [Hardware cache event]
 armv7_cortex_a7/br_immed_retired/                  [Kernel PMU event]
 armv7_cortex_a7/br_mis_pred/                       [Kernel PMU event]
 armv7_cortex_a7/br_pred/                           [Kernel PMU event]
 armv7_cortex_a7/br_return_retired/                 [Kernel PMU event]
 armv7_cortex_a7/bus_access/                        [Kernel PMU event]
 armv7_cortex_a7/bus_cycles/                        [Kernel PMU event]
 armv7_cortex_a7/cid_write_retired/                 [Kernel PMU event]
 armv7_cortex_a7/cpu_cycles/                        [Kernel PMU event]
 armv7_cortex_a7/exc_return/                        [Kernel PMU event]
 armv7_cortex_a7/exc_taken/                         [Kernel PMU event]
 armv7_cortex_a7/inst_retired/                      [Kernel PMU event]
 armv7_cortex_a7/inst_spec/                         [Kernel PMU event]
 armv7_cortex_a7/l1d_cache/                         [Kernel PMU event]
 armv7_cortex_a7/l1d_cache_refill/                  [Kernel PMU event]
 armv7_cortex_a7/l1d_cache_wb/                      [Kernel PMU event]
 armv7_cortex_a7/l1d_tlb_refill/                    [Kernel PMU event]
 armv7_cortex_a7/l1i_cache/                         [Kernel PMU event]
 armv7_cortex_a7/l1i_cache_refill/                  [Kernel PMU event]
 armv7_cortex_a7/l1i_tlb_refill/                    [Kernel PMU event]
 armv7_cortex_a7/l2d_cache/                         [Kernel PMU event]
 armv7_cortex_a7/l2d_cache_refill/                  [Kernel PMU event]
 armv7_cortex_a7/l2d_cache_wb/                      [Kernel PMU event]
 armv7_cortex_a7/ld_retired/                        [Kernel PMU event]
 armv7_cortex_a7/mem_access/                        [Kernel PMU event]
 armv7_cortex_a7/memory_error/                      [Kernel PMU event]
 armv7_cortex_a7/pc_write_retired/                  [Kernel PMU event]
 armv7_cortex_a7/st_retired/                        [Kernel PMU event]
 armv7_cortex_a7/sw_incr/                           [Kernel PMU event]
 armv7_cortex_a7/ttbr_write_retired/                [Kernel PMU event]
 armv7_cortex_a7/unaligned_ldst_retired/            [Kernel PMU event]
 rNNN                                               [Raw hardware event descriptor]
 cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
 mem:<addr>[/len][:access]                          [Hardware breakpoint]
 alarmtimer:alarmtimer_cancel                       [Tracepoint event]
 alarmtimer:alarmtimer_fired                        [Tracepoint event]
 alarmtimer:alarmtimer_start                        [Tracepoint event]
 alarmtimer:alarmtimer_suspend                      [Tracepoint event]
 asoc:snd_soc_bias_level_done                       [Tracepoint event]
 asoc:snd_soc_bias_level_start                      [Tracepoint event]
 asoc:snd_soc_dapm_connected                        [Tracepoint event]
 asoc:snd_soc_dapm_done                             [Tracepoint event]
 asoc:snd_soc_dapm_path                             [Tracepoint event]
 asoc:snd_soc_dapm_start                            [Tracepoint event]
 asoc:snd_soc_dapm_walk_done                        [Tracepoint event]
 asoc:snd_soc_dapm_widget_event_done                [Tracepoint event]
 asoc:snd_soc_dapm_widget_event_start               [Tracepoint event]
...
 xhci-hcd:xhci_inc_enq                              [Tracepoint event]
 xhci-hcd:xhci_queue_trb                            [Tracepoint event]
 xhci-hcd:xhci_ring_alloc                           [Tracepoint event]
 xhci-hcd:xhci_ring_expansion                       [Tracepoint event]
 xhci-hcd:xhci_ring_free                            [Tracepoint event]
 xhci-hcd:xhci_setup_addressable_virt_device        [Tracepoint event]
 xhci-hcd:xhci_setup_device                         [Tracepoint event]
 xhci-hcd:xhci_setup_device_slot                    [Tracepoint event]
 xhci-hcd:xhci_stop_device                          [Tracepoint event]
 xhci-hcd:xhci_urb_dequeue                          [Tracepoint event]
 xhci-hcd:xhci_urb_enqueue                          [Tracepoint event]
 xhci-hcd:xhci_urb_giveback                         [Tracepoint event]
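The perf list output is long, so it is often convenient to keep only one category. The following sketch filters the listing on the label shown between brackets and prints the first name of each matching event. The helper name events_of_type is hypothetical; it only assumes the "name [category]" layout shown above.

```shell
# Hypothetical helper: events_of_type <category label>
# Reads a perf list listing on standard input and prints the first name of
# every event whose bracketed category matches the label exactly.
events_of_type() {
    awk -v type="[$1]" 'index($0, type) { print $1 }'
}

# Example use on the target (commented out because it needs perf):
#   perf list | events_of_type 'Software event'
```

The match is literal, so 'Hardware event' does not select the 'Hardware cache event' entries.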


  • perf record (Linux kernel documentation[8]): records events for later reporting
perf record hello_world_example 

User space example: hello world from STMicroelectronics
10 9 8 7 6 5 4 3 2 1 0 
User space example: goodbye from STMicroelectronics
[ perf record: Woken up 1 time to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (28 samples) ]
It is possible to record only selected events (among those given by the perf list command). More information, options and examples are available in perf.wiki.kernel.org[9].
By default, the events are recorded in the perf.data file. To specify another output file name, add the -o, --output <file> option.
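Event selection and a custom output file can be combined in one invocation, using the -e option of perf record for the event list and -o for the file. The wrapper below is only a sketch: record_events and the PERF variable are hypothetical names introduced for illustration, and hello.data is an arbitrary file name.

```shell
# PERF can be overridden, for example PERF=echo for a dry run that only
# prints the arguments instead of starting a recording.
: "${PERF:=perf}"

# Hypothetical wrapper: record_events <output file> <event list> <command> [args...]
record_events() {
    out=$1
    events=$2
    shift 2
    # "--" separates the perf options from the command to profile.
    "$PERF" record -e "$events" -o "$out" -- "$@"
}

# Example use on the target (commented out because it needs perf):
#   record_events hello.data cycles,instructions hello_world_example
```

A file recorded this way is not picked up by default by the reporting commands, so its name has to be passed to them explicitly.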


  • perf report (Linux kernel documentation[10]): breaks down the events by process, function, etc.
Example, after the previous command "perf record hello_world_example":
perf report
Samples: 28  of event 'cycles:ppp', Event count (approx.):2737925                
Overhead  Command          Shared Object Symbol                   
  12.66%  hello_world_exa  ld-2.26.so         [.] _dl_relocate_object
  11.71%  hello_world_exa  [kernel.kallsyms]  [k] filemap_map_pages
  10.65%  hello_world_exa  [kernel.kallsyms]  [k] n_tty_write
   6.43%  hello_world_exa  [kernel.kallsyms]  [k] percpu_counter_add_batch
   6.43%  hello_world_exa  ld-2.26.so         [.] sbrk
   6.24%  hello_world_exa  [kernel.kallsyms]  [k] cpu_v7_set_pte_ext
   5.56%  hello_world_exa  [kernel.kallsyms]  [k] alloc_set_pte
   5.56%  hello_world_exa  libc-2.26.so       [.] __sbrk
   5.37%  hello_world_exa  [kernel.kallsyms]  [k] __vma_link_file
   5.32%  hello_world_exa  [kernel.kallsyms]  [k] __fput
   5.32%  hello_world_exa  [kernel.kallsyms]  [k] ldsem_up_read
   5.32%  hello_world_exa  [kernel.kallsyms]  [k] unmap_page_range
   5.32%  hello_world_exa  libc-2.26.so       [.] printf
   5.24%  hello_world_exa  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
   2.23%  hello_world_exa  [kernel.kallsyms]  [k] perf_event_mmap
   0.48%  hello_world_exa  [kernel.kallsyms]  [k] perf_output_begin
   0.13%  perf             [kernel.kallsyms]  [k] perf_event_exec
By default, the perf.data file is read as input. To specify another input file name, add the -i, --input <file> option.
More information and examples are available in perf.wiki.kernel.org[11].
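When a report has been saved as text, the overhead can be summed per shared object to see at a glance how the time splits between the kernel and each library. This is a minimal sketch assuming the four-column layout shown above (overhead, command, shared object, symbol); overhead_by_dso is a hypothetical helper name.

```shell
# Hypothetical helper: read a perf report text listing on standard input and
# print the total overhead per shared object, largest first.
overhead_by_dso() {
    awk '
        # Skip headers: keep only lines starting with a percentage.
        $1 ~ /^[0-9.]+%$/ { sum[$3] += $1 }
        END { for (dso in sum) printf "%s %.2f\n", dso, sum[dso] }
    ' | sort -k 2,2nr
}
```

On the target, perf report can usually produce a comparable breakdown itself through its --sort option with the dso key; the helper is meant for listings that have already been copied elsewhere.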


  • perf bench (Linux kernel documentation[12]): runs different kernel microbenchmarks:
# List of all available benchmark collections:

        sched: Scheduler and IPC benchmarks
          mem: Memory access benchmarks
        futex: Futex stressing benchmarks
          all: All benchmarks
Example of running the memcpy benchmark with a size of 100MB:
perf bench mem memcpy --size 100MB
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 100MB bytes ...

       1.426138 GB/sec
More information and examples are available in perf.wiki.kernel.org[13].
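To compare several runs, only the throughput line of the benchmark output is needed. The sketch below extracts it from the text shown above. The helper name bench_rate is hypothetical, the sizes in the commented loop are arbitrary, and the loop assumes that perf bench prints its result on the standard output.

```shell
# Hypothetical helper: read perf bench mem output on standard input and
# print only the throughput value with its unit (for example "1.426138 GB/sec").
bench_rate() {
    awk '$2 ~ /B\/sec$/ { print $1, $2 }'
}

# Example use on the target (commented out because it needs perf):
#   for size in 1MB 10MB 100MB; do
#       printf '%s: ' "$size"
#       perf bench mem memcpy --size "$size" | bench_rate
#   done
```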

5. To go further

5.1. Visualizing trace using Flame Graphs

Flame Graphs[14] make it possible to visualize traces coming from perf.

The flame graphs are generated using the FlameGraph tool suite[15].

  • Install the FlameGraph tool suite on the host PC
cd <your_local_path>
git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph
  • Generate a flame graph from perf data
When running perf record, the -g option must be added so that call graphs are recorded.

As an example, for the top command:

- Run the perf record command on the board
perf record -a -g top
perf script > perf_top.out

- Copy perf_top.out to your host PC (for example, into the FlameGraph directory)

- Fold the stacks on the host PC using the stackcollapse-perf.pl script
./stackcollapse-perf.pl perf_top.out > out.top_folded

- Use flamegraph.pl to render an SVG (Scalable Vector Graphics) file
./flamegraph.pl out.top_folded > top.svg

- View the SVG file, for example in a web browser
firefox top.svg
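The two host-side steps above can be chained with a pipe, which avoids the intermediate folded file. The following is a sketch under stated assumptions: make_flamegraph and FLAMEGRAPH_DIR are hypothetical names, and FLAMEGRAPH_DIR is assumed to point to the cloned FlameGraph repository (the current directory by default).

```shell
# Assumed location of the cloned FlameGraph repository; adjust as needed.
: "${FLAMEGRAPH_DIR:=.}"

# Hypothetical helper: make_flamegraph <perf script output> <svg file>
make_flamegraph() {
    input=$1
    svg=$2
    # Fail early instead of producing an empty SVG file.
    if [ ! -r "$input" ]; then
        echo "cannot read $input" >&2
        return 1
    fi
    # Fold the stacks, then render them, without an intermediate file.
    "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" "$input" \
        | "$FLAMEGRAPH_DIR/flamegraph.pl" > "$svg"
}

# Example use on the host PC, from the FlameGraph directory:
#   make_flamegraph perf_top.out top.svg
```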

6. References


  • Useful external links
Document link | Document Type | Description
perf tutorial | User Guide | perf.wiki.kernel.org
perf (wikipedia.org) | Standard | wikipedia.org
Brendan Gregg's perf page | Perf example | From Brendan Gregg
Eclipse perf plugin page | Eclipse perf plugin | Eclipse.org