How to Use ARM Performance Monitoring Units (PMU) in ARMv7
This guide covers how to use the CPU performance counters in ARMv7 through inline assembly. There are 4 programmable counters, each of which can track one of the architectural events, plus an optional 5th counter that can only count cycles. (The number of programmable counters is implementation-defined; this guide assumes 4.) If you need to do this from user-space code, you must first enable user access, because the instructions and registers involved are privileged and by default can only be accessed by the kernel. Enabling it means setting the enable bit of the user enable register (PMUSERENR) from kernel space, either via a modified kernel or a kernel module.
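A minimal sketch of that kernel-side step, assuming GCC-style inline assembly and a 32-bit ARM target. The function name `pmu_enable_user_access` and the macro name are made up for this illustration. In a Linux kernel module the function would typically have to run on every core (for example through `on_each_cpu`), since each core has its own PMU.

```c
#include <stdint.h>

/* Bit 0 (EN) of PMUSERENR grants user-mode access to the PMU registers. */
#define PMUSERENR_EN (1u << 0)

#if defined(__arm__)
/* Hypothetical helper: must execute in a privileged mode (kernel or module). */
static inline void pmu_enable_user_access(void)
{
    uint32_t value = PMUSERENR_EN;
    /* Write PMUSERENR (c9, c14, 0). */
    __asm__ volatile("mcr p15, 0, %0, c9, c14, 0" : : "r"(value));
}
#endif
```

Without this step, the MCR and MRC accesses used in the rest of the guide raise an undefined-instruction exception when executed from user space.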
The MCR and MRC instructions are used to write to and read from the PMU. The PMU registers are reached through the coprocessor interface (coprocessor 15), and the mnemonics stand for Move to Coprocessor from Register and Move to Register from Coprocessor. These instructions behave like ordinary register reads and writes but are a little slower.
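As an illustration of the two instructions, the sketch below reads and writes the PMU control register (PMCR) and decodes its N field, which reports how many programmable counters the core implements. The function names are hypothetical, and GCC-style inline assembly on a 32-bit ARM target is assumed.

```c
#include <stdint.h>

/* General form:
 *   mcr p15, <opc1>, <Rt>, <CRn>, <CRm>, <opc2>   register -> coprocessor
 *   mrc p15, <opc1>, <Rt>, <CRn>, <CRm>, <opc2>   coprocessor -> register
 * The PMU registers used in this guide all have opc1 = 0 and CRn = c9. */

/* PMCR.N, bits [15:11]: number of programmable event counters. */
static inline unsigned pmcr_num_counters(uint32_t pmcr)
{
    return (pmcr >> 11) & 0x1fu;
}

#if defined(__arm__)
/* Read PMCR (c9, c12, 0) into a general-purpose register. */
static inline uint32_t pmu_read_pmcr(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(value));
    return value;
}

/* Write a general-purpose register into PMCR (c9, c12, 0). */
static inline void pmu_write_pmcr(uint32_t value)
{
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(value));
}
#endif
```

Calling `pmcr_num_counters(pmu_read_pmcr())` on the target is one way to check whether the assumption of 4 programmable counters holds for your core.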
For a list of events you can track, consult the ARM Architecture Reference Manual and the Technical Reference Manual for your core. Not every ARMv7 processor implements every event that the architecture defines.
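For orientation, here is a small sketch listing a few of the common ARMv7 event numbers and a helper that tests a bit in the common event identification register (PMCEID0), in which bit n is set when event n is implemented. The enum and function names are invented for this example, and the list is far from complete; check the manual for your core before relying on any event.

```c
#include <stdint.h>

/* A few common ARMv7 event numbers (names are illustrative). */
enum pmu_event {
    PMU_EVT_SW_INCR        = 0x00, /* software increment */
    PMU_EVT_L1I_REFILL     = 0x01, /* instruction fetch causing a cache refill */
    PMU_EVT_L1D_REFILL     = 0x03, /* data access causing a cache refill */
    PMU_EVT_L1D_ACCESS     = 0x04, /* data cache access */
    PMU_EVT_INST_RETIRED   = 0x08, /* instruction architecturally executed */
    PMU_EVT_BRANCH_MISPRED = 0x10, /* branch mispredicted or not predicted */
    PMU_EVT_CPU_CYCLES     = 0x11  /* cycle count */
};

/* Returns 1 if the given PMCEID0 value marks the event as implemented. */
static inline int pmu_event_supported(uint32_t pmceid0, unsigned event)
{
    return event < 32u && ((pmceid0 >> event) & 1u) != 0u;
}

#if defined(__arm__)
/* Read PMCEID0 (c9, c12, 6). */
static inline uint32_t pmu_read_pmceid0(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 6" : "=r"(value));
    return value;
}
#endif
```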
Note: the counters stop working after a processor goes to sleep and wakes back up, so the setup has to be repeated after each wake-up. One option is to run the setup code alongside the code in the idle scheduler (see the cpu_startup_entry function), since the idle scheduler is in charge of waking a processor up.
Before declaring which events you wish to track, you must configure the PMU itself. The options are collected in a value that is written to the coprocessor. In the mask 0x8000000f, the lowest 4 bits select which of the 4 programmable counters to enable or clear overflow flags for, and the top bit (bit 31) selects the cycle counter.
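The setup step could look roughly like the sketch below. It builds the control value and the counter mask in plain C, then writes them to PMCR, the count enable set register, and the overflow flag status register. All function and macro names are hypothetical, and the reset and divider options are shown only as examples of what the control register accepts.

```c
#include <stdint.h>

/* PMCR bit fields */
#define PMCR_E (1u << 0) /* enable all counters */
#define PMCR_P (1u << 1) /* reset all event counters to zero */
#define PMCR_C (1u << 2) /* reset the cycle counter to zero */
#define PMCR_D (1u << 3) /* cycle counter ticks once every 64 cycles */

#define PMU_CYCLE_COUNTER_BIT (1u << 31)

/* Build the PMCR value from the chosen options. */
static inline uint32_t pmu_control_value(int do_reset, int divide_by_64)
{
    uint32_t value = PMCR_E;
    if (do_reset)
        value |= PMCR_P | PMCR_C;
    if (divide_by_64)
        value |= PMCR_D;
    return value;
}

/* One bit per event counter, plus bit 31 for the cycle counter. */
static inline uint32_t pmu_counter_mask(unsigned n_event_counters, int with_cycle_counter)
{
    uint32_t mask = (n_event_counters >= 31u) ? 0x7fffffffu
                                              : ((1u << n_event_counters) - 1u);
    if (with_cycle_counter)
        mask |= PMU_CYCLE_COUNTER_BIT;
    return mask;
}

#if defined(__arm__)
static inline void pmu_init(int do_reset, int divide_by_64)
{
    uint32_t control = pmu_control_value(do_reset, divide_by_64);
    uint32_t mask = pmu_counter_mask(4, 1); /* 0x8000000f */

    /* PMCR (c9, c12, 0): global enable and optional resets. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(control));
    /* PMCNTENSET (c9, c12, 1): enable the selected counters. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(mask));
    /* PMOVSR (c9, c12, 3): writing 1 clears the matching overflow flag. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 3" : : "r"(mask));
}
#endif
```

Keeping the value computation separate from the MCR writes makes the bit layout easy to check on any machine, while the privileged writes only build for ARM.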
Telling the system to track an event is a two-step process. First, you select which of your 4 enabled programmable counters you wish to program.
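A sketch of the selection step, assuming the 4 counters described in this guide. The counter index is written to the event counter selection register (PMSELR). The helper names are hypothetical, and the `isb` after the write assumes the code is built for an ARMv7 target.

```c
#include <stdint.h>

#define PMU_NUM_EVENT_COUNTERS 4u /* assumption taken from this guide */

/* Returns the PMSELR value for an event counter, or -1 if out of range. */
static inline int32_t pmu_selr_value(unsigned counter)
{
    return (counter < PMU_NUM_EVENT_COUNTERS) ? (int32_t)counter : -1;
}

#if defined(__arm__)
/* Select an event counter; returns 0 on success, -1 for a bad index. */
static inline int pmu_select_counter(unsigned counter)
{
    int32_t sel = pmu_selr_value(counter);
    if (sel < 0)
        return -1;
    /* PMSELR (c9, c12, 5): choose which counter later accesses refer to. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" : : "r"((uint32_t)sel));
    /* Make sure the selection has taken effect before the next access. */
    __asm__ volatile("isb");
    return 0;
}
#endif
```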
Once the counter has been selected, you write to it the number of the event you would like it to track.
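Both steps together might be sketched as follows. The event number goes into the event type register (PMXEVTYPER) of whichever counter is currently selected. `pmu_set_event` and `pmu_evtyper_value` are names invented for this example, and the event number 0x08 in the usage note is the architectural "instruction executed" event.

```c
#include <stdint.h>

/* The event number occupies the low 8 bits of PMXEVTYPER. */
static inline uint32_t pmu_evtyper_value(uint32_t event)
{
    return event & 0xffu;
}

#if defined(__arm__)
/* Program one event counter to track the given event number. */
static inline void pmu_set_event(uint32_t counter, uint32_t event)
{
    uint32_t type = pmu_evtyper_value(event);
    /* Step 1: PMSELR (c9, c12, 5) selects the counter. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" : : "r"(counter));
    __asm__ volatile("isb");
    /* Step 2: PMXEVTYPER (c9, c13, 1) sets the event for that counter. */
    __asm__ volatile("mcr p15, 0, %0, c9, c13, 1" : : "r"(type));
}
#endif
```

For example, `pmu_set_event(0, 0x08)` would ask counter 0 to count executed instructions, provided the core implements that event.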
Reading one of these counters is likewise a two-step process: select the counter to read, then perform the actual read.
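A sketch of the read path under the same assumptions as before (hypothetical function names, GCC-style inline assembly). The cycle counter has its own register and needs no selection step. Because the counters are 32 bits wide, the small `pmu_delta` helper uses unsigned subtraction so that a single wrap-around between two readings still gives the right difference.

```c
#include <stdint.h>

/* Difference between two 32-bit counter readings, tolerant of one wrap. */
static inline uint32_t pmu_delta(uint32_t start, uint32_t end)
{
    return end - start;
}

#if defined(__arm__)
/* Read the current value of one programmable event counter. */
static inline uint32_t pmu_read_event_counter(uint32_t counter)
{
    uint32_t value;
    /* Step 1: PMSELR (c9, c12, 5) selects the counter. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" : : "r"(counter));
    __asm__ volatile("isb");
    /* Step 2: PMXEVCNTR (c9, c13, 2) returns the selected counter's value. */
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(value));
    return value;
}

/* Read the dedicated cycle counter, PMCCNTR (c9, c13, 0). */
static inline uint32_t pmu_read_cycle_counter(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
}
#endif
```

A typical measurement reads a counter before and after the code of interest and passes the two readings to `pmu_delta`.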
Good luck!