The use of CPU ops here avoids the offset calculations that we used to
have to do with per cpu operations. The result of this patch is that
event counters are coded with a single instruction the following way:

	incq   %gs:offset(%rip)

Without these patches this was:

	mov    %gs:0x8,%rdx
	mov    %eax,0x38(%rsp)
	mov    xxx(%rip),%eax
	mov    %eax,0x48(%rsp)
	mov    varoffset,%rax
	incq   0x110(%rax,%rdx,1)

Signed-off-by: Christoph Lameter