Message-ID: <CAJ3xEMgKfgbpxzxx595bG=bRM-ETm4vJfWALR3p-wVzzcHxHSw@mail.gmail.com>
Date: Sun, 18 Oct 2020 20:42:28 +0300
From: Or Gerlitz <gerlitz.or@...il.com>
To: Andi Kleen <andi@...stfloor.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Brendan Gregg <bgregg@...flix.com>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: perf measure for stalled cycles per instruction on newer Intel processors
On Thu, Oct 15, 2020 at 9:33 PM Andi Kleen <andi@...stfloor.org> wrote:
> On Thu, Oct 15, 2020 at 05:53:40PM +0300, Or Gerlitz wrote:
> > Earlier Intel processors (e.g. E5-2650) support the more classical
> > two stall events (for backend and frontend [1]) and then perf shows
> > the nice measure of stalled cycles per instruction - e.g here where we
> > have IPC of 0.91 and CSPI (see [2]) of 0.68:
>
> Don't use it. It's misleading on an out-of-order CPU because you don't
> know if it's actually limiting anything.
>
> If you want useful bottleneck data use --topdown.
So I ran it again, this time with the parameters below, and got this
output, where the entire rightmost column is colored red. What can be
said about the amount/ratio of stalls for this app? If you can recommend
some posts of yours that would help me understand this better, that would
be great. I saw a comment in the perf-stat man page and an LWN article,
but wasn't really able to figure it out.

FWIW, the kernel is 5.5.7-100.fc30.x86_64 and the CPU is an E5-2650 0.

$ perf stat --topdown -a taskset -c 0 $APP
[...]
Performance counter stats for 'system wide':
               retiring  bad speculation  frontend bound  backend bound
S0-D0-C0  1       24.9%             1.1%           16.1%          57.9%
S0-D0-C1  1       16.3%             1.3%           17.3%          65.1%
S0-D0-C2  1       17.0%             1.2%           15.3%          66.5%
S0-D0-C3  1       18.3%             0.8%            8.2%          72.8%
S0-D0-C4  1       18.1%             0.8%            8.5%          72.6%
S0-D0-C5  1       17.6%             0.8%           10.0%          71.6%
S0-D0-C6  1       18.3%             0.7%            7.4%          73.6%
S0-D0-C7  1       15.4%             1.4%           22.1%          61.2%
S1-D0-C0  1       15.9%             1.4%           16.4%          66.3%
S1-D0-C1  1       21.9%             2.6%           16.9%          58.5%
S1-D0-C2  1       20.8%             3.7%           17.1%          58.4%
S1-D0-C3  1       17.8%             1.0%            9.2%          72.1%
S1-D0-C4  1       17.8%             1.0%            9.0%          72.2%
S1-D0-C5  1       17.8%             1.0%            9.0%          72.2%
S1-D0-C6  1       17.4%             1.4%           12.8%          68.4%
S1-D0-C7  1       23.6%             4.3%           17.2%          55.0%
13.341823591 seconds time elapsed
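
As a quick sanity check on the --topdown output above, here is a minimal
Python sketch (not part of perf) that checks that the four categories of
each core add up to roughly 100% and reports the spread of the backend-bound
share. The ROWS list is copied by hand from the output; the helper name
summarize is hypothetical.

```python
# Rows copied by hand from the --topdown output:
# (core, retiring, bad speculation, frontend bound, backend bound), in percent.
ROWS = [
    ("S0-D0-C0", 24.9, 1.1, 16.1, 57.9),
    ("S0-D0-C1", 16.3, 1.3, 17.3, 65.1),
    ("S0-D0-C2", 17.0, 1.2, 15.3, 66.5),
    ("S0-D0-C3", 18.3, 0.8, 8.2, 72.8),
    ("S0-D0-C4", 18.1, 0.8, 8.5, 72.6),
    ("S0-D0-C5", 17.6, 0.8, 10.0, 71.6),
    ("S0-D0-C6", 18.3, 0.7, 7.4, 73.6),
    ("S0-D0-C7", 15.4, 1.4, 22.1, 61.2),
    ("S1-D0-C0", 15.9, 1.4, 16.4, 66.3),
    ("S1-D0-C1", 21.9, 2.6, 16.9, 58.5),
    ("S1-D0-C2", 20.8, 3.7, 17.1, 58.4),
    ("S1-D0-C3", 17.8, 1.0, 9.2, 72.1),
    ("S1-D0-C4", 17.8, 1.0, 9.0, 72.2),
    ("S1-D0-C5", 17.8, 1.0, 9.0, 72.2),
    ("S1-D0-C6", 17.4, 1.4, 12.8, 68.4),
    ("S1-D0-C7", 23.6, 4.3, 17.2, 55.0),
]


def summarize(rows):
    """Return per-core totals and the min/max backend-bound share (hypothetical helper)."""
    totals = {core: round(ret + bad + fe + be, 1) for core, ret, bad, fe, be in rows}
    backend = [be for *_, be in rows]
    return totals, min(backend), max(backend)


totals, be_min, be_max = summarize(ROWS)
for core, total in totals.items():
    print(f"{core}: categories sum to {total}%")
print(f"backend bound ranges from {be_min}% to {be_max}%")
```

Since the run is system wide (-a), every row describes one core, not the
application alone, so the spread covers all cores on the machine.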

Running with perf stat -d instead gives this:

$ perf stat -d taskset -c 0 $APP
Performance counter stats for 'taskset -c 0 ./main.gcc9.3.1':
         15,075.30 msec task-clock                #    0.900 CPUs utilized
               199      context-switches          #    0.013 K/sec
                 1      cpu-migrations            #    0.000 K/sec
           117,987      page-faults               #    0.008 M/sec
    40,907,365,540      cycles                    #    2.714 GHz
    26,431,604,986      stalled-cycles-frontend   #   64.61% frontend cycles idle
    21,734,615,045      stalled-cycles-backend    #   53.13% backend cycles idle
    35,339,765,469      instructions              #    0.86  insn per cycle
                                                  #    0.75  stalled cycles per insn
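
For reference, the derived figures in the right-hand column can be
reproduced from the raw counters. The Python sketch below is a minimal
illustration, assuming that perf derives "stalled cycles per insn" from the
larger of the two stall counts; that assumption is consistent with the 0.75
printed above but is not confirmed anywhere in this thread. The function
name derived_metrics is hypothetical.

```python
# Raw counter values copied from the perf stat -d output.
CYCLES = 40_907_365_540
STALLED_FRONTEND = 26_431_604_986
STALLED_BACKEND = 21_734_615_045
INSTRUCTIONS = 35_339_765_469


def derived_metrics(cycles, stalled_fe, stalled_be, instructions):
    """Recompute the ratios shown after '#' in the output (hypothetical helper)."""
    return {
        "frontend_idle_pct": 100.0 * stalled_fe / cycles,
        "backend_idle_pct": 100.0 * stalled_be / cycles,
        "insn_per_cycle": instructions / cycles,
        # Assumption: the larger stall count is the one divided by instructions.
        "stalled_cycles_per_insn": max(stalled_fe, stalled_be) / instructions,
    }


metrics = derived_metrics(CYCLES, STALLED_FRONTEND, STALLED_BACKEND, INSTRUCTIONS)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```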