linux-kernel - Re: [RESEND PATCH 2/2] perf/x86: improve the event scheduling to avoid unnecessary pmu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e50c5307-8b2d-9338-0acc-9375b76c211d@linux.alibaba.com>
Date:   Sun, 17 Apr 2022 23:06:44 +0800
From:   Wen Yang <wenyang@...ux.alibaba.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Stephane Eranian <eranian@...gle.com>
Cc:     Wen Yang <simon.wy@...baba-inc.com>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        mark rutland <mark.rutland@....com>,
        jiri olsa <jolsa@...hat.com>,
        namhyung kim <namhyung@...nel.org>,
        borislav petkov <bp@...en8.de>, x86@...nel.org,
        "h. peter anvin" <hpa@...or.com>, linux-perf-users@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RESEND PATCH 2/2] perf/x86: improve the event scheduling to
 avoid unnecessary pmu_stop/start



在 2022/3/18 上午1:54, Wen Yang 写道:
> 
> 
> 在 2022/3/14 下午6:55, Peter Zijlstra 写道:
>> On Thu, Mar 10, 2022 at 11:50:33AM +0800, Wen Yang wrote:
>>
>>> As you pointed out, some non-compliant rdpmc can cause problems. But you
>>> also know that linux is the foundation of cloud servers, and many
>>> third-party programs run on it (we don't have any code for it), and 
>>> we can
>>> only observe that the monitoring data will jitter abnormally (the
>>> probability of this issue is not high, about dozens of tens of 
>>> thousands of
>>> machines).
>>
>> This might be a novel insight, but I *really* don't give a crap about
>> any of that. If they're not using it right, they get to keep the pieces.
>>
>> I'd almost make it reschedule more to force them to fix their stuff.
>>
> 
> 
> Thank you for your guidance.
> 
> We also found a case in thousands of servers where the PMU counter is no 
> longer updated due to frequent x86_pmu_stop/x86_pmu_start.
> 
> We added logs in the kernel and found that a third-party program would 
> cause the PMU counter to start/stop several times in just a few seconds, 
> as follows:
> 
> 
> [8993460.537776] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
> event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
> attr.pinned=1, hw.idx=3, hw.prev_count=0x802a877ef302, 
> hw.period_left=0x7fd578810cfe, event.count=0x14db802a877ecab4, 
> event.prev_count=0x14db802a877ecab4
> [8993460.915873] XXX x86_pmu_start line=1312 [cpu1] 
> active_mask=200000008 event=ffff880a53411000, state=1, attr.type=0, 
> attr.config=0x0, attr.pinned=1, hw.idx=3, 
> hw.prev_count=0xffff802a9cf6a166, hw.period_left=0x7fd563095e9a, 
> event.count=0x14db802a9cf67918, event.prev_count=0x14db802a9cf67918
> [8993461.104643] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
> event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
> attr.pinned=1, hw.idx=3, hw.prev_count=0xffff802a9cf6a166, 
> hw.period_left=0x7fd563095e9a, event.count=0x14db802a9cf67918, 
> event.prev_count=0x14db802a9cf67918
> [8993461.442508] XXX x86_pmu_start line=1312 [cpu1] 
> active_mask=200000004 event=ffff880a53411000, state=1, attr.type=0, 
> attr.config=0x0, attr.pinned=1, hw.idx=2, 
> hw.prev_count=0xffff802a9cf8492e, hw.period_left=0x7fd56307b6d2, 
> event.count=0x14db802a9cf820e0, event.prev_count=0x14db802a9cf820e0
> [8993461.736927] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
> event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
> attr.pinned=1, hw.idx=2, hw.prev_count=0xffff802a9cf8492e, 
> hw.period_left=0x7fd56307b6d2, event.count=0x14db802a9cf820e0, 
> event.prev_count=0x14db802a9cf820e0
> [8993461.983135] XXX x86_pmu_start line=1312 [cpu1] 
> active_mask=200000004 event=ffff880a53411000, state=1, attr.type=0, 
> attr.config=0x0, attr.pinned=1, hw.idx=2, 
> hw.prev_count=0xffff802a9cfc29ed, hw.period_left=0x7fd56303d613, 
> event.count=0x14db802a9cfc019f, event.prev_count=0x14db802a9cfc019f
> [8993462.274599] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
> event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
> attr.pinned=1, hw.idx=2, hw.prev_count=0x802a9d24040e, 
> hw.period_left=0x7fd562dbfbf2, event.count=0x14db802a9d23dbc0, 
> event.prev_count=0x14db802a9d23dbc0
> [8993462.519488] XXX x86_pmu_start line=1312 [cpu1] 
> active_mask=200000004 event=ffff880a53411000, state=1, attr.type=0, 
> attr.config=0x0, attr.pinned=1, hw.idx=2, 
> hw.prev_count=0xffff802ab0bb4719, hw.period_left=0x7fd54f44b8e7, 
> event.count=0x14db802ab0bb1ecb, event.prev_count=0x14db802ab0bb1ecb
> [8993462.726929] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000003 
> event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
> attr.pinned=1, hw.idx=2, hw.prev_count=0xffff802ab0bb4719, 
> hw.period_left=0x7fd54f44b8e7, event.count=0x14db802ab0bb1ecb, 
> event.prev_count=0x14db802ab0bb1ecb
> [8993463.035674] XXX x86_pmu_start line=1312 [cpu1] 
> active_mask=200000008 event=ffff880a53411000, state=1, attr.type=0, 
> attr.config=0x0, attr.pinned=1, hw.idx=3, 
> hw.prev_count=0xffff802ab0bcd328, hw.period_left=0x7fd54f432cd8, 
> event.count=0x14db802ab0bcaada, event.prev_count=0x14db802ab0bcaada
> 
> 
> Then, the PMU counter will not be updated：
> 
> [8993463.333622] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802abea31354
> [8993463.359905] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802abea31354, hw.period_left=0x7fd5415cecac, 
> event.count=0x14db802abea2eb06,
> [8993463.504783] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802ad8760160
> [8993463.521138] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802ad8760160, hw.period_left=0x7fd52789fea0, 
> event.count=0x14db802ad875d912,
> [8993463.638337] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802aecb4747b
> [8993463.654441] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
> event.count=0x14db802aecb44c2d,
> [8993463.837321] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802aecb4747b
> [8993463.861625] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
> event.count=0x14db802aecb44c2d,
> [8993464.012398] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802aecb4747b
> [8993464.012402] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
> event.count=0x14db802aecb44c2d,
> [8993464.013676] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802aecb4747b
> [8993464.013678] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
> event.count=0x14db802aecb44c2d,
> [8993464.016123] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802aecb4747b
> [8993464.016125] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
> event.count=0x14db802aecb44c2d,
> [8993464.016196] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802aecb4747b
> [8993464.016199] x86_perf_event_update [cpu1] active_mask=30000000f 
> event=ffff880a53411000, state=1, attr.config=0x0, attr.pinned=1, 
> hw.idx=3, hw.prev_count=0x802aecb4747b, hw.period_left=0x7fd5134b8b85, 
> event.count=0x14db802aecb44c2d,
> 
> ......
> 
> 
> Until 6 seconds later, the counter is stopped/started again：
> 
> 
> [8993470.243959] XXX x86_pmu_stop line=1388 [cpu1] active_mask=100000001 
> event=ffff880a53411000, state=1, attr.type=0, attr.config=0x0, 
> attr.pinned=1, hw.idx=3, hw.prev_count=0x802aecb4747b, 
> hw.period_left=0x7fd5134b8b85, event.count=0x14db802aecb44c2d, 
> event.prev_count=0x14db802aecb44c2d
> [8993470.243998] XXX x86_pmu_start line=1305 [cpu1] 
> active_mask=200000000 event=ffff880a53411000, state=1, attr.type=0, 
> attr.config=0x0, attr.pinned=1, hw.idx=3, 
> hw.prev_count=0xffff802aecb4747b, hw.period_left=0x7fd5134b8b85, 
> event.count=0x14db802aecb44c2d, event.prev_count=0x14db802aecb44c2d
> 
> [8993470.245285] x86_perf_event_update, event=ffff880a53411000, 
> new_raw_count=802aece1e6f6
> 
> ...
> 
> Such problems can be solved by avoiding unnecessary x86_pmu_{stop|start}.
> 
> Please have a look again. Thanks.
> 

We recently tracked this issue again found that it may be related to the 
behavior of the third GP of  the Intel(R) Xeon(R) Platinum 8163 CPU:


[54511836.022997] CPU#1: ctrl:       000000070000000f
[54511836.022997] CPU#1: status:     0000000000000000
[54511836.022998] CPU#1: overflow:   0000000000000000
[54511836.022998] CPU#1: fixed:      00000000000000bb
[54511836.022998] CPU#1: pebs:       0000000000000000
[54511836.022999] CPU#1: debugctl:   0000000000000000
[54511836.022999] CPU#1: active:     000000030000000f
[54511836.023000] CPU#1:   gen-PMC0 ctrl:  000000000053412e
[54511836.023000] CPU#1:   gen-PMC0 count: 0000985b7d1a15e7
[54511836.023000] CPU#1:   gen-PMC0 left:  000067a483643939
[54511836.023001] CPU#1:   gen-PMC1 ctrl:  00000000005310d1
[54511836.023002] CPU#1:   gen-PMC1 count: 000080000016448e
[54511836.023002] CPU#1:   gen-PMC1 left:  00007ffffffffd37
[54511836.023003] CPU#1:   gen-PMC2 ctrl:  00000000005301d1
[54511836.023003] CPU#1:   gen-PMC2 count: 00008000e615b9ab
[54511836.023004] CPU#1:   gen-PMC2 left:  00007fffffffffff
[54511836.023005] CPU#1:   gen-PMC3 ctrl:  000000000053003c
[54511836.023005] CPU#1:   gen-PMC3 count: 0000801f6139b1e1
[54511836.023005] CPU#1:   gen-PMC3 left:  00007fe2a2dc14b7
[54511836.023006] CPU#1: fixed-PMC0 count: 00008e0fa307b34e
[54511836.023006] CPU#1: fixed-PMC1 count: 0000ffff3d01adb8
[54511836.023007] CPU#1: fixed-PMC2 count: 0000cf10d01b651e


The Gen-pmc3 Ctrl will be changed suddenly:

[54511836.023085] CPU#1: ctrl:       000000070000000f
[54511836.023085] CPU#1: status:     0000000000000000
[54511836.023085] CPU#1: overflow:   0000000000000000
[54511836.023086] CPU#1: fixed:      00000000000000bb
[54511836.023086] CPU#1: pebs:       0000000000000000
[54511836.023086] CPU#1: debugctl:   0000000000000000
[54511836.023087] CPU#1: active:     000000030000000f
[54511836.023087] CPU#1:   gen-PMC0 ctrl:  000000000053412e
[54511836.023088] CPU#1:   gen-PMC0 count: 0000985b7d1a183b
[54511836.023088] CPU#1:   gen-PMC0 left:  000067a483643939
[54511836.023089] CPU#1:   gen-PMC1 ctrl:  00000000005310d1
[54511836.023089] CPU#1:   gen-PMC1 count: 0000800000164ca8
[54511836.023090] CPU#1:   gen-PMC1 left:  00007ffffffffd37
[54511836.023091] CPU#1:   gen-PMC2 ctrl:  00000000005301d1
[54511836.023091] CPU#1:   gen-PMC2 count: 00008000e61634fd
[54511836.023092] CPU#1:   gen-PMC2 left:  00007fffffffffff
[54511836.023092] CPU#1:   gen-PMC3 ctrl:  000000010043003c
[54511836.023093] CPU#1:   gen-PMC3 count: 0000801f613b87d0
[54511836.023093] CPU#1:   gen-PMC3 left:  00007fe2a2dc14b7
[54511836.023094] CPU#1: fixed-PMC0 count: 00008e0fa309e091
[54511836.023095] CPU#1: fixed-PMC1 count: 0000ffff3d050901
[54511836.023095] CPU#1: fixed-PMC2 count: 0000cf10d01b651e


The gen-PMC3 ctrl changed,
000000000053003c -> 000000010043003c

After that, the gen-PMC3 count remains 0000801f613b87d0 and will not be 
updated. A series of subsequent issues, such as abnormal CPI data, are 
generated.

However, the special value (000000010043003c) of the gen-pmc3 Ctrl is 
not actively set by the application. It is suspected that some special 
operation has caused the GP3 Ctrl to be changed, and it is still under 
discussion with Intel’s FAE.
We are also looking at the following code:
https://github.com/torvalds/linux/blob/94710cac0ef4ee177a63b5227664b38c95bbf703/arch/x86/events/intel/core.c#L1931 


At present, only the above phenomenon has been observed, but the exact 
cause has not yet been found.

However, this patch attempts to avoid the switching of the pmu counters 
in various perf_events, so the special behavior of a single pmu counter 
will not be propagated to other events.

--
Best wishes,
Wen