Message-ID: <d5fcf34f-63fe-451b-89ad-621c38981709@linux.intel.com>
Date: Thu, 5 Jun 2025 09:46:32 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Mark Rutland <mark.rutland@....com>, LKML <linux-kernel@...r.kernel.org>,
 "linux-perf-use." <linux-perf-users@...r.kernel.org>,
 Stephane Eranian <eranian@...gle.com>, Chun-Tse Shao <ctshao@...gle.com>,
 Thomas Richter <tmricht@...ux.ibm.com>, Leo Yan <leo.yan@....com>,
 Aishwarya TCV <aishwarya.tcv@....com>,
 Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
Subject: Re: [PATCH V3] perf: Fix the throttle error of some clock events



On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@...ux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@...ux.intel.com>
>>
>> Both the ARM and IBM CI systems report an RCU stall, which can be
>> reproduced with the following perf command:
>>   perf record -a -e cpu-clock -- sleep 2
>>
>> The issue was introduced by the generic throttle patch set, which
>> unconditionally invokes event_stop() when the throttle is triggered.
>>
>> cpu-clock and task-clock are two special SW events that rely on an
>> hrtimer. The throttle is invoked from the hrtimer handler, and
>> event_stop()->hrtimer_cancel() waits for that same handler to finish, so
>> the handler ends up waiting on itself: a deadlock. Instead of calling
>> stop(), the timer should be stopped by returning HRTIMER_NORESTART.
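The self-wait can be made concrete with a small userspace model. This is not kernel code: `fake_timer`, `fake_timer_cancel()`, `fake_timer_fire()` and the `TIMER_*` values are hypothetical stand-ins for an hrtimer, `hrtimer_cancel()`, the timer core and `HRTIMER_RESTART`/`HRTIMER_NORESTART`. Cancelling from inside the callback is recorded as a deadlock (the real call would wait forever), while returning the no-restart value simply leaves the timer disarmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for HRTIMER_NORESTART / HRTIMER_RESTART. */
enum timer_restart { TIMER_NORESTART, TIMER_RESTART };

struct fake_timer {
	bool running;     /* true while the callback executes */
	bool armed;       /* true if the timer will fire again */
	bool deadlocked;  /* set when a cancel would wait on itself */
	enum timer_restart (*fn)(struct fake_timer *t);
};

/*
 * Models hrtimer_cancel(): it must wait for a running callback to finish.
 * If the caller *is* that callback, the wait never ends, so the model
 * records the deadlock instead of spinning forever.
 */
static void fake_timer_cancel(struct fake_timer *t)
{
	if (t->running) {
		t->deadlocked = true;
		return;
	}
	t->armed = false;
}

/* Run the callback once; re-arm only if it returns TIMER_RESTART. */
static void fake_timer_fire(struct fake_timer *t)
{
	assert(t->armed && t->fn != NULL);
	t->running = true;
	enum timer_restart ret = t->fn(t);
	t->running = false;
	t->armed = (ret == TIMER_RESTART);
}

/* Throttled callback that stops the timer via cancel: the buggy path. */
static enum timer_restart throttle_by_cancel(struct fake_timer *t)
{
	fake_timer_cancel(t);  /* in the kernel this call never returns */
	return TIMER_NORESTART;
}

/* Throttled callback that stops the timer via its return value. */
static enum timer_restart throttle_by_norestart(struct fake_timer *t)
{
	(void)t;
	return TIMER_NORESTART;
}
```

Firing a timer whose callback is `throttle_by_cancel` sets `deadlocked`, while `throttle_by_norestart` stops the timer without any cancel at all.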
>>
>> There are two possible ways to fix it:
>> - Introduce a PMU flag to mark this case, and skip event_stop() in
>>   perf_event_throttle() when the flag is set.
>>   This was implemented in
>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>   but the new flag was considered overkill for the issue.
>> - Add a check in event_stop() that returns immediately if the throttle
>>   is invoked from the hrtimer handler, and rely on the existing
>>   HRTIMER_NORESTART mechanism to stop the timer.
>>
>> The latter is implemented here.
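The second option can be sketched as a model, assuming (as the patch's note on `hw.interrupts` implies) that stop() recognises the throttled case by checking for the `MAX_INTERRUPTS` marker. `model_event`, `model_clock_stop()` and `MODEL_MAX_INTERRUPTS` are hypothetical names; the counter only records whether the model would have called the timer cancel.

```c
#include <stdint.h>

/* Hypothetical sentinel mirroring the kernel's MAX_INTERRUPTS marker. */
#define MODEL_MAX_INTERRUPTS UINT64_MAX

struct model_event {
	uint64_t interrupts;  /* models event->hw.interrupts */
	int cancel_calls;     /* times stop() would cancel the timer */
};

/* Models the clock event's stop() with the early return. */
static void model_clock_stop(struct model_event *ev)
{
	/*
	 * Throttled from inside the timer handler: the handler itself
	 * returns the no-restart value, so cancelling here would make
	 * the handler wait on itself.
	 */
	if (ev->interrupts == MODEL_MAX_INTERRUPTS)
		return;
	ev->cancel_calls++;   /* stands in for hrtimer_cancel() */
}
```

A normal stop still cancels the timer; only the throttled path skips it.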
>>
>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop() call, which
>> makes the order match perf_event_unthrottle(). Apart from this patch,
>> nothing checks hw.interrupts in stop(), so the order change has no
>> impact elsewhere.
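Why the reordering matters can be shown with a hypothetical model (`model_event`, `model_stop()` and `MODEL_MAX_INTERRUPTS` are illustrative, not kernel symbols): a stop() that checks the throttle marker only sees it if the marker is written first.

```c
#include <stdint.h>

#define MODEL_MAX_INTERRUPTS UINT64_MAX  /* hypothetical sentinel */

struct model_event {
	uint64_t interrupts;  /* models event->hw.interrupts */
	int cancel_calls;     /* times stop() would cancel the timer */
};

/* stop() skips the cancel only when the throttle marker is visible. */
static void model_stop(struct model_event *ev)
{
	if (ev->interrupts != MODEL_MAX_INTERRUPTS)
		ev->cancel_calls++;
}

/* Old order: stop() runs before the marker is set, so it still cancels. */
static void throttle_old_order(struct model_event *ev)
{
	model_stop(ev);
	ev->interrupts = MODEL_MAX_INTERRUPTS;
}

/* New order, as described in the patch: mark first, then stop(). */
static void throttle_new_order(struct model_event *ev)
{
	ev->interrupts = MODEL_MAX_INTERRUPTS;
	model_stop(ev);
}
```

With the old order the model still reaches the cancel; with the new order it does not.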
>>
>> Reported-by: Leo Yan <leo.yan@....com>
>> Reported-by: Aishwarya TCV <aishwarya.tcv@....com>
>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>> Reported-by: Alexei Starovoitov <alexei.starovoitov@...il.com>
>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>> Reported-by: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> 
> It seems the patch fixes one issue and introduces another?
> 
> It looks like the throttle event is sticky.
> Once it's reached, the perf_event no longer works?

No. It should still work even after the throttle is triggered.

sdp@...4e6bce080:~$ sudo bash -c 'echo 10 > /proc/sys/kernel/perf_event_max_sample_rate'
sdp@...4e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
sdp@...4e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.564 MB perf.data (613 samples) ]
sdp@...4e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.545 MB perf.data (502 samples) ]
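The runs above deliberately request far more samples than the limit allows. Assuming the `-c10000` period for cpu-clock is counted in nanoseconds, a quick calculation (the helper names are illustrative) shows each CPU would try to sample 100,000 times per second against a configured limit of 10, so throttling should trigger on every run; that each run still captures hundreds of samples is what shows the throttle is not sticky.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Samples per second on one CPU for a cpu-clock period given in ns. */
static uint64_t samples_per_sec(uint64_t period_ns)
{
	assert(period_ns > 0);
	return NSEC_PER_SEC / period_ns;
}

/* How many times the requested rate exceeds the configured limit. */
static uint64_t overshoot(uint64_t period_ns, uint64_t max_sample_rate)
{
	assert(max_sample_rate > 0);
	return samples_per_sec(period_ns) / max_sample_rate;
}
```

For example, `overshoot(10000, 10)` evaluates to 10000.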


> Repro:
> test_progs -t perf_branches/perf_branches_no_hw
> #250/2   perf_branches/perf_branches_no_hw:OK
> 
> test_progs -t stacktrace_build_id_nmi
> #393     stacktrace_build_id_nmi:OK
> 
> test_progs -t perf_branches/perf_branches_no_hw
> perf_branches/perf_branches_no_hw:FAIL
> 

Do you have more logs showing where it failed?

Thanks,
Kan

> Maybe it's an unrelated bug.

