lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aEE7ug56bPS_ZJUQ@gmail.com>
Date: Thu, 5 Jun 2025 08:39:54 +0200
From: Ingo Molnar <mingo@...nel.org>
To: kan.liang@...ux.intel.com
Cc: peterz@...radead.org, mingo@...hat.com, namhyung@...nel.org,
	irogers@...gle.com, mark.rutland@....com,
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
	eranian@...gle.com, ctshao@...gle.com, tmricht@...ux.ibm.com,
	Leo Yan <leo.yan@....com>, Aishwarya TCV <aishwarya.tcv@....com>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
Subject: Re: [PATCH V3] perf: Fix the throttle error of some clock events


* kan.liang@...ux.intel.com <kan.liang@...ux.intel.com> wrote:

> From: Kan Liang <kan.liang@...ux.intel.com>
> 
> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> below perf command.
>   perf record -a -e cpu-clock -- sleep 2
> 
> The issue is introduced by the generic throttle patch set, which
> unconditionally invoke the event_stop() when throttle is triggered.
> 
> The cpu-clock and task-clock are two special SW events, which rely on
> the hrtimer. The throttle is invoked in the hrtimer handler. The
> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> be used to stop the timer.
> 
> There may be two ways to fix it.
> - Introduce a PMU flag to track the case. Avoid the event_stop in
>   perf_event_throttle() if the flag is detected.
>   It has been implemented in the
>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>   The new flag was thought to be an overkill for the issue.
> - Add a check in the event_stop. Return immediately if the throttle is
>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>   method to stop the timer.
> 
> The latter is implemented here.
> 
> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> the order the same as perf_event_unthrottle(). Except the patch, no one
> checks the hw.interrupts in the stop(). There is no impact from the
> order change.
> 
> Reported-by: Leo Yan <leo.yan@....com>
> Reported-by: Aishwarya TCV <aishwarya.tcv@....com>
> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> Reported-by: Alexei Starovoitov <alexei.starovoitov@...il.com>
> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> Reported-by: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>

Any idea which commit introduced this bug?

Was it:

  9734e25fbf5a perf: Fix the throttle logic for a group
 
plus the followup driver updates?

Thanks,

	Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ