Message-ID: <1f97aa2e-c488-4800-ac7b-e7351f2a30ea@linux.intel.com>
Date: Thu, 5 Jun 2025 09:51:24 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: peterz@...radead.org, mingo@...hat.com, namhyung@...nel.org,
irogers@...gle.com, mark.rutland@....com, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, eranian@...gle.com, ctshao@...gle.com,
tmricht@...ux.ibm.com, Leo Yan <leo.yan@....com>,
Aishwarya TCV <aishwarya.tcv@....com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
Subject: Re: [PATCH V3] perf: Fix the throttle error of some clock events

On 2025-06-05 2:39 a.m., Ingo Molnar wrote:
>
> * kan.liang@...ux.intel.com <kan.liang@...ux.intel.com> wrote:
>
>> From: Kan Liang <kan.liang@...ux.intel.com>
>>
>> Both the ARM and IBM CI report an RCU stall, which can be reproduced
>> with the perf command below.
>>   perf record -a -e cpu-clock -- sleep 2
>>
>> The issue was introduced by the generic throttle patch set, which
>> unconditionally invokes event_stop() when the throttle is triggered.
>>
>> The cpu-clock and task-clock are two special SW events that rely on
>> the hrtimer. For them, the throttle is invoked in the hrtimer handler,
>> so event_stop()->hrtimer_cancel() waits for the handler it is running
>> in to finish, which is a deadlock. Instead of invoking the stop(),
>> HRTIMER_NORESTART should be used to stop the timer.
>>
>> There are two possible ways to fix it:
>> - Introduce a PMU flag to track the case, and skip event_stop() in
>>   perf_event_throttle() if the flag is set.
>>   This was implemented in
>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>   but the new flag was considered overkill for the issue.
>> - Add a check in event_stop() and return immediately if the throttle
>>   is invoked in the hrtimer handler, relying on the existing
>>   HRTIMER_NORESTART method to stop the timer.
>>
>> The latter is implemented here.
>>
>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(), which
>> makes the order the same as in perf_event_unthrottle(). Apart from this
>> patch, nothing checks hw.interrupts in the stop(), so the order change
>> has no other impact.
>>
>> Reported-by: Leo Yan <leo.yan@....com>
>> Reported-by: Aishwarya TCV <aishwarya.tcv@....com>
>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>> Reported-by: Alexei Starovoitov <alexei.starovoitov@...il.com>
>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>> Reported-by: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
>
> Any idea which commit introduced this bug?
>
> Was it:
>
> 9734e25fbf5a perf: Fix the throttle logic for a group
>
> plus the followup driver updates?

Yes.

Since it is still in tip.git, I'm not sure whether the commit ID is valid
for the Fixes tag, so I didn't mention the commit ID in the log.

Thanks,
Kan

>
> Thanks,
>
> Ingo
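
For readers following the thread, the deadlock and the fix described in the quoted changelog can be modeled in user space. The following is a minimal sketch, not the kernel patch: every model_* name is hypothetical, the timer is a plain struct rather than a real hrtimer, and the exact check in the stop path (interrupts == MAX_INTERRUPTS) is an assumption inferred from the changelog's remark that only this patch checks hw.interrupts in the stop().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INTERRUPTS (~0ULL)

/* Stand-ins for HRTIMER_NORESTART / HRTIMER_RESTART. */
enum timer_restart { TIMER_NORESTART, TIMER_RESTART };

/* Hypothetical model of a clock event and its timer. */
struct model_event {
	unsigned long long interrupts;	/* models event->hw.interrupts */
	bool in_handler;		/* true while the timer callback runs */
	bool timer_armed;		/* true while the timer would fire again */
	int cancel_calls;		/* how often the cancel path was taken */
};

/*
 * Models hrtimer_cancel(): the real call waits for a running callback to
 * finish. Called from inside that callback it would wait on itself, so the
 * model reports failure (false) instead of hanging.
 */
static bool model_timer_cancel(struct model_event *e)
{
	e->cancel_calls++;
	if (e->in_handler)
		return false;
	e->timer_armed = false;
	return true;
}

/*
 * Models the stop path with the check from the second option: if the event
 * is already marked throttled, return immediately and leave the timer to
 * the handler's NORESTART return value. (Assumed form of the check.)
 */
static bool model_event_stop(struct model_event *e)
{
	if (e->interrupts == MAX_INTERRUPTS)
		return true;
	return model_timer_cancel(e);
}

/* Models the throttle: mark the event throttled first, then call stop. */
static bool model_throttle(struct model_event *e)
{
	e->interrupts = MAX_INTERRUPTS;
	return model_event_stop(e);
}

/* Models the timer callback; 'throttle' says whether this tick throttles. */
static enum timer_restart model_timer_handler(struct model_event *e,
					      bool throttle)
{
	enum timer_restart ret = TIMER_RESTART;

	e->in_handler = true;
	if (throttle) {
		bool ok = model_throttle(e);

		assert(ok);	/* the stop path must not try to self-cancel */
		ret = TIMER_NORESTART;
	}
	e->in_handler = false;

	/* A NORESTART return means the timer is simply not re-armed. */
	if (ret == TIMER_NORESTART)
		e->timer_armed = false;
	return ret;
}

/* Returns 0 if a throttling tick stops the timer without any cancel call. */
static int model_demo(void)
{
	struct model_event e = { .timer_armed = true };

	model_timer_handler(&e, true);
	return (e.cancel_calls == 0 && !e.timer_armed) ? 0 : 1;
}
```

Calling model_demo() from a main() exercises the throttled path. Setting interrupts before calling the stop function is what lets the stop path tell the two cases apart, which matches the ordering change described in the changelog.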