[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <30c62dee-2219-4b39-94c7-b9cc81130c9e@linux.intel.com>
Date: Mon, 13 Oct 2025 16:38:15 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Octavia Togami <octavia.togami@...il.com>, stable@...r.kernel.org,
regressions@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org
Subject: Re: [REGRESSION] bisected: perf: hang when using async-profiler
caused by perf: Fix the POLL_HUP delivery breakage
On 10/13/2025 4:05 PM, Peter Zijlstra wrote:
> On Mon, Oct 13, 2025 at 10:34:27AM +0800, Mi, Dapeng wrote:
>
>> It looks the issue described in the link
>> (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.com/T/#u)
>> happens again but in a different way. :(
>>
>> As the commit message above link described, cpu-clock (and task-clock) is
>> a specific SW event which rely on hrtimer. The hrtimer handler calls
>> __perf_event_overflow() and then event_stop (cpu_clock_event_stop()) and
>> eventually call hrtimer_cancel() which traps into a dead loop which waits
>> for the calling hrtimer handler finishes.
>>
>> As the
>> change (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.com/T/#u),
>> it should be enough to just disable the event and don't need an extra event
>> stop.
>>
>> @Octavia, could you please check if the change below can fix this issue?
>> Thanks.
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 7541f6f85fcb..883b0e1fa5d3 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -10343,7 +10343,20 @@ static int __perf_event_overflow(struct perf_event
>> *event,
>> ret = 1;
>> event->pending_kill = POLL_HUP;
>> perf_event_disable_inatomic(event);
>> - event->pmu->stop(event, 0);
>> +
>> + /*
>> + * The cpu-clock and task-clock are two special SW events,
>> + * which rely on the hrtimer. The __perf_event_overflow()
>> + * is invoked from the hrtimer handler for these 2 events.
>> + * Avoid to call event_stop()->hrtimer_cancel() for these
>> + * 2 events since hrtimer_cancel() waits for the hrtimer
>> + * handler to finish, which would trigger a deadlock.
>> + * Only disabling the events is enough to stop the hrtimer.
>> + * See perf_swevent_cancel_hrtimer().
>> + */
>> + if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK &&
>> + event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
>> + event->pmu->stop(event, 0);
> This is broken though; you cannot test config without first knowing
> which PMU you're dealing with.
Ah, yes. Just ignore this.
>
> Also, that timer really should get stopped, we can't know for certain
> this overflow is of the timer itself or not, it could be a related
> event.
>
> Something like the below might do -- but please carefully consider the
> cases where hrtimer_try_to_cancel() might fail; in those cases we'll
> have set HES_STOPPED and the hrtimer callback *SHOULD* observe this and
> NORESTART.
>
> But I didn't check all the details.
The only reason that hrtimer_try_to_cancel() could fail is that the hrtimer
callback is currently executing, so current change should be fine.
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 820127536e62..a91481d57841 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11756,7 +11756,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>
> event = container_of(hrtimer, struct perf_event, hw.hrtimer);
>
> - if (event->state != PERF_EVENT_STATE_ACTIVE)
> + if (event->state != PERF_EVENT_STATE_ACTIVE ||
> + event->hw.state & PERF_HES_STOPPED)
> return HRTIMER_NORESTART;
>
> event->pmu->read(event);
> @@ -11810,7 +11811,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
> ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
> local64_set(&hwc->period_left, ktime_to_ns(remaining));
>
> - hrtimer_cancel(&hwc->hrtimer);
> + hrtimer_try_to_cancel(&hwc->hrtimer);
> }
> }
>
> @@ -11854,12 +11855,14 @@ static void cpu_clock_event_update(struct perf_event *event)
>
> static void cpu_clock_event_start(struct perf_event *event, int flags)
> {
> + event->hw.state = 0;
> local64_set(&event->hw.prev_count, local_clock());
> perf_swevent_start_hrtimer(event);
> }
>
> static void cpu_clock_event_stop(struct perf_event *event, int flags)
> {
> + event->hw.state = PERF_HES_STOPPED;
> perf_swevent_cancel_hrtimer(event);
> if (flags & PERF_EF_UPDATE)
> cpu_clock_event_update(event);
Besides cpu-clock, task-clock should need similar change as well. I would
post a complete change later.
Powered by blists - more mailing lists