linux-kernel - Re: [REGRESSION] bisected: perf: hang when using async-profiler caused by perf: Fix the POLL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <30c62dee-2219-4b39-94c7-b9cc81130c9e@linux.intel.com>
Date: Mon, 13 Oct 2025 16:38:15 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Octavia Togami <octavia.togami@...il.com>, stable@...r.kernel.org,
 regressions@...ts.linux.dev, linux-kernel@...r.kernel.org,
 linux-perf-users@...r.kernel.org
Subject: Re: [REGRESSION] bisected: perf: hang when using async-profiler
 caused by perf: Fix the POLL_HUP delivery breakage


On 10/13/2025 4:05 PM, Peter Zijlstra wrote:
> On Mon, Oct 13, 2025 at 10:34:27AM +0800, Mi, Dapeng wrote:
>
>> It looks the issue described in the link
>> (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.com/T/#u)
>> happens again but in a different way. :(
>>
>> As the commit message above link described,  cpu-clock (and task-clock) is
>> a specific SW event which rely on hrtimer. The hrtimer handler calls
>> __perf_event_overflow() and then event_stop (cpu_clock_event_stop()) and
>> eventually call hrtimer_cancel() which traps into a dead loop which waits
>> for the calling hrtimer handler finishes.
>>
>> As the
>> change (https://lore.kernel.org/all/20250606192546.915765-1-kan.liang@linux.intel.com/T/#u),
>> it should be enough to just disable the event and don't need an extra event
>> stop.
>>
>> @Octavia, could you please check if the change below can fix this issue?
>> Thanks.
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 7541f6f85fcb..883b0e1fa5d3 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -10343,7 +10343,20 @@ static int __perf_event_overflow(struct perf_event
>> *event,
>>                 ret = 1;
>>                 event->pending_kill = POLL_HUP;
>>                 perf_event_disable_inatomic(event);
>> -               event->pmu->stop(event, 0);
>> +
>> +               /*
>> +                * The cpu-clock and task-clock are two special SW events,
>> +                * which rely on the hrtimer. The __perf_event_overflow()
>> +                * is invoked from the hrtimer handler for these 2 events.
>> +                * Avoid to call event_stop()->hrtimer_cancel() for these
>> +                * 2 events since hrtimer_cancel() waits for the hrtimer
>> +                * handler to finish, which would trigger a deadlock.
>> +                * Only disabling the events is enough to stop the hrtimer.
>> +                * See perf_swevent_cancel_hrtimer().
>> +                */
>> +               if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK &&
>> +                   event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
>> +                       event->pmu->stop(event, 0);
> This is broken though; you cannot test config without first knowing
> which PMU you're dealing with.

Ah, yes. Just ignore this.


>
> Also, that timer really should get stopped, we can't know for certain
> this overflow is of the timer itself or not, it could be a related
> event.
>
> Something like the below might do -- but please carefully consider the
> cases where hrtimer_try_to_cancel() might fail; in those cases we'll
> have set HES_STOPPED and the hrtimer callback *SHOULD* observe this and
> NORESTART.
>
> But I didn't check all the details.

The only reason that hrtimer_try_to_cancel() could fail is that the hrtimer
callback is currently executing, so current change should be fine. 


>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 820127536e62..a91481d57841 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11756,7 +11756,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>  
>  	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
>  
> -	if (event->state != PERF_EVENT_STATE_ACTIVE)
> +	if (event->state != PERF_EVENT_STATE_ACTIVE ||
> +	    event->hw.state & PERF_HES_STOPPED)
>  		return HRTIMER_NORESTART;
>  
>  	event->pmu->read(event);
> @@ -11810,7 +11811,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>  		local64_set(&hwc->period_left, ktime_to_ns(remaining));
>  
> -		hrtimer_cancel(&hwc->hrtimer);
> +		hrtimer_try_to_cancel(&hwc->hrtimer);
>  	}
>  }
>  
> @@ -11854,12 +11855,14 @@ static void cpu_clock_event_update(struct perf_event *event)
>  
>  static void cpu_clock_event_start(struct perf_event *event, int flags)
>  {
> +	event->hw.state = 0;
>  	local64_set(&event->hw.prev_count, local_clock());
>  	perf_swevent_start_hrtimer(event);
>  }
>  
>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
>  {
> +	event->hw.state = PERF_HES_STOPPED;
>  	perf_swevent_cancel_hrtimer(event);
>  	if (flags & PERF_EF_UPDATE)
>  		cpu_clock_event_update(event);

Besides cpu-clock, task-clock should need similar change as well. I would
post a complete change later.