linux-kernel - Re: [BUG] perf/core: Task stuck on global_ctx_data

Message-ID: <20260107091652.GB3707891@noisy.programming.kicks-ass.net>
Date: Wed, 7 Jan 2026 10:16:52 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	James Clark <james.clark@...aro.org>,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem

On Tue, Jan 06, 2026 at 02:34:40PM -0800, Namhyung Kim wrote:
> Hello,
> 
> On Mon, Dec 22, 2025 at 03:36:53PM -0800, Namhyung Kim wrote:
> > On Mon, Dec 22, 2025 at 03:34:23PM -0800, Namhyung Kim wrote:
> > > Hello,
> > > 
> > > I got a report that a task is stuck in perf_event_exit_task() waiting
> > > for global_ctx_data_rwsem.  On large systems, it'd have performance
> > > issues when it grabs the lock to iterate all threads in the system to
> > > allocate the context data.  And it'd block task exit path which is
> > > problematic especially under memory pressure.
> > > 
> > >   perf_event_open
> > >     perf_event_alloc
> > >       attach_perf_ctx_data
> > >         attach_global_ctx_data
> > >           percpu_down_write (global_ctx_data_rwsem)
> > >             for_each_process_thread
> > >               alloc_task_ctx_data
> > >                                                do_exit
> > >                                                  perf_event_exit_task
> > >                                                    percpu_down_read (global_ctx_data_rwsem)
> > > 
> > > I think attach_global_ctx_data() should skip tasks with PF_EXITING and
> > > it'd be nice if perf_event_exit_task() could release the ctx_data
> > > unconditionally.  But I'm not sure how to synchronize them properly.
> > > 
> > > Any thoughts?
> 
> I'm curious if this makes any sense..  I feel like it needs to check the
> flag again before allocation.
> 
> Thanks,
> Namhyung
> 
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 376fb07d869b8b50..2a8847e95d7eb698 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5469,6 +5469,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  	/* Allocate everything */
>  	scoped_guard (rcu) {
>  		for_each_process_thread(g, p) {
> +			if (p->flags & PF_EXITING)
> +				continue;
>  			cd = rcu_dereference(p->perf_ctx_data);
>  			if (cd && !cd->global) {
>  				cd->global = 1;

I suppose this makes sense.

> @@ -14563,7 +14565,6 @@ void perf_event_exit_task(struct task_struct *task)
>  	/*
>  	 * Detach the perf_ctx_data for the system-wide event.
>  	 */
> -	guard(percpu_read)(&global_ctx_data_rwsem);
>  	detach_task_ctx_data(task);
>  }

This would need a comment; something like:

	/*
	 * This can be done without holding global_ctx_data_rwsem
	 * because this is done after setting PF_EXITING such that
	 * attach_global_ctx_data() will skip over this task.
	 */
	WARN_ON_ONCE(!(task->flags & PF_EXITING));

But yes, I suppose this will do. The question, however, is how you get
into this predicament to begin with. Are you creating and destroying a
lot of global LBR events or something?
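
For readers following along, the invariant behind the patch and the proposed
comment can be sketched in plain userspace C. This is only an illustration,
not kernel code: struct sketch_task, SKETCH_PF_EXITING and both functions are
hypothetical stand-ins, and the sketch is single-threaded, so it does not
address the re-check before allocation that Namhyung mentions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's PF_EXITING task flag. */
#define SKETCH_PF_EXITING 0x4u

/* Hypothetical stand-in for the parts of task_struct that matter here. */
struct sketch_task {
	unsigned int flags;
	void *ctx_data;		/* plays the role of p->perf_ctx_data */
};

/*
 * Writer side (attach_global_ctx_data() in the thread): walk all tasks and
 * allocate context data, but skip tasks that are already exiting.
 * Returns the number of tasks that received a new allocation.
 */
static int sketch_attach_global(struct sketch_task *tasks, size_t n)
{
	int allocated = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_task *p = &tasks[i];

		if (p->flags & SKETCH_PF_EXITING)
			continue;	/* the exit path owns the cleanup */
		if (p->ctx_data)
			continue;	/* already has context data */
		p->ctx_data = calloc(1, 64);
		if (p->ctx_data)
			allocated++;
	}
	return allocated;
}

/*
 * Exit side (perf_event_exit_task() in the thread): release the context
 * data without taking the global lock. This is only valid once the exiting
 * flag is set, so the check below plays the role of the WARN_ON_ONCE().
 * Returns false if the invariant is violated.
 */
static bool sketch_exit_task(struct sketch_task *task)
{
	if (!(task->flags & SKETCH_PF_EXITING))
		return false;
	free(task->ctx_data);
	task->ctx_data = NULL;
	return true;
}
```

The point of the sketch is the ordering: once the flag is set, the attach walk
never touches the task again, so the exit path can free the data on its own.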

Would it make sense to delay detach_global_ctx_data() for a second or
so? That is, what is your event creation pattern?
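
To make the "delay the detach" idea concrete, here is a minimal userspace
sketch of a grace period, assuming a simple tick-based clock. All names
(struct sketch_global_state, SKETCH_GRACE_TICKS, the sketch_* functions) are
hypothetical; in the kernel this would presumably be built on some deferred
work mechanism, which is not shown here.

```c
#include <stdbool.h>

/* Hypothetical grace period, standing in for "a second or so". */
#define SKETCH_GRACE_TICKS 1000L

/* Hypothetical bookkeeping for system-wide events. */
struct sketch_global_state {
	int  users;		/* live global events */
	bool attached;		/* per-task context data currently set up */
	bool detach_pending;	/* teardown scheduled but not yet run */
	long detach_deadline;	/* tick at which teardown may run */
	int  full_attaches;	/* how often the expensive walk was needed */
};

/* First user either reuses a still-attached state or pays for a full walk. */
static void sketch_event_create(struct sketch_global_state *s)
{
	if (s->users++ == 0) {
		if (s->attached) {
			s->detach_pending = false;	/* cancel teardown */
		} else {
			s->attached = true;
			s->full_attaches++;	/* the costly all-threads walk */
		}
	}
}

/* Last user going away only schedules the teardown. */
static void sketch_event_destroy(struct sketch_global_state *s, long now)
{
	if (--s->users == 0) {
		s->detach_pending = true;
		s->detach_deadline = now + SKETCH_GRACE_TICKS;
	}
}

/* Periodic check: run the teardown once the grace period has expired. */
static void sketch_tick(struct sketch_global_state *s, long now)
{
	if (s->detach_pending && now >= s->detach_deadline) {
		s->attached = false;
		s->detach_pending = false;
	}
}
```

With a create/destroy/create pattern inside the grace period, the second
create finds the state still attached and skips the walk entirely. Whether
that helps depends on the event creation pattern being asked about.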