lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <20260107223256.GA807925@noisy.programming.kicks-ass.net>
Date: Wed, 7 Jan 2026 23:32:56 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	James Clark <james.clark@...aro.org>,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG] perf/core: Task stuck on global_ctx_data_rwsem

On Wed, Jan 07, 2026 at 11:28:24PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 07, 2026 at 11:01:53AM -0800, Namhyung Kim wrote:
> 
> > > But yes, I suppose this can do. The question is however, how do you get
> > > into this predicament to begin with? Are you creating and destroying a
> > > lot of global LBR events or something?
> > 
> > I think it's just because there are too many tasks in the system, on the
> > order of 100K.  And any thread going to exit needs to wait for
> > attach_global_ctx_data() to finish the iteration over every task.
> 
> OMG, so many tasks ...
> 
> > > Would it make sense to delay detach_global_ctx_data() for a second or
> > > so? That is, what is your event creation pattern?
> > 
> > I don't think it has a special pattern, but I'm curious how we can
> > handle a race like the one below.
> > 
> >   attach_global_ctx_data
> >     check p->flags & PF_EXITING
> >                                               do_exit
> >     (preemption)                                set PF_EXITING
> >                                                 detach_task_ctx_data()
> >     check p->perf_ctx_data
> >     attach_task_ctx_data()   ---> memory leak
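The interleaving above can be replayed deterministically in a single thread. This is a toy model, not kernel code: struct toy_task, its two fields and the toy_* functions are hypothetical stand-ins for the PF_EXITING bit, p->perf_ctx_data and detach_task_ctx_data(), and the replay assumes the exit path runs only once, as do_exit() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for the task fields involved. */
struct toy_task {
    bool exiting;   /* models p->flags & PF_EXITING */
    void *ctx_data; /* models p->perf_ctx_data */
};

/* Models detach_task_ctx_data() on the exit path: drop what is attached. */
static void toy_detach(struct toy_task *t)
{
    free(t->ctx_data); /* free(NULL) is a no-op */
    t->ctx_data = NULL;
}

/* Replays the diagram step by step with the unpatched checks and
 * returns the task state left behind. */
static struct toy_task replay_unpatched_race(void)
{
    struct toy_task t = { .exiting = false, .ctx_data = NULL };

    /* attach_global_ctx_data(): check p->flags & PF_EXITING */
    bool skip = t.exiting; /* false, so the task is not skipped */

    /* (preemption) do_exit() runs: set PF_EXITING, then detach */
    t.exiting = true;
    toy_detach(&t); /* nothing attached yet, nothing freed */

    /* attach_global_ctx_data() resumes: check p->perf_ctx_data */
    if (!skip && t.ctx_data == NULL)
        t.ctx_data = malloc(64); /* attach_task_ctx_data(); size arbitrary */

    /* The exit path has already run, so nothing will detach this. */
    return t;
}
```

The returned task is marked as exiting yet still owns an allocation, which is the leak in the diagram. Checking PF_EXITING earlier in the walk only narrows the window; the check that matters is the one made after the pointer has been published.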
> 
> Oh right. Something like so perhaps?
> 
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3c2a491200c6..e5e716420eb3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5421,9 +5421,19 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
>  		return -ENOMEM;
>  
>  	for (;;) {
> -		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
> +		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
>  			if (old)
>  				perf_free_ctx_data_rcu(old);
> +			/*
> +			 * try_cmpxchg() pairs with try_cmpxchg() from
> +			 * detach_task_ctx_data() such that
> +			 * if we race with perf_event_exit_task(), we must
> +			 * observe PF_EXITING.
> +			 */
> +			if (task->flags & PF_EXITING) {
> +				task->perf_ctx_data = NULL;
> +				perf_free_ctx_data_rcu(cd);

Ugh, and now this can race with detach_task_ctx_data() and free cd twice;
the plain NULL store needs to be another try_cmpxchg() here.

> +			}
>  			return 0;
>  		}
>  
> @@ -5469,6 +5479,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  	/* Allocate everything */
>  	scoped_guard (rcu) {
>  		for_each_process_thread(g, p) {
> +			if (p->flags & PF_EXITING)
> +				continue;
>  			cd = rcu_dereference(p->perf_ctx_data);
>  			if (cd && !cd->global) {
>  				cd->global = 1;
> @@ -14568,8 +14580,11 @@ void perf_event_exit_task(struct task_struct *task)
>  
>  	/*
>  	 * Detach the perf_ctx_data for the system-wide event.
> +	 *
> +	 * Done without holding global_ctx_data_rwsem; typically
> +	 * attach_global_ctx_data() will skip over this task, but otherwise
> +	 * attach_task_ctx_data() will observe PF_EXITING.
>  	 */
> -	guard(percpu_read)(&global_ctx_data_rwsem);
>  	detach_task_ctx_data(task);
>  }
>  
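The double free noted above, and the suggested fix, can be checked with a small C11 model. It is a sketch under stated assumptions, not the eventual kernel patch: struct model_ctx, struct model_task and the model_* and replay_* functions are hypothetical; model_detach() keeps only the try_cmpxchg() that the patch comment says detach_task_ctx_data() uses and ignores anything else that function does; and "freeing" only increments a counter, so a double free shows up as frees == 2 instead of heap corruption. Each replay is single-threaded and fixes one interleaving.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_ctx_data. Nothing is really
 * allocated; "freeing" bumps a counter so a double free is observable. */
struct model_ctx {
    int frees;
};

/* Hypothetical stand-in for the two task fields involved. */
struct model_task {
    atomic_bool exiting;                  /* models PF_EXITING */
    _Atomic(struct model_ctx *) ctx_data; /* models task->perf_ctx_data */
};

static void model_free(struct model_ctx *cd)
{
    cd->frees++; /* models perf_free_ctx_data_rcu() */
}

/* Exit side, reduced to its try_cmpxchg(): free only what the CAS removed. */
static void model_detach(struct model_task *t)
{
    struct model_ctx *cd = atomic_load(&t->ctx_data);

    if (cd && atomic_compare_exchange_strong(&t->ctx_data, &cd, NULL))
        model_free(cd);
}

/* Attach, first half: publish cd (the successful try_cmpxchg()). */
static void model_attach_install(struct model_task *t, struct model_ctx *cd)
{
    struct model_ctx *old = NULL;
    bool ok = atomic_compare_exchange_strong(&t->ctx_data, &old, cd);

    assert(ok); /* every replay starts from an empty slot */
    (void)ok;
}

/* Attach, second half: the PF_EXITING recheck. use_cas == false is the
 * patch as posted (plain NULL store, unconditional free); use_cas == true
 * replaces the store with a CAS so only the winner frees cd. */
static void model_attach_recheck(struct model_task *t, struct model_ctx *cd,
                                 bool use_cas)
{
    if (!atomic_load(&t->exiting))
        return;
    if (!use_cas) {
        atomic_store(&t->ctx_data, NULL);
        model_free(cd);
    } else if (atomic_compare_exchange_strong(&t->ctx_data, &cd, NULL)) {
        model_free(cd);
    }
}

static void model_task_init(struct model_task *t)
{
    atomic_init(&t->exiting, false);
    atomic_init(&t->ctx_data, NULL);
}

/* Interleaving 1: the task exits and detaches between attach's two halves. */
static int replay_detach_between(bool use_cas)
{
    struct model_task t;
    struct model_ctx cd = { .frees = 0 };

    model_task_init(&t);
    model_attach_install(&t, &cd);
    atomic_store(&t.exiting, true); /* do_exit(): set PF_EXITING */
    model_detach(&t);               /* exit path frees cd */
    model_attach_recheck(&t, &cd, use_cas);
    return cd.frees;
}

/* Interleaving 2 (the leak case): detach finds nothing, then attach
 * publishes cd and its recheck must clean up. */
static int replay_detach_first(void)
{
    struct model_task t;
    struct model_ctx cd = { .frees = 0 };

    model_task_init(&t);
    atomic_store(&t.exiting, true);
    model_detach(&t); /* nothing attached yet */
    model_attach_install(&t, &cd);
    model_attach_recheck(&t, &cd, true);
    return cd.frees;
}
```

replay_detach_between(false) models the patch as posted and frees cd twice. replay_detach_between(true) and replay_detach_first() each free it exactly once, because only the side whose compare-and-exchange actually clears the pointer frees it. A single-threaded replay cannot exercise the memory-ordering argument in the patch comment; it only shows that each fixed interleaving ends with one free.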
