Message-ID: <aWa3TncpY3Jfd_2c@google.com>
Date: Tue, 13 Jan 2026 13:21:18 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Rosalie Fang <rosaliefang@...gle.com>
Subject: Re: [PATCH] perf/core: Fix slow perf_event_task_exit() with LBR
callstacks
On Mon, Jan 12, 2026 at 08:51:57AM -0800, Namhyung Kim wrote:
> I got a report that a task is stuck in perf_event_exit_task() waiting
> for global_ctx_data_rwsem. On large systems with lots of threads, this
> causes performance issues since it grabs the lock and iterates all threads
> in the system to allocate the context data.
>
> And it can block the task exit path, which is especially problematic
> under memory pressure.
>
>   perf_event_open
>     perf_event_alloc
>       attach_perf_ctx_data
>         attach_global_ctx_data
>           percpu_down_write (global_ctx_data_rwsem)
>           for_each_process_thread
>             alloc_task_ctx_data
>
>   do_exit
>     perf_event_exit_task
>       percpu_down_read (global_ctx_data_rwsem)
>
> It should not hold the global_ctx_data_rwsem on the exit path. Let's
> skip allocation for exiting tasks and free the data carefully.
>
> Reported-by: Rosalie Fang <rosaliefang@...gle.com>
> Suggested-by: Peter Zijlstra <peterz@...radead.org>
> Signed-off-by: Namhyung Kim <namhyung@...nel.org>
> ---
> kernel/events/core.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 376fb07d869b8b50..e87bb43b7bb3dd4b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5421,9 +5421,20 @@ attach_task_ctx_data(struct task_struct *task, struct kmem_cache *ctx_cache,
>  		return -ENOMEM;
>
>  	for (;;) {
> -		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
> +		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
It seems we need to keep this cast to suppress sparse warnings.
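
For reference, a sketch of how the call sites would look with the cast
kept (assuming the warning comes from the __rcu annotation on
task->perf_ctx_data, so the cast only strips the address space; the new
cmpxchg added in this patch would likely need the same treatment, not
verified with sparse yet):

	/* existing cmpxchg, keeping the cast as before */
	if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
		...
		if (task->flags & PF_EXITING) {
			/* new cmpxchg from this patch, with the same cast */
			if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &cd, NULL))
				perf_free_ctx_data_rcu(cd);
		}
		return 0;
	}
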
Thanks,
Namhyung
>  			if (old)
>  				perf_free_ctx_data_rcu(old);
> +			/*
> +			 * Above try_cmpxchg() pairs with try_cmpxchg() from
> +			 * detach_task_ctx_data() such that
> +			 * if we race with perf_event_exit_task(), we must
> +			 * observe PF_EXITING.
> +			 */
> +			if (task->flags & PF_EXITING) {
> +				/* detach_task_ctx_data() may free it already */
> +				if (try_cmpxchg(&task->perf_ctx_data, &cd, NULL))
> +					perf_free_ctx_data_rcu(cd);
> +			}
>  			return 0;
>  		}
>
> @@ -5469,6 +5480,8 @@ attach_global_ctx_data(struct kmem_cache *ctx_cache)
>  	/* Allocate everything */
>  	scoped_guard (rcu) {
>  		for_each_process_thread(g, p) {
> +			if (p->flags & PF_EXITING)
> +				continue;
>  			cd = rcu_dereference(p->perf_ctx_data);
>  			if (cd && !cd->global) {
>  				cd->global = 1;
> @@ -14562,8 +14575,11 @@ void perf_event_exit_task(struct task_struct *task)
>
>  	/*
>  	 * Detach the perf_ctx_data for the system-wide event.
> +	 *
> +	 * Done without holding global_ctx_data_rwsem; typically
> +	 * attach_global_ctx_data() will skip over this task, but otherwise
> +	 * attach_task_ctx_data() will observe PF_EXITING.
>  	 */
> -	guard(percpu_read)(&global_ctx_data_rwsem);
>  	detach_task_ctx_data(task);
>  }
>
> --
> 2.52.0.457.g6b5491de43-goog
>