Message-ID: <aEBeRfScZKD-7h5u@J2N7QTR9R3>
Date: Wed, 4 Jun 2025 15:55:01 +0100
From: Mark Rutland <mark.rutland@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Baisheng Gao <baisheng.gao@...soc.com>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
"reviewer:PERFORMANCE EVENTS SUBSYSTEM" <kan.liang@...ux.intel.com>,
"open list:PERFORMANCE EVENTS SUBSYSTEM" <linux-perf-users@...r.kernel.org>,
"open list:PERFORMANCE EVENTS SUBSYSTEM" <linux-kernel@...r.kernel.org>,
cixi.geng@...ux.dev, hao_hao.wang@...soc.com
Subject: Re: [PATCH] perf/core: Handling the race between exit_mmap and perf sample

On Wed, Jun 04, 2025 at 04:24:37PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 04, 2025 at 03:05:43PM +0100, Mark Rutland wrote:
>
> > Looking at 5.15.149 and current HEAD (5abc7438f1e9), do_exit() calls
> > exit_mm() before perf_event_exit_task(), so it looks
> > like perf could sample from another task's mm.
> >
> > Yuck.
> >
> > Peter, does the above sound plausible to you?
>
> Yuck indeed. And yeah, we should probably re-arrange things there.
>
> Something like so?

That should plug the hole for task-bound events, yep.

I think we might need something in the perf core for cpu-bound events, assuming
those can also potentially take samples.

From a quick scan of perf_event_sample_format:

  PERF_SAMPLE_IP             // safe
  PERF_SAMPLE_TID            // safe
  PERF_SAMPLE_TIME           // safe
  PERF_SAMPLE_ADDR           // ???
  PERF_SAMPLE_READ           // ???
  PERF_SAMPLE_CALLCHAIN      // may access mm
  PERF_SAMPLE_ID             // safe
  PERF_SAMPLE_CPU            // safe
  PERF_SAMPLE_PERIOD         // safe
  PERF_SAMPLE_STREAM_ID      // ???
  PERF_SAMPLE_RAW            // ???
  PERF_SAMPLE_BRANCH_STACK   // safe
  PERF_SAMPLE_REGS_USER      // safe
  PERF_SAMPLE_STACK_USER     // may access mm
  PERF_SAMPLE_WEIGHT         // ???
  PERF_SAMPLE_DATA_SRC       // ???
  PERF_SAMPLE_IDENTIFIER     // safe
  PERF_SAMPLE_TRANSACTION    // ???
  PERF_SAMPLE_REGS_INTR      // safe
  PERF_SAMPLE_PHYS_ADDR      // safe; handles mm==NULL && addr < TASK_SIZE
  PERF_SAMPLE_AUX            // ???
  PERF_SAMPLE_CGROUP         // safe
  PERF_SAMPLE_DATA_PAGE_SIZE // partial; doesn't check addr < TASK_SIZE
  PERF_SAMPLE_CODE_PAGE_SIZE // partial; doesn't check addr < TASK_SIZE
  PERF_SAMPLE_WEIGHT_STRUCT  // ???

... I think all the dodgy cases use mm somehow, so maybe the perf core
should check for current->mm?
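
To make the idea concrete, here is a rough userspace model of such a check. It
is only a sketch, not kernel code: the struct types, the MODEL_* flag values and
model_usable_sample_type() are all hypothetical names invented for this
illustration, and the "needs mm" mask covers only the entries marked "may access
mm" or "partial" in the list above. The "???" entries are left out because
their behaviour has not been checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's mm and task types. */
struct model_mm { int unused; };
struct model_task { struct model_mm *mm; };

/* Flag values are local to this sketch; they are not the uapi bit positions. */
enum {
	MODEL_SAMPLE_IP             = 1u << 0,
	MODEL_SAMPLE_CALLCHAIN      = 1u << 1,
	MODEL_SAMPLE_STACK_USER     = 1u << 2,
	MODEL_SAMPLE_DATA_PAGE_SIZE = 1u << 3,
	MODEL_SAMPLE_CODE_PAGE_SIZE = 1u << 4,
};

/* Sample types flagged above as touching the mm (assumption of this sketch). */
#define MODEL_SAMPLE_NEEDS_MM			\
	(MODEL_SAMPLE_CALLCHAIN |		\
	 MODEL_SAMPLE_STACK_USER |		\
	 MODEL_SAMPLE_DATA_PAGE_SIZE |		\
	 MODEL_SAMPLE_CODE_PAGE_SIZE)

/*
 * Return the subset of sample_type that can be gathered for this task.
 * If the task has no mm (for example, it is already past exit_mm()),
 * every mm-dependent sample type is masked out.
 */
static uint64_t model_usable_sample_type(const struct model_task *task,
					 uint64_t sample_type)
{
	assert(task != NULL);

	if (task->mm == NULL)
		return sample_type & ~(uint64_t)MODEL_SAMPLE_NEEDS_MM;

	return sample_type;
}
```

With a model task whose mm is NULL, a request for IP, CALLCHAIN and STACK_USER
is reduced to IP alone; with a live mm the request passes through unchanged.
Whether the real perf core should drop such fields, zero them, or skip the
sample entirely is left open here.
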
>
> ---
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 38645039dd8f..3407c16fc5a3 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -944,6 +944,15 @@ void __noreturn do_exit(long code)
> taskstats_exit(tsk, group_dead);
> trace_sched_process_exit(tsk, group_dead);
>
> + /*
> + * Since samping can touch ->mm, make sure to stop everything before we
Typo: s/samping/sampling/
> + * tear it down.
> + *
> + * Also flushes inherited counters to the parent - before the parent
> + * gets woken up by child-exit notifications.
> + */
> + perf_event_exit_task(tsk);
> +
> exit_mm();
>
> if (group_dead)
> @@ -959,14 +968,6 @@ void __noreturn do_exit(long code)
> exit_task_work(tsk);
> exit_thread(tsk);
>
> - /*
> - * Flush inherited counters to the parent - before the parent
> - * gets woken up by child-exit notifications.
> - *
> - * because of cgroup mode, must be called before cgroup_exit()
> - */
> - perf_event_exit_task(tsk);
> -
> sched_autogroup_exit_task(tsk);
> cgroup_exit(tsk);
>

Otherwise, that looks good to me!

Mark.