linux-kernel - Re: [PATCH] perf/core: Handling the race between exit

Message-ID: <aEBeRfScZKD-7h5u@J2N7QTR9R3>
Date: Wed, 4 Jun 2025 15:55:01 +0100
From: Mark Rutland <mark.rutland@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Baisheng Gao <baisheng.gao@...soc.com>, Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	"reviewer:PERFORMANCE EVENTS SUBSYSTEM" <kan.liang@...ux.intel.com>,
	"open list:PERFORMANCE EVENTS SUBSYSTEM" <linux-perf-users@...r.kernel.org>,
	"open list:PERFORMANCE EVENTS SUBSYSTEM" <linux-kernel@...r.kernel.org>,
	cixi.geng@...ux.dev, hao_hao.wang@...soc.com
Subject: Re: [PATCH] perf/core: Handling the race between exit_mmap and perf
 sample

On Wed, Jun 04, 2025 at 04:24:37PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 04, 2025 at 03:05:43PM +0100, Mark Rutland wrote:
> 
> > Looking at 5.15.149 and current HEAD (5abc7438f1e9), do_exit() calls
> > exit_mm() before perf_event_exit_task(), so it looks
> > like perf could sample from another task's mm.
> > 
> > Yuck.
> > 
> > Peter, does the above sound plausible to you?
> 
> Yuck indeed. And yeah, we should probably re-arrange things there.
> 
> Something like so?

That should plug the hole for task-bound events, yep.
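
To make the ordering argument concrete, here is a minimal userspace sketch of
the window being discussed. It is only a model: the ord_* types and functions
are hypothetical stand-ins and are not the kernel's exit_mm() or
perf_event_exit_task(). It assumes a sample is "unsafe" whenever perf is still
active for the task after the task's mm pointer has been cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these are kernel types. */
struct ord_mm { int placeholder; };

struct ord_task {
	struct ord_mm *mm;   /* NULL once the model's exit_mm step has run */
	bool perf_active;    /* false once the model's perf exit step has run */
};

/* Stand-in for exit_mm(): the task gives up its mm. */
static void ord_exit_mm(struct ord_task *t)
{
	assert(t != NULL);
	t->mm = NULL;
}

/* Stand-in for perf_event_exit_task(): task-bound events stop sampling. */
static void ord_perf_event_exit_task(struct ord_task *t)
{
	assert(t != NULL);
	t->perf_active = false;
}

/* A sample taken now is unsafe if perf is live but the task has no mm. */
static bool ord_sample_is_unsafe(const struct ord_task *t)
{
	assert(t != NULL);
	return t->perf_active && t->mm == NULL;
}

/* Old order: mm torn down first, perf stopped later. */
static bool ord_old_order_has_window(void)
{
	struct ord_mm mm = { 0 };
	struct ord_task t = { .mm = &mm, .perf_active = true };
	bool window = false;

	ord_exit_mm(&t);
	window |= ord_sample_is_unsafe(&t);   /* perf still live, mm gone */
	ord_perf_event_exit_task(&t);
	window |= ord_sample_is_unsafe(&t);
	return window;
}

/* Proposed order: perf stopped first, then mm torn down. */
static bool ord_new_order_has_window(void)
{
	struct ord_mm mm = { 0 };
	struct ord_task t = { .mm = &mm, .perf_active = true };
	bool window = false;

	ord_perf_event_exit_task(&t);
	window |= ord_sample_is_unsafe(&t);
	ord_exit_mm(&t);
	window |= ord_sample_is_unsafe(&t);   /* perf already stopped */
	return window;
}
```

In this model only the old order leaves a point where a sample can arrive with
perf live and no mm. It says nothing about cpu-bound events, which are not tied
to the exiting task's perf teardown.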

I think we might need something in the perf core for cpu-bound events, assuming
those can also potentially take samples.

From a quick scan of perf_event_sample_format:

	PERF_SAMPLE_IP			// safe
	PERF_SAMPLE_TID			// safe
	PERF_SAMPLE_TIME		// safe
	PERF_SAMPLE_ADDR		// ???
	PERF_SAMPLE_READ		// ???
	PERF_SAMPLE_CALLCHAIN		// may access mm
	PERF_SAMPLE_ID			// safe
	PERF_SAMPLE_CPU			// safe
	PERF_SAMPLE_PERIOD		// safe
	PERF_SAMPLE_STREAM_ID		// ???
	PERF_SAMPLE_RAW			// ???
	PERF_SAMPLE_BRANCH_STACK	// safe
	PERF_SAMPLE_REGS_USER		// safe
	PERF_SAMPLE_STACK_USER		// may access mm
	PERF_SAMPLE_WEIGHT		// ???
	PERF_SAMPLE_DATA_SRC		// ???
	PERF_SAMPLE_IDENTIFIER		// safe
	PERF_SAMPLE_TRANSACTION		// ???
	PERF_SAMPLE_REGS_INTR		// safe
	PERF_SAMPLE_PHYS_ADDR		// safe; handles mm==NULL && addr < TASK_SIZE
	PERF_SAMPLE_AUX			// ???
	PERF_SAMPLE_CGROUP		// safe
	PERF_SAMPLE_DATA_PAGE_SIZE	// partial; doesn't check addr < TASK_SIZE
	PERF_SAMPLE_CODE_PAGE_SIZE	// partial; doesn't check addr < TASK_SIZE
	PERF_SAMPLE_WEIGHT_STRUCT	// ???

... I think all the dodgy cases use mm somehow, so maybe the perf core
should check for current->mm?
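
As a rough illustration of what such a check could look like, here is a
userspace sketch that drops the mm-dependent sample bits when the current task
has no mm. Everything in it is an assumption made for illustration: the flt_*
names are hypothetical, the bit values are local to the model rather than the
uapi PERF_SAMPLE_* constants, and the "needs mm" set simply mirrors the entries
marked "may access mm" or "partial" in the list above. The "???" entries are
left out because their status is unknown. Whether the perf core should drop
the fields, skip the sample, or do something else is not decided here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model bits; NOT the kernel's PERF_SAMPLE_* values. */
enum flt_sample {
	FLT_SAMPLE_IP             = 1u << 0,
	FLT_SAMPLE_CALLCHAIN      = 1u << 1,
	FLT_SAMPLE_STACK_USER     = 1u << 2,
	FLT_SAMPLE_DATA_PAGE_SIZE = 1u << 3,
	FLT_SAMPLE_CODE_PAGE_SIZE = 1u << 4,
};

/* Sample types flagged above as touching the mm in some way. */
#define FLT_SAMPLE_NEEDS_MM \
	((uint32_t)(FLT_SAMPLE_CALLCHAIN | FLT_SAMPLE_STACK_USER | \
		    FLT_SAMPLE_DATA_PAGE_SIZE | FLT_SAMPLE_CODE_PAGE_SIZE))

/* Stand-in for the task being sampled; mm is NULL after teardown. */
struct flt_task {
	const void *mm;
};

/*
 * Return the subset of sample_type that is safe to collect for this task:
 * everything if the task still has an mm, otherwise only the bits that do
 * not depend on one.
 */
static uint32_t flt_filter_sample_type(const struct flt_task *task,
				       uint32_t sample_type)
{
	assert(task != NULL);

	if (task->mm == NULL)
		return sample_type & ~FLT_SAMPLE_NEEDS_MM;

	return sample_type;
}
```

With a live mm the requested mask passes through unchanged; with mm == NULL a
request for IP, CALLCHAIN and STACK_USER is reduced to IP alone.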

> 
> ---
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 38645039dd8f..3407c16fc5a3 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -944,6 +944,15 @@ void __noreturn do_exit(long code)
>  	taskstats_exit(tsk, group_dead);
>  	trace_sched_process_exit(tsk, group_dead);
>  
> +	/*
> +	 * Since samping can touch ->mm, make sure to stop everything before we

Typo: s/samping/sampling/

> +	 * tear it down.
> +	 *
> +	 * Also flushes inherited counters to the parent - before the parent
> +	 * gets woken up by child-exit notifications.
> +	 */
> +	perf_event_exit_task(tsk);
> +
>  	exit_mm();
>  
>  	if (group_dead)
> @@ -959,14 +968,6 @@ void __noreturn do_exit(long code)
>  	exit_task_work(tsk);
>  	exit_thread(tsk);
>  
> -	/*
> -	 * Flush inherited counters to the parent - before the parent
> -	 * gets woken up by child-exit notifications.
> -	 *
> -	 * because of cgroup mode, must be called before cgroup_exit()
> -	 */
> -	perf_event_exit_task(tsk);
> -
>  	sched_autogroup_exit_task(tsk);
>  	cgroup_exit(tsk);
>  

Otherwise, that looks good to me!

Mark.