linux-kernel - Re: [PATCH] sched/numa: Fix NULL pointer access to mm

Message-Id: <20250702140816.cea1c371bdcc92ec55a59434@linux-foundation.org>
Date: Wed, 2 Jul 2025 14:08:16 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
 <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Tim
 Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org, Jirka Hladky
 <jhladky@...hat.com>, Srikanth Aithal <Srikanth.Aithal@....com>, Suneeth D
 <Suneeth.D@....com>, Libo Chen <libo.chen@...cle.com>
Subject: Re: [PATCH] sched/numa: Fix NULL pointer access to mm_struct during
 task swap

On Thu,  3 Jul 2025 00:32:47 +0800 Chen Yu <yu.c.chen@...el.com> wrote:

> It was reported that after Commit ad6b26b6a0a7
> ("sched/numa: add statistics of numa balance task"),
> a NULL pointer exception[1] occurs when accessing
> p->mm. The following race condition was found to
> trigger this bug: After a swap task candidate is
> chosen during NUMA balancing, its mm_struct is
> released due to task exit. Later, when the task
> swapping is performed, p->mm is NULL, which causes
> the problem:
> 
> CPU0                                   CPU1
> :
> ...
> task_numa_migrate
>    task_numa_find_cpu
>     task_numa_compare
>       # a normal task p is chosen
>       env->best_task = p
> 
>                                         # p exits:
>                                         exit_signals(p);
>                                            p->flags |= PF_EXITING
>                                         exit_mm
>                                            p->mm = NULL;
> 
>     migrate_swap_stop
>       __migrate_swap_task(arg->src_task, arg->dst_cpu)
>        count_memcg_event_mm(p->mm, NUMA_TASK_SWAP) # p->mm is NULL
> 
> Fix this issue by checking if the task has the PF_EXITING
> flag set in migrate_swap_stop(). If it does, skip updating
> the memcg events. Additionally, log a warning if p->mm is
> NULL to facilitate future debugging.
>
> ...
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3364,7 +3364,14 @@ static void __migrate_swap_task(struct task_struct *p, int cpu)
>  {
>  	__schedstat_inc(p->stats.numa_task_swapped);
>  	count_vm_numa_event(NUMA_TASK_SWAP);
> -	count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
> +	/* exiting task has NULL mm */
> +	if (!(p->flags & PF_EXITING)) {
> +		WARN_ONCE(!p->mm, "swap task %d %s %x has no mm\n",
> +			  p->pid, p->comm, p->flags);
> +
> +		if (p->mm)
> +			count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
> +	}

I don't think we should warn on a condition which is known to occur and
which we successfully handle.  What action can anyone take upon that
warning?

Which means the change might as well become

+	/* comment goes here */
+	if (p->mm)
+		count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);

But is that a real fix?  Can the other thread call exit(), set
PF_EXITING and null its p->mm right between the above two lines?  After
the p->mm test and before the count_memcg_event_mm() call?

IOW, there needs to be some locking in place to stabilize p->mm
throughout the p->mm test and the count_memcg_event_mm() call?
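
To make the question concrete, here is a minimal userspace sketch of the
pattern being asked about: the NULL test and the use of the pointer happen
under the same lock that the exit path takes before clearing it. Everything
in it is hypothetical (struct model_task, struct model_mm, model_exit_mm()
and model_count_swap_event() are made-up names, and a pthread mutex stands in
for whatever kernel lock would be appropriate). It does not claim which lock,
if any, is usable in the migrate_swap_stop() context.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_mm {
	unsigned long numa_task_swap_events;
};

struct model_task {
	pthread_mutex_t lock;	/* stands in for a lock that stabilizes ->mm */
	struct model_mm *mm;	/* NULL once the task has released its mm */
};

/* Exit path: clears ->mm while holding the lock. */
static void model_exit_mm(struct model_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->mm = NULL;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Swap path: test ->mm and account the event inside one critical
 * section, so the exit path cannot clear the pointer in between.
 * Returns 1 if the event was counted, 0 if it was skipped.
 */
static int model_count_swap_event(struct model_task *t)
{
	int counted = 0;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	if (t->mm) {
		t->mm->numa_task_swap_events++;
		counted = 1;
	}
	pthread_mutex_unlock(&t->lock);
	return counted;
}
```

Without the lock, the bare "if (p->mm)" test only narrows the window. With
it, the exiting thread either clears the pointer before the test (and the
event is skipped) or after the accounting has finished.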