Message-Id: <20250702140816.cea1c371bdcc92ec55a59434@linux-foundation.org>
Date: Wed, 2 Jul 2025 14:08:16 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Tim
Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org, Jirka Hladky
<jhladky@...hat.com>, Srikanth Aithal <Srikanth.Aithal@....com>, Suneeth D
<Suneeth.D@....com>, Libo Chen <libo.chen@...cle.com>
Subject: Re: [PATCH] sched/numa: Fix NULL pointer access to mm_struct during
task swap
On Thu, 3 Jul 2025 00:32:47 +0800 Chen Yu <yu.c.chen@...el.com> wrote:
> It was reported that after Commit ad6b26b6a0a7
> ("sched/numa: add statistics of numa balance task"),
> a NULL pointer exception[1] occurs when accessing
> p->mm. The following race condition was found to
> trigger this bug: After a swap task candidate is
> chosen during NUMA balancing, its mm_struct is
> released due to task exit. Later, when the task
> swapping is performed, p->mm is NULL, which causes
> the problem:
>
> CPU0                                    CPU1
> :
> ...
> task_numa_migrate
>   task_numa_find_cpu
>     task_numa_compare
>       # a normal task p is chosen
>       env->best_task = p
>
>                                         # p exit:
>                                         exit_signals(p);
>                                           p->flags |= PF_EXITING
>                                         exit_mm
>                                           p->mm = NULL;
>
>   migrate_swap_stop
>     __migrate_swap_task((arg->src_task, arg->dst_cpu)
>       count_memcg_event_mm(p->mm, NUMA_TASK_SWAP)# p->mm is NULL
>
> Fix this issue by checking if the task has the PF_EXITING
> flag set in migrate_swap_stop(). If it does, skip updating
> the memcg events. Additionally, log a warning if p->mm is
> NULL to facilitate future debugging.
>
> ...
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3364,7 +3364,14 @@ static void __migrate_swap_task(struct task_struct *p, int cpu)
>  {
>  	__schedstat_inc(p->stats.numa_task_swapped);
>  	count_vm_numa_event(NUMA_TASK_SWAP);
> -	count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
> +	/* exiting task has NULL mm */
> +	if (!(p->flags & PF_EXITING)) {
> +		WARN_ONCE(!p->mm, "swap task %d %s %x has no mm\n",
> +			  p->pid, p->comm, p->flags);
> +
> +		if (p->mm)
> +			count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
> +	}
I don't think we should warn on a condition which is known to occur and
which we successfully handle.  What action can anyone take upon that
warning?

Which means the change might as well become

+	/* comment goes here */
+	if (p->mm)
+		count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);

But is that a real fix?  Can the other thread call exit(), set
PF_EXITING and null its p->mm right between the above two lines?  After
the p->mm test and before the count_memcg_event_mm() call?

IOW, there needs to be some locking in place to stabilize p->mm
throughout the p->mm test and the count_memcg_event_mm() call?
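
To make the window concrete, here is a minimal userspace sketch of the
pattern being asked about.  It is not kernel code and not a proposed fix:
fake_task, fake_mm, fake_exit_mm() and fake_count_swap_event() are
hypothetical names, and the pthread mutex only stands in for whatever lock
would stabilize p->mm in the kernel (task_lock() is one candidate, but
whether it is usable from migrate_swap_stop() is an open question here).
The point is only that the NULL test and the counter update sit in the
same critical section as the store that clears ->mm.

```c
/* Userspace model of the check-then-use race on p->mm; not kernel code. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct fake_mm {
	unsigned long numa_task_swap;	/* models the memcg event counter */
};

struct fake_task {
	pthread_mutex_t lock;		/* assumed lock that stabilizes ->mm */
	struct fake_mm *mm;		/* NULL once the task has exited */
};

/* Exit path: clear ->mm while holding the lock. */
static void fake_exit_mm(struct fake_task *p)
{
	assert(p);
	pthread_mutex_lock(&p->lock);
	p->mm = NULL;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Swap path: the NULL test and the counter update share one critical
 * section, so the exit path cannot clear ->mm between them.
 * Returns 1 if the event was counted, 0 if the task had no mm.
 */
static int fake_count_swap_event(struct fake_task *p)
{
	int counted = 0;

	assert(p);
	pthread_mutex_lock(&p->lock);
	if (p->mm) {
		p->mm->numa_task_swap++;
		counted = 1;
	}
	pthread_mutex_unlock(&p->lock);
	return counted;
}
```

Without the lock in fake_count_swap_event(), the model has the same hole
as the bare "if (p->mm)" test: fake_exit_mm() could run after the test and
before the increment.  The sketch also ignores the lifetime of the mm
itself, which a real fix would have to consider.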