Message-ID: <dd61a698-a298-4038-a4d9-a40ffe0b05d6@redhat.com>
Date: Sun, 23 Nov 2025 23:31:47 -0500
From: Waiman Long <llong@...hat.com>
To: Pingfan Liu <piliu@...hat.com>, Tejun Heo <tj@...nel.org>,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Juri Lelli <juri.lelli@...hat.com>,
Chen Ridong <chenridong@...weicloud.com>,
Peter Zijlstra <peterz@...radead.org>,
Pierre Gondois <pierre.gondois@....com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/deadline: Fix potential race in
dl_add_task_root_domain()
On 11/23/25 10:34 PM, Pingfan Liu wrote:
> The access rule for local_cpu_mask_dl requires it to be called on the
> local CPU with preemption disabled. However, dl_add_task_root_domain()
> currently violates this rule.
>
> Without preemption disabled, the following race can occur:
>
> 1. ThreadA calls dl_add_task_root_domain() on CPU 0
> 2. Gets pointer to CPU 0's local_cpu_mask_dl
> 3. ThreadA is preempted and migrated to CPU 1
> 4. ThreadA continues using CPU 0's local_cpu_mask_dl
> 5. Meanwhile, the scheduler on CPU 0 calls find_later_rq() which also
> uses local_cpu_mask_dl (with preemption properly disabled)
> 6. Both contexts now corrupt the same per-CPU buffer concurrently
>
> Fix this by moving the local_cpu_mask_dl access to the preemption
> disabled section.
>
> Closes: https://lore.kernel.org/lkml/aSBjm3mN_uIy64nz@jlelli-thinkpadt14gen4.remote.csb
> Fixes: 318e18ed22e8 ("sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug")
> Reported-by: Juri Lelli <juri.lelli@...hat.com>
> Signed-off-by: Pingfan Liu <piliu@...hat.com>
> To: Tejun Heo <tj@...nel.org>
> Cc: Waiman Long <longman@...hat.com>
> Cc: Chen Ridong <chenridong@...weicloud.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Pierre Gondois <pierre.gondois@....com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Ben Segall <bsegall@...gle.com>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: Valentin Schneider <vschneid@...hat.com>
> To: cgroups@...r.kernel.org
> To: linux-kernel@...r.kernel.org
> ---
> kernel/sched/deadline.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 194a341e85864..e9153e86de0a7 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2944,7 +2944,7 @@ void dl_add_task_root_domain(struct task_struct *p)
> struct rq *rq;
> struct dl_bw *dl_b;
> unsigned int cpu;
> - struct cpumask *msk = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
> + struct cpumask *msk;
>
> raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
> if (!dl_task(p) || dl_entity_is_special(&p->dl)) {
> @@ -2952,6 +2952,7 @@ void dl_add_task_root_domain(struct task_struct *p)
> return;
> }
>
> + msk = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
> /*
> * Get an active rq, whose rq->rd traces the correct root
> * domain.
It would be clearer to move the statement down to just before the
dl_get_task_effective_cpus() call that uses msk. Please also update the
comment as suggested by Juri.
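
For illustration only, steps 1-4 of the race in the commit message can be
modeled in userspace roughly as below. This is a toy sketch, not kernel
code: toy_mask, toy_this_cpu_mask() and toy_migrate() are hypothetical
names that stand in for the per-CPU cpumask, this_cpu_cpumask_var_ptr()
and a preemption followed by migration. The sketch only shows why a
pointer taken before a possible migration can end up referring to another
CPU's buffer.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Hypothetical stand-in for a per-CPU scratch cpumask. */
struct toy_mask {
	unsigned long bits;
};

static struct toy_mask toy_per_cpu_mask[TOY_NR_CPUS];
static int toy_current_cpu;	/* CPU the simulated thread is running on */

/* Models this_cpu_cpumask_var_ptr(): scratch mask of the current CPU. */
static struct toy_mask *toy_this_cpu_mask(void)
{
	return &toy_per_cpu_mask[toy_current_cpu];
}

/* Models the thread being preempted and migrated to another CPU. */
static void toy_migrate(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_current_cpu = cpu;
}

/*
 * Buggy ordering: the pointer is taken while migration is still possible.
 * Returns true if the saved pointer no longer matches the current CPU.
 */
static bool toy_stale_pointer_after_migration(void)
{
	struct toy_mask *msk;

	toy_current_cpu = 0;
	msk = toy_this_cpu_mask();	/* pointer to CPU 0's mask */
	toy_migrate(1);			/* thread now runs on CPU 1 */
	return msk != toy_this_cpu_mask();
}

/*
 * Fixed ordering: any migration happens first, and the pointer is taken
 * only inside the (modeled) section where the thread stays on one CPU.
 * Returns true if the pointer matches the current CPU's mask.
 */
static bool toy_pointer_taken_when_pinned(void)
{
	struct toy_mask *msk;

	toy_current_cpu = 0;
	toy_migrate(1);			/* migration before the pinned section */
	msk = toy_this_cpu_mask();	/* pointer to CPU 1's mask */
	return msk == &toy_per_cpu_mask[toy_current_cpu];
}
```

In the first function the saved pointer still refers to CPU 0's buffer
after the simulated migration, which is the situation in step 4. In the
second, the pointer is obtained only once the thread can no longer move.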
Thanks,
Longman