linux-kernel - Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKfTPtDaN8ytQ2FHXkRXHk4AWz0xSi+pz_SseaDp=6JssRzWyg@mail.gmail.com>
Date:   Tue, 29 Aug 2023 09:18:23 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Xiongfeng Wang <wangxiongfeng2@...wei.com>, vschneid@...hat.com,
        Phil Auld <pauld@...hat.com>, vdonnefort@...gle.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Wei Li <liwei391@...wei.com>,
        "liaoyu (E)" <liaoyu15@...wei.com>, zhangqiao22@...wei.com,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [Question] report a race condition between CPU hotplug state
 machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling

On Wed, 23 Aug 2023 at 12:14, Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Thu, Jun 29 2023 at 10:30, Vincent Guittot wrote:
> > On Thu, 29 Jun 2023 at 00:01, Thomas Gleixner <tglx@...utronix.de> wrote:
> >>  int cpu_device_down(struct device *dev)
> >>  {
> >> -       return cpu_down(dev->id, CPUHP_OFFLINE);
> >> +       unsigned int cpu = cpumask_any_but(cpu_online_mask, dev->id);
> >> +
> >> +       if (cpu >= nr_cpu_ids)
> >> +               return -EBUSY;
> >> +
> >> +       return work_on_cpu(cpu, __cpu_device_down, dev);
> >
> > The comment for work_on_cpu :
> >
> >  * It is up to the caller to ensure that the cpu doesn't go offline.
> >  * The caller must not hold any locks which would prevent @fn from completing.
> >
> > make me wonder if this should be done only once the hotplug lock is
> > taken so the selected cpu will not go offline
>
> That makes sense. Updated and again untested patch below.
>
> Thanks,
>
>         tglx
> ---
> Subject: cpu/hotplug: Prevent self deadlock on CPU hot-unplug
> From: Thomas Gleixner <tglx@...utronix.de>
> Date: Wed, 23 Aug 2023 10:47:02 +0200
>
> Xiongfeng reported and debugged a self deadlock of the task which initiates
> and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
>
>     CPU1                                                 CPU2
>
> T1 sets cfs_quota
>    starts hrtimer cfs_bandwidth 'period_timer'
> T1 is migrated to CPU2
>                                                 T1 initiates offlining of CPU1
> Hotplug operation starts
>   ...
> 'period_timer' expires and is re-enqueued on CPU1
>   ...
> take_cpu_down()
>   CPU1 shuts down and does not handle timers
>   anymore. They have to be migrated in the
>   post dead hotplug steps by the control task.
>
>                                                 T1 runs the post dead offline operation
>                                                 T1 is scheduled out
>                                                 T1 waits for 'period_timer' to expire
>
> T1 waits there forever if it is scheduled out before it can execute the hrtimer
> offline callback hrtimers_dead_cpu().
>
> Cure this by delegating the hotplug control operation to a worker thread on
> an online CPU. This takes the initiating user space task, which might be
> affected by the bandwidth timer, completely out of the picture.
>
> Reported-by: Xiongfeng Wang <wangxiongfeng2@...wei.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com

Acked-by: Vincent Guittot <vincent.guittot@...aro.org>


> ---
>  kernel/cpu.c |   24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1467,8 +1467,22 @@ static int __ref _cpu_down(unsigned int
>         return ret;
>  }
>
> +struct cpu_down_work {
> +       unsigned int            cpu;
> +       enum cpuhp_state        target;
> +};
> +
> +static long __cpu_down_maps_locked(void *arg)
> +{
> +       struct cpu_down_work *work = arg;
> +
> +       return _cpu_down(work->cpu, 0, work->target);
> +}
> +
>  static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>  {
> +       struct cpu_down_work work = { .cpu = cpu, .target = target, };
> +
>         /*
>          * If the platform does not support hotplug, report it explicitly to
>          * differentiate it from a transient offlining failure.
> @@ -1477,7 +1491,15 @@ static int cpu_down_maps_locked(unsigned
>                 return -EOPNOTSUPP;
>         if (cpu_hotplug_disabled)
>                 return -EBUSY;
> -       return _cpu_down(cpu, 0, target);
> +
> +       /*
> +        * Ensure that the control task does not run on the to be offlined
> +        * CPU to prevent a deadlock against cfs_b->period_timer.
> +        */
> +       cpu = cpumask_any_but(cpu_online_mask, cpu);
> +       if (cpu >= nr_cpu_ids)
> +               return -EBUSY;
> +       return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
>  }
>
>  static int cpu_down(unsigned int cpu, enum cpuhp_state target)