linux-kernel - Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3256f9c1-dccd-5995-5b14-afaae281be90@huawei.com>
Date:   Thu, 24 Aug 2023 15:25:22 +0800
From:   Yu Liao <liaoyu15@...wei.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Vincent Guittot <vincent.guittot@...aro.org>
CC:     Xiongfeng Wang <wangxiongfeng2@...wei.com>, <vschneid@...hat.com>,
        Phil Auld <pauld@...hat.com>, <vdonnefort@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Wei Li <liwei391@...wei.com>, <zhangqiao22@...wei.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: [Question] report a race condition between CPU hotplug state
 machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling

On 2023/8/23 18:14, Thomas Gleixner wrote:
> Subject: cpu/hotplug: Prevent self deadlock on CPU hot-unplug
> From: Thomas Gleixner <tglx@...utronix.de>
> Date: Wed, 23 Aug 2023 10:47:02 +0200
> 
> Xiongfeng reported and debugged a self deadlock of the task which initiates
> and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
> 
>     CPU1      			                 	 CPU2
> 
> T1 sets cfs_quota
>    starts hrtimer cfs_bandwidth 'period_timer'
> T1 is migrated to CPU2				
> 						T1 initiates offlining of CPU1
> Hotplug operation starts
>   ...
> 'period_timer' expires and is re-enqueued on CPU1
>   ...
> take_cpu_down()
>   CPU1 shuts down and does not handle timers
>   anymore. They have to be migrated in the
>   post dead hotplug steps by the control task.
> 
> 						T1 runs the post dead offline operation
> 					      	T1 is scheduled out
> 						T1 waits for 'period_timer' to expire
> 
> T1 waits there forever if it is scheduled out before it can execute the hrtimer
> offline callback hrtimers_dead_cpu().
> 
> Cure this by delegating the hotplug control operation to a worker thread on
> an online CPU. This takes the initiating user space task, which might be
> affected by the bandwidth timer, completely out of the picture.
> 
> Reported-by: Xiongfeng Wang <wangxiongfeng2@...wei.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
> ---
>  kernel/cpu.c |   24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1467,8 +1467,22 @@ static int __ref _cpu_down(unsigned int
>  	return ret;
>  }
>  
> +struct cpu_down_work {
> +	unsigned int		cpu;
> +	enum cpuhp_state	target;
> +};
> +
> +static long __cpu_down_maps_locked(void *arg)
> +{
> +	struct cpu_down_work *work = arg;
> +
> +	return _cpu_down(work->cpu, 0, work->target);
> +}
> +
>  static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>  {
> +	struct cpu_down_work work = { .cpu = cpu, .target = target, };
> +
>  	/*
>  	 * If the platform does not support hotplug, report it explicitly to
>  	 * differentiate it from a transient offlining failure.
> @@ -1477,7 +1491,15 @@ static int cpu_down_maps_locked(unsigned
>  		return -EOPNOTSUPP;
>  	if (cpu_hotplug_disabled)
>  		return -EBUSY;
> -	return _cpu_down(cpu, 0, target);
> +
> +	/*
> +	 * Ensure that the control task does not run on the to be offlined
> +	 * CPU to prevent a deadlock against cfs_b->period_timer.
> +	 */
> +	cpu = cpumask_any_but(cpu_online_mask, cpu);
> +	if (cpu >= nr_cpu_ids)
> +		return -EBUSY;
> +	return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
>  }
>  
>  static int cpu_down(unsigned int cpu, enum cpuhp_state target)

Thanks for the patch. Tested in v6.5-rc5 with test script provided by
Xiongfeng, this patch works.

Tested-by: Yu Liao <liaoyu15@...wei.com>

Best regards,
Yu