Message-ID: <3da12c96daecaa055c478816f5e86c7b44a04d53.camel@linux.ibm.com>
Date: Fri, 23 Aug 2024 14:50:01 +0530
From: Aboorva Devarajan <aboorvad@...ux.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: void@...ifault.com, linux-kernel@...r.kernel.org
Subject: Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU
Hotplug
On Tue, 2024-08-20 at 09:38 -1000, Tejun Heo wrote:
> On Tue, Aug 20, 2024 at 12:33:34PM +0530, Aboorva Devarajan wrote:
> > On Tue, 2024-08-13 at 09:54 -1000, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Sat, Aug 10, 2024 at 11:54:24PM +0530, Aboorva Devarajan wrote:
> > > ...
> I think this *should* fix the problem but it is rather ugly. The locking
> order is such that there's no good way out. Maybe the solution is making
> cpu_hotplug_disable() more useful. Anyways, can you test this one?
>
> Thanks.
>
> ---
> kernel/sched/ext.c | 30 ++++++++++++++++++++++--------
> 1 file changed, 22 insertions(+), 8 deletions(-)
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3990,6 +3990,24 @@ static const char *scx_exit_reason(enum
> }
> }
>
> +static void block_fork_hotplug(void)
> +{
> + while (true) {
> + percpu_down_write(&scx_fork_rwsem);
> + if (cpus_read_trylock())
> + return;
> + percpu_up_write(&scx_fork_rwsem);
> + cpus_read_lock();
> + cpus_read_unlock();
> + }
> +}
> +
> +static void unblock_fork_hotplug(void)
> +{
> + cpus_read_unlock();
> + percpu_up_write(&scx_fork_rwsem);
> +}
> +
> static void scx_ops_disable_workfn(struct kthread_work *work)
> {
> struct scx_exit_info *ei = scx_exit_info;
> @@ -4045,8 +4063,7 @@ static void scx_ops_disable_workfn(struc
> * Avoid racing against fork. See scx_ops_enable() for explanation on
> * the locking order.
> */
> - percpu_down_write(&scx_fork_rwsem);
> - cpus_read_lock();
> + block_fork_hotplug();
>
> spin_lock_irq(&scx_tasks_lock);
> scx_task_iter_init(&sti);
> @@ -4090,8 +4107,7 @@ static void scx_ops_disable_workfn(struc
> static_branch_disable_cpuslocked(&scx_builtin_idle_enabled);
> synchronize_rcu();
>
> - cpus_read_unlock();
> - percpu_up_write(&scx_fork_rwsem);
> + unblock_fork_hotplug();
>
> if (ei->kind >= SCX_EXIT_ERROR) {
> pr_err("sched_ext: BPF scheduler \"%s\" disabled (%s)\n",
> @@ -4657,8 +4673,7 @@ static int scx_ops_enable(struct sched_e
> *
> * scx_fork_rwsem --> pernet_ops_rwsem --> cpu_hotplug_lock
> */
> - percpu_down_write(&scx_fork_rwsem);
> - cpus_read_lock();
> + block_fork_hotplug();
>
> check_hotplug_seq(ops);
>
> @@ -4765,8 +4780,7 @@ static int scx_ops_enable(struct sched_e
>
> spin_unlock_irq(&scx_tasks_lock);
> preempt_enable();
> - cpus_read_unlock();
> - percpu_up_write(&scx_fork_rwsem);
> + unblock_fork_hotplug();
>
> /* see above ENABLING transition for the explanation on exiting with 0 */
> if (!scx_ops_tryset_enable_state(SCX_OPS_ENABLED, SCX_OPS_ENABLING)) {
Hi Tejun,
I applied this patch on top of the sched-ext (for-6.12) branch, up to
commit 89909296a51e792 ("sched_ext: Don't use double locking to migrate
tasks across CPUs"), and let the test run for over 20 hours; it completed
without any hangs on both x86 and PowerPC.
So, indeed, making sure that scx_fork_rwsem and cpu_hotplug_lock (read) are
only held together when both can actually be acquired does seem to resolve
the deadlock.
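For readers following along, here is a minimal userspace sketch (not kernel
code, and only an analogue under assumed semantics) of the retry pattern
that block_fork_hotplug() uses above: take lock A, trylock B, and if B is
contended, drop A and wait for B to become free before retrying, so the two
locks are never held together unless both were acquirable. The lock names
below are illustrative only.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* lock_a plays the role of scx_fork_rwsem, lock_b of cpu_hotplug_lock */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void block_both(void)
{
	while (true) {
		pthread_mutex_lock(&lock_a);
		if (pthread_mutex_trylock(&lock_b) == 0)
			return;		/* got both, in A -> B order */
		/* B is held, possibly by someone waiting on A: back off */
		pthread_mutex_unlock(&lock_a);
		/* wait until B is free so the retry is likely to succeed */
		pthread_mutex_lock(&lock_b);
		pthread_mutex_unlock(&lock_b);
	}
}

static void unblock_both(void)
{
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
}

int main(void)
{
	block_both();
	printf("holding both locks without risking an ABBA deadlock\n");
	unblock_both();
	return 0;
}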
Thanks,
Aboorva