linux-kernel - Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <dde4b09001da2641f9679b9409727e2147c5e9a6.camel@linux.ibm.com>
Date: Thu, 11 Jul 2024 17:30:58 +0530
From: Aboorva Devarajan <aboorvad@...ux.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: void@...ifault.com, linux-kernel@...r.kernel.org
Subject: Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU
 Hotplug

Hi Tejun,

On Tue, 2024-07-09 at 14:02 -1000, Tejun Heo wrote:

> Hello,
> 
> On Tue, Jul 09, 2024 at 01:21:38PM +0530, Aboorva Devarajan wrote:
> ...
> > Steps to reproduce the issue:
> > 
> > - Run the scx_simple scheduler.
> > - Randomly offline and online CPUs from 1 to 127.
> > 
> > During the CPU hotplug, the custom scheduler successfully
> > unregisters and registers itself without issues. However, it
> > occasionally encounters a crash and the system becomes unresponsive.
> 
> How reproducible is the problem?
> 
> 

This is highly transient and occurs approximately once in every 15-20 mins
when continuosly running CPU hotplug online and offline operations while
`scx_simple` scheduler is up and running.

> ...
> > [64650.801834] task:scx_simple      state:D stack:0     pid:11214 tgid:11214 ppid:11013  flags:0x00040080
> > [64650.801949] Sched_ext: simple (prepping)
> > [64650.801978] Call Trace:
> > [64650.802062] [c000000e9bdf35b0] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
> > [64650.802231] [c000000e9bdf3760] [c00000000001dd68] __switch_to+0x238/0x310
> > [64650.802437] [c000000e9bdf37c0] [c0000000013e6170] __schedule+0xa10/0xf80
> > [64650.802599] [c000000e9bdf38c0] [c0000000013e6764] schedule+0x84/0x128
> > [64650.802700] [c000000e9bdf38f0] [c00000000023f4dc] percpu_rwsem_wait+0x13c/0x1a0
> > [64650.802807] [c000000e9bdf3970] [c0000000013ee7ec] __percpu_down_read+0xac/0x21c
> > [64650.802926] [c000000e9bdf39f0] [c0000000001739b8] cpus_read_lock+0x158/0x180
> > [64650.803065] [c000000e9bdf3a30] [c00000000021fd88] bpf_scx_reg+0x5d8/0xb80
> > [64650.803161] [c000000e9bdf3c10] [c00000000049e308] bpf_struct_ops_link_create+0x158/0x200
> > [64650.803259] [c000000e9bdf3c80] [c000000000407628] link_create+0x78/0x3b0
> > [64650.803319] [c000000e9bdf3cd0] [c0000000004026fc] __sys_bpf+0x39c/0x560
> > [64650.803389] [c000000e9bdf3dc0] [c000000000400b00] sys_bpf+0x50/0x80
> > [64650.803449] [c000000e9bdf3e00] [c00000000003218c] system_call_exception+0x10c/0x2b0
> > [64650.803559] [c000000e9bdf3e50] [c00000000000c7d4] system_call_common+0xf4/0x258
> ...
> > [64650.808330] Showing all locks held in the system:
> > [64650.808391] 1 lock held by kthreadd/2:
> > [64650.808431]  #0: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x80/0x240
> ...
> > [64650.810515] 4 locks held by scx_simple/11214:
> > [64650.810606]  #0: c000000002a609b0 (update_mutex){+.+.}-{4:4}, at: bpf_struct_ops_link_create+0x134/0x200
> > [64650.810781]  #1: c000000002927170 (scx_ops_enable_mutex){+.+.}-{4:4}, at: bpf_scx_reg+0x80/0xb80
> > [64650.810886]  #2: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: bpf_scx_reg+0x5d0/0xb80
> > [64650.811023]  #3: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: bpf_scx_reg+0x5d8/0xb80
> > [64650.811174] 7 locks held by hotplug.sh/11440:
> > [64650.811240]  #0: c000000e9bfe6450 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0xcc/0x370
> > [64650.811362]  #1: c000000e11b31e90 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10c/0x260
> > [64650.811514]  #2: c000000df2892620 (kn->active#73){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x118/0x260
> > [64650.811675]  #3: c000000002aebc08 (device_hotplug_lock){+.+.}-{4:4}, at: online_store+0x6c/0x1a0
> > [64650.811814]  #4: c000007ff4ce0128 (&dev->mutex){....}-{4:4}, at: online_store+0x94/0x1a0
> > [64650.811929]  #5: c000000002913168 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_up+0xa4/0x1b0
> > [64650.812061]  #6: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up+0x78/0x420
> 
> It's difficult to tell from the log but I wonder whether the problem is CPU
> hotplug operation needing to fork to complete (e.g. maybe it needs to
> schedule a work item and flush it?) while sched_ext enable path first blocks
> forks through scx_fork_rwsem and then does cpus_read_lock(). That'd create
> ABBA deadlock. It's a bit tricky as the existing code already makes it
> impossible to flip the nesting order.

Yes, it looks like it could be a ABBA deadlock, but I'm not able to get enough
details with lockdep and lockstat enabled. I'll try to collect more traces and
investigate further.

> 
> It could also be specific to ppc64 (e.g. there's something dependingon fork
> during CPU hotplug), or at least doesn't happen on x86. I'll probe it a bit
> more.

Sure, so far I haven't been able to reproduce the issue on x86, also I'm running
some tests for a longer duration to see if it can be recreated on x86 as well
inorder to rule out the possiblity of it being a ppc64 specific issue, it could
be that the issue occurs much rarer on x86 when compared to ppc64.


Thanks,

Aboorva