lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aAvnGKR7ThrRiRQP@slm.duckdns.org>
Date: Fri, 25 Apr 2025 09:48:40 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 03/12] sched_ext: Use dynamic allocation for scx_sched

On Fri, Apr 25, 2025 at 12:14:13PM +0200, Andrea Righi wrote:
...
> >  static void bpf_scx_unreg(void *kdata, struct bpf_link *link)
> >  {
> > +	struct scx_sched *sch = scx_root;
> > +
> >  	scx_disable(SCX_EXIT_UNREG);
> >  	kthread_flush_work(&scx_disable_work);
> > +	kobject_put(&sch->kobj);
> >  }
> 
> We probably need to check sch != NULL here, I was able to trigger this bug
> (using a buggy rustland):
> 
> [    5.048913] sched_ext: rustland: invalid CPU -16 from ops.select_cpu()
> [    5.048984]    ops_cpu_valid+0x4a/0x60
> [    5.049039]    select_task_rq_scx+0x10f/0x200
> [    5.049100]    try_to_wake_up+0x17a/0x890
> [    5.049149]    ep_autoremove_wake_function+0x12/0x40
> [    5.049211]    __wake_up_common+0x7f/0xc0
> [    5.049259]    __wake_up+0x36/0x60
> [    5.049306]    ep_poll_callback+0x265/0x320
> [    5.049354]    __wake_up_common+0x7f/0xc0
> [    5.049401]    __wake_up+0x36/0x60
> [    5.049448]    __send_signal_locked+0x71e/0x740
> [    5.049508]    group_send_sig_info+0xf3/0x1b0
> [    5.049567]    kill_pid_info_type+0x79/0x1a0
> [    5.049627]    kill_proc_info+0x5d/0x110
> [    5.049674]    __x64_sys_kill+0x91/0xc0
> [    5.049789]    do_syscall_64+0xbb/0x1d0
> [    5.049855]    entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [    5.050315] BUG: kernel NULL pointer dereference, address: 00000000000003b0
> [    5.050386] #PF: supervisor read access in kernel mode
> [    5.050439] #PF: error_code(0x0000) - not-present page
> [    5.050488] PGD 0 P4D 0
> [    5.050523] Oops: Oops: 0000 [#1] SMP NOPTI
> [    5.050571] CPU: 5 UID: 0 PID: 284 Comm: scx_rustland Not tainted 6.14.0-virtme #27 PREEMPT(full)
> [    5.050670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> [    5.050782] RIP: 0010:kthread_flush_work+0x60/0x140
> [    5.050847] Code: 80 00 00 00 31 c0 48 8d 7c 24 18 48 89 24 24 48 89 64 24 08 48 c7 44 24 10 30 e6 58 b0 f3 48 ab 48 8d 7c 24 30 e8 40 6c 05 00 <48> 8b 6b 18 48 85 ed 74 69 4c 8d 65 08 4c 89 e7 e8 0b 0c d3 00 48
> [    5.051021] RSP: 0018:ffffad01816f7e08 EFLAGS: 00010246
> [    5.051066] RAX: 0000000000000000 RBX: 0000000000000398 RCX: 0000000000000000
> [    5.051131] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffb2f34d80
> [    5.051196] RBP: ffff9b140268e800 R08: 0000000000000002 R09: 0000000000000000
> [    5.051260] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9b14804c1ea0
> [    5.051325] R13: ffff9b1401b6cc20 R14: ffff9b1403965728 R15: 0000000000000000
> [    5.051393] FS:  00007f1ff0550800(0000) GS:ffff9b14cbdb3000(0000) knlGS:0000000000000000
> [    5.051463] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    5.051516] CR2: 00000000000003b0 CR3: 000000000553e002 CR4: 0000000000772ef0
> [    5.051582] PKRU: 55555554
> [    5.051606] Call Trace:
> [    5.051634]  <TASK>
> [    5.051665]  ? __pfx_kthread_flush_work_fn+0x10/0x10
> [    5.051726]  bpf_scx_unreg+0x27/0x40
> [    5.051773]  bpf_struct_ops_map_link_dealloc+0x36/0x50
> [    5.051824]  bpf_link_release+0x18/0x20
> [    5.051863]  __fput+0xf8/0x2c0
> [    5.051905]  __x64_sys_close+0x3d/0x80
> [    5.051943]  do_syscall_64+0xbb/0x1d0
> [    5.051983]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> Changing bpf_scx_unreg() as following fixed the bug for me:
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index d963aa5c99e1a..0e52a8dbd593e 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5752,11 +5752,17 @@ static int bpf_scx_reg(void *kdata, struct bpf_link *link)
>  
>  static void bpf_scx_unreg(void *kdata, struct bpf_link *link)
>  {
> -	struct scx_sched *sch = scx_root;
> +	struct scx_sched *sch;
>  
>  	scx_disable(SCX_EXIT_UNREG);
> -	kthread_flush_work(&sch->disable_work);
> -	kobject_put(&sch->kobj);
> +
> +	rcu_read_lock();
> +	sch = rcu_dereference(scx_root);
> +	if (sch) {
> +		kthread_flush_work(&sch->disable_work);
> +		kobject_put(&sch->kobj);
> +	}
> +	rcu_read_unlock();
>  }

Oh I didn't expect that. As scx_root can only be written by the preceding
bpf_scx_reg(), I don't think we need rcu_read_lock() here as subtle as that
may be. I'll update the code.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ