linux-kernel - Re: [PATCH v2] sched_ext: add a missing rcu_read_lock/unlock pair at scx_select_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zy9n1qhuraDfHV7e@gpd3>
Date: Sat, 9 Nov 2024 14:47:02 +0100
From: Andrea Righi <arighi@...dia.com>
To: Changwoo Min <multics69@...il.com>
Cc: tj@...nel.org, void@...ifault.com, changwoo@...lia.com,
	kernel-dev@...lia.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] sched_ext: add a missing rcu_read_lock/unlock pair at
 scx_select_cpu_dfl()

Hi Changwoo,

On Sat, Nov 09, 2024 at 03:29:05PM +0900, Changwoo Min wrote:
> When getting an LLC CPU mask in the default CPU selection policy,
> scx_select_cpu_dfl(), a pointer to the sched_domain is dereferenced
> using rcu_read_lock() without holding rcu_read_lock(). Such an unprotected
> dereference often causes the following warning and can cause an invalid
> memory access in the worst case.

Thanks for catching this! I'm wondering why we didn't hit this in our
CI... I'll investigate.

Just a minor nit below, apart than that looks good to me.

Acked-by: Andrea Righi <arighi@...dia.com>

> 
> Therefore, protect dereference of a sched_domain pointer using a pair
> of rcu_read_lock() and unlock().
> 
> [   20.996135] =============================
> [   20.996345] WARNING: suspicious RCU usage
> [   20.996563] 6.11.0-virtme #17 Tainted: G        W
> [   20.996576] -----------------------------
> [   20.996576] kernel/sched/ext.c:3323 suspicious rcu_dereference_check() usage!
> [   20.996576]
> [   20.996576] other info that might help us debug this:
> [   20.996576]
> [   20.996576]
> [   20.996576] rcu_scheduler_active = 2, debug_locks = 1
> [   20.996576] 4 locks held by kworker/8:1/140:
> [   20.996576]  #0: ffff8b18c00dd348 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x4a0/0x590
> [   20.996576]  #1: ffffb3da01f67e58 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1ba/0x590
> [   20.996576]  #2: ffffffffa316f9f0 (&rcu_state.gp_wq){..-.}-{2:2}, at: swake_up_one+0x15/0x60
> [   20.996576]  #3: ffff8b1880398a60 (&p->pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0x59/0x7d0
> [   20.996576]
> [   20.996576] stack backtrace:
> [   20.996576] CPU: 8 UID: 0 PID: 140 Comm: kworker/8:1 Tainted: G        W          6.11.0-virtme #17
> [   20.996576] Tainted: [W]=WARN
> [   20.996576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> [   20.996576] Workqueue: pm pm_runtime_work
> [   20.996576] Sched_ext: simple (disabling+all), task: runnable_at=-6ms
> [   20.996576] Call Trace:
> [   20.996576]  <IRQ>
> [   20.996576]  dump_stack_lvl+0x6f/0xb0
> [   20.996576]  lockdep_rcu_suspicious.cold+0x4e/0x96
> [   20.996576]  scx_select_cpu_dfl+0x234/0x260
> [   20.996576]  select_task_rq_scx+0xfb/0x190
> [   20.996576]  select_task_rq+0x47/0x110
> [   20.996576]  try_to_wake_up+0x110/0x7d0
> [   20.996576]  swake_up_one+0x39/0x60
> [   20.996576]  rcu_core+0xb08/0xe50
> [   20.996576]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   20.996576]  ? mark_held_locks+0x40/0x70
> [   20.996576]  handle_softirqs+0xd3/0x410
> [   20.996576]  irq_exit_rcu+0x78/0xa0
> [   20.996576]  sysvec_apic_timer_interrupt+0x73/0x80
> [   20.996576]  </IRQ>
> [   20.996576]  <TASK>
> [   20.996576]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [   20.996576] RIP: 0010:_raw_spin_unlock_irqrestore+0x36/0x70
> [   20.996576] Code: f5 53 48 8b 74 24 10 48 89 fb 48 83 c7 18 e8 11 b4 36 ff 48 89 df e8 99 0d 37 ff f7 c5 00 02 00 00 75 17 9c 58 f6 c4 02 75 2b <65> ff 0d 5b 55 3c 5e 74 16 5b 5d e9 95 8e 28 00 e8 a5 ee 44 ff 9c
> [   20.996576] RSP: 0018:ffffb3da01f67d20 EFLAGS: 00000246
> [   20.996576] RAX: 0000000000000002 RBX: ffffffffa4640220 RCX: 0000000000000040
> [   20.996576] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa1c7b27b
> [   20.996576] RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
> [   20.996576] R10: 0000000000000001 R11: 000000000000021c R12: 0000000000000246
> [   20.996576] R13: ffff8b1881363958 R14: 0000000000000000 R15: ffff8b1881363800
> [   20.996576]  ? _raw_spin_unlock_irqrestore+0x4b/0x70
> [   20.996576]  serial_port_runtime_resume+0xd4/0x1a0
> [   20.996576]  ? __pfx_serial_port_runtime_resume+0x10/0x10
> [   20.996576]  __rpm_callback+0x44/0x170
> [   20.996576]  ? __pfx_serial_port_runtime_resume+0x10/0x10
> [   20.996576]  rpm_callback+0x55/0x60
> [   20.996576]  ? __pfx_serial_port_runtime_resume+0x10/0x10
> [   20.996576]  rpm_resume+0x582/0x7b0
> [   20.996576]  pm_runtime_work+0x7c/0xb0
> [   20.996576]  process_one_work+0x1fb/0x590
> [   20.996576]  worker_thread+0x18e/0x350
> [   20.996576]  ? __pfx_worker_thread+0x10/0x10
> [   20.996576]  kthread+0xe2/0x110
> [   20.996576]  ? __pfx_kthread+0x10/0x10
> [   20.996576]  ret_from_fork+0x34/0x50
> [   20.996576]  ? __pfx_kthread+0x10/0x10
> [   20.996576]  ret_from_fork_asm+0x1a/0x30
> [   20.996576]  </TASK>
> [   21.056592] sched_ext: BPF scheduler "simple" disabled (unregistered from user space)
> 
> Signed-off-by: Changwoo Min <changwoo@...lia.com>
> ---
>  kernel/sched/ext.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> ChangeLog v1 -> v2:
>   - Extend the RCU critical section to cover the use of llc_cpus.
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 6ff501a18b88..f0e9c77eb33b 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3302,6 +3302,12 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  
>  	*found = false;
>  
> +

^ Can we remove this empty line?

> +	/*
> +	 * This is necessary to protect llc_cpus.
> +	 */
> +	rcu_read_lock();
> +
>  	/*
>  	 * Determine the scheduling domain only if the task is allowed to run
>  	 * on all CPUs.
> @@ -3436,9 +3442,12 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>  	if (cpu >= 0)
>  		goto cpu_found;
>  
> +	rcu_read_unlock();
>  	return prev_cpu;
>  
>  cpu_found:
> +	rcu_read_unlock();
> +
>  	*found = true;
>  	return cpu;
>  }
> -- 
> 2.47.0
>