Message-ID: <ZpXUg/HNSSX9ix/6@xpf.sh.intel.com>
Date: Tue, 16 Jul 2024 10:01:39 +0800
From: Pengfei Xu <pengfei.xu@...el.com>
To: Lai Jiangshan <jiangshanlai@...il.com>
CC: <linux-kernel@...r.kernel.org>, Lai Jiangshan
	<jiangshan.ljs@...group.com>, Tejun Heo <tj@...nel.org>,
	<syzkaller-bugs@...glegroups.com>
Subject: Re: [PATCH 3/7] workqueue: Remove cpus_read_lock() from
 apply_wqattrs_lock()

Hi Jiangshan and all,

Greetings!

I found a WARNING in alloc_workqueue on the "next-20240715" tag using a local
syzkaller instance.

Bisection found the first bad commit:
19af45757383 workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()
After reverting this commit, the issue was gone.

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240715_174449_alloc_workqueue
Syzkaller reproducer code: https://github.com/xupengfe/syzkaller_logs/blob/main/240715_174449_alloc_workqueue/repro.c
Syzkaller repro syscalls: https://github.com/xupengfe/syzkaller_logs/blob/main/240715_174449_alloc_workqueue/repro.prog
Mount image: https://github.com/xupengfe/syzkaller_logs/raw/main/240715_174449_alloc_workqueue/mount_0.gz
Kconfig (make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240715_174449_alloc_workqueue/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240715_174449_alloc_workqueue/bisect_info.log
bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240715_174449_alloc_workqueue/bzImage_91e3b24eb7d297d9d99030800ed96944b8652eaf.tar.gz
Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240715_174449_alloc_workqueue/91e3b24eb7d297d9d99030800ed96944b8652eaf_dmesg.log

"
[   30.328518] ------------[ cut here ]------------
[   30.329306] WARNING: CPU: 1 PID: 733 at kernel/cpu.c:527 lockdep_assert_cpus_held+0xd3/0x100
[   30.329339] Modules linked in:
[   30.329347] CPU: 1 UID: 0 PID: 733 Comm: repro Not tainted 6.10.0-next-20240715-91e3b24eb7d2-dirty #1
[   30.329364] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   30.329372] RIP: 0010:lockdep_assert_cpus_held+0xd3/0x100
[   30.329391] Code: e8 02 cc 3e 00 be ff ff ff ff 48 c7 c7 d0 67 f0 86 e8 31 65 56 04 31 ff 89 c3 89 c6 e8 26 c7 3e 00 85 db 75 cc e8 dd cb 3e 00 <0f> 0b eb c3 48 c7 c7 44 ed c0 87 e8 bd 58 a4 00 e9 59 ff ff ff 48
[   30.329404] RSP: 0018:ffff8880142cf8d0 EFLAGS: 00010293
[   30.329415] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81275c6a
[   30.329424] RDX: ffff888010a70000 RSI: ffffffff81275c73 RDI: 0000000000000005
[   30.329432] RBP: ffff8880142cf8d8 R08: 0000000000000000 R09: fffffbfff0f82dbc
[   30.329440] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800e54f9d8
[   30.329448] R13: ffff88800e54f820 R14: ffff88801349fc00 R15: ffff88800e54f800
[   30.329456] FS:  00007fa859b71740(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
[   30.329467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.329476] CR2: 00007fa85143f000 CR3: 000000001f226005 CR4: 0000000000770ef0
[   30.329488] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   30.329497] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[   30.329505] PKRU: 55555554
[   30.329509] Call Trace:
[   30.329513]  <TASK>
[   30.329520]  ? show_regs+0xa8/0xc0
[   30.329544]  ? __warn+0xf3/0x380
[   30.329562]  ? report_bug+0x25e/0x4b0
[   30.329590]  ? lockdep_assert_cpus_held+0xd3/0x100
[   30.329611]  ? report_bug+0x2cb/0x4b0
[   30.329627]  ? alloc_workqueue+0x920/0x1940
[   30.329645]  ? lockdep_assert_cpus_held+0xd3/0x100
[   30.329665]  ? handle_bug+0xa2/0x130
[   30.329689]  ? exc_invalid_op+0x3c/0x80
[   30.329713]  ? asm_exc_invalid_op+0x1f/0x30
[   30.329741]  ? lockdep_assert_cpus_held+0xca/0x100
[   30.329770]  ? lockdep_assert_cpus_held+0xd3/0x100
[   30.329790]  ? lockdep_assert_cpus_held+0xd3/0x100
[   30.329813]  alloc_workqueue+0x9b0/0x1940
[   30.329847]  ? __pfx_alloc_workqueue+0x10/0x10
[   30.329871]  ? __fget_files+0x253/0x4c0
[   30.329890]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[   30.329922]  loop_configure+0xbb2/0x11f0
[   30.329960]  lo_ioctl+0x930/0x1aa0
[   30.329982]  ? __pfx_mark_lock.part.0+0x10/0x10
[   30.329997]  ? __lock_acquire+0xd87/0x5c90
[   30.330020]  ? __pfx_lo_ioctl+0x10/0x10
[   30.330053]  ? __pfx___lock_acquire+0x10/0x10
[   30.330074]  ? __kasan_check_read+0x15/0x20
[   30.330091]  ? __lock_acquire+0x1b0f/0x5c90
[   30.330116]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[   30.330195]  ? __this_cpu_preempt_check+0x21/0x30
[   30.330216]  ? __pfx_lo_ioctl+0x10/0x10
[   30.330239]  blkdev_ioctl+0x2a9/0x6b0
[   30.330255]  ? __pfx_blkdev_ioctl+0x10/0x10
[   30.330271]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[   30.330290]  ? security_file_ioctl+0x9d/0xd0
[   30.330360]  ? __pfx_blkdev_ioctl+0x10/0x10
[   30.330376]  __x64_sys_ioctl+0x1b9/0x230
[   30.330397]  x64_sys_call+0x1227/0x2140
[   30.330414]  do_syscall_64+0x6d/0x140
[   30.330434]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   30.330450] RIP: 0033:0x7fa85983ec6b
[   30.330462] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[   30.330474] RSP: 002b:00007fff65352a38 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
[   30.330487] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa85983ec6b
[   30.330495] RDX: 0000000000000003 RSI: 0000000000004c00 RDI: 0000000000000004
[   30.330503] RBP: 00007fff65352a70 R08: 00000000ffffffff R09: 0000000000000000
[   30.330511] R10: 0000000000000000 R11: 0000000000000217 R12: 00007fff65352dc8
[   30.330519] R13: 0000000000402cea R14: 0000000000404e08 R15: 00007fa859bbc000
[   30.330553]  </TASK>
[   30.330558] irq event stamp: 1577
[   30.330562] hardirqs last  enabled at (1579): [<ffffffff81457403>] vprintk_store+0x413/0xa90
[   30.330586] hardirqs last disabled at (1580): [<ffffffff8145778a>] vprintk_store+0x79a/0xa90
[   30.330608] softirqs last  enabled at (730): [<ffffffff81289719>] __irq_exit_rcu+0xa9/0x120
[   30.330624] softirqs last disabled at (715): [<ffffffff81289719>] __irq_exit_rcu+0xa9/0x120
[   30.330639] ---[ end trace 0000000000000000 ]---
"

I hope it's helpful.

Thank you!

---

If you don't need the environment below to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64; I used v7.1.0
  // start3.sh loads the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You can change the bzImage_xxx to the one you want
  // You may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for a different qemu version
You can log in with the command below; there is no password for root.
ssh -p 10023 root@...alhost

After logging in to the VM (virtual machine) successfully, you can transfer the
reproducer binary to the VM as follows and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@...alhost:/root/

Get the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           // x should be less than or equal to the number of CPUs your PC has

Put the bzImage file into the above start3.sh to load the target kernel in the VM.


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

Best Regards,
Thanks!


On 2024-07-11 at 16:35:43 +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@...group.com>
> 
> The pwq creations and installations have been reworked based on
> wq_online_cpumask rather than cpu_online_mask.
> 
> So cpus_read_lock() is unneeded during wqattrs changes.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@...group.com>
> ---
>  kernel/workqueue.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9f454a9c04c8..64876d391e7c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -5123,15 +5123,12 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
>  
>  static void apply_wqattrs_lock(void)
>  {
> -	/* CPUs should stay stable across pwq creations and installations */
> -	cpus_read_lock();
>  	mutex_lock(&wq_pool_mutex);
>  }
>  
>  static void apply_wqattrs_unlock(void)
>  {
>  	mutex_unlock(&wq_pool_mutex);
> -	cpus_read_unlock();
>  }
>  
>  /**
> -- 
> 2.19.1.6.gb485710b
> 