Message-ID: <Z6rgiVp7221r4JZ5@ly-workstation>
Date: Tue, 11 Feb 2025 13:30:49 +0800
From: "Lai, Yi" <yi1.lai@...ux.intel.com>
To: SeongJae Park <sj@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	"Liam R. Howlett" <Liam.Howlett@...cle.com>,
	David Hildenbrand <david@...hat.com>,
	Davidlohr Bueso <dave@...olabs.net>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, yi1.lai@...el.com
Subject: Re: [PATCH 4/4] mm/madvise: remove redundant mmap_lock operations
 from process_madvise()

On Wed, Feb 05, 2025 at 10:15:17PM -0800, SeongJae Park wrote:
> Optimize redundant mmap lock operations from process_madvise() by
> directly doing the mmap locking first, and then the remaining works for
> all ranges in the loop.
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
> Signed-off-by: SeongJae Park <sj@...nel.org>
> ---
>  mm/madvise.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 31e5df75b926..5a0a1fc99d27 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1754,9 +1754,26 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>  
>  	total_len = iov_iter_count(iter);
>  
> +	ret = madvise_lock(mm, behavior);
> +	if (ret)
> +		return ret;
> +
>  	while (iov_iter_count(iter)) {
> -		ret = do_madvise(mm, (unsigned long)iter_iov_addr(iter),
> -				 iter_iov_len(iter), behavior);
> +		unsigned long start = (unsigned long)iter_iov_addr(iter);
> +		size_t len_in = iter_iov_len(iter);
> +		size_t len;
> +
> +		if (!is_valid_madvise(start, len_in, behavior)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		len = PAGE_ALIGN(len_in);
> +		if (start + len == start)
> +			ret = 0;
> +		else
> +			ret = madvise_do_behavior(mm, start, len_in, len,
> +					behavior);
>  		/*
>  		 * An madvise operation is attempting to restart the syscall,
>  		 * but we cannot proceed as it would not be correct to repeat
> @@ -1772,12 +1789,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
>  				ret = -EINTR;
>  				break;
>  			}
> +
> +			/* Drop and reacquire lock to unwind race. */
> +			madvise_unlock(mm, behavior);
> +			madvise_lock(mm, behavior);
>  			continue;
>  		}
>  		if (ret < 0)
>  			break;
>  		iov_iter_advance(iter, iter_iov_len(iter));
>  	}
> +	madvise_unlock(mm, behavior);
>  
>  	ret = (total_len - iov_iter_count(iter)) ? : ret;
> 
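
For readers skimming the quoted diff, the intent of the change can be modelled in user space. This is only a sketch under assumptions: `struct lock_stats`, `model_lock()`, `model_unlock()` and both loop helpers are hypothetical names that count lock operations, they are not kernel APIs, and the restart path (drop and reacquire) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for mmap_lock acquire/release calls. */
struct lock_stats {
	unsigned long acquires;
	unsigned long releases;
};

static void model_lock(struct lock_stats *s)   { s->acquires++; }
static void model_unlock(struct lock_stats *s) { s->releases++; }

/* Model of the old loop: every range takes and drops the lock. */
static void per_range_locking(struct lock_stats *s, size_t nr_ranges)
{
	for (size_t i = 0; i < nr_ranges; i++) {
		model_lock(s);
		/* ... apply the advice to range i ... */
		model_unlock(s);
	}
}

/* Model of the patched loop: lock once, handle all ranges, unlock once. */
static void batched_locking(struct lock_stats *s, size_t nr_ranges)
{
	model_lock(s);
	for (size_t i = 0; i < nr_ranges; i++) {
		/* ... apply the advice to range i ... */
	}
	model_unlock(s);
}

/* Compare the number of lock operations for the same number of ranges. */
static void compare_models(void)
{
	struct lock_stats before = { 0, 0 }, after = { 0, 0 };

	per_range_locking(&before, 8);
	batched_locking(&after, 8);
	assert(before.acquires == 8 && before.releases == 8);
	assert(after.acquires == 1 && after.releases == 1);
}
```

In both models every acquire is paired with a release; only the number of pairs changes.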

Hi SeongJae Park,

Greetings!

I used Syzkaller and found a WARNING in madvise_unlock in linux-next tag next-20250210.

After bisection, the first bad commit is:
"
ec68fbd9e99f mm/madvise: remove redundant mmap_lock operations from process_madvise()
"

All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/250210_144836_madvise_unlock/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250210_144836_madvise_unlock/bzImage_df5d6180169ae06a2eac57e33b077ad6f6252440
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/250210_144836_madvise_unlock/df5d6180169ae06a2eac57e33b077ad6f6252440_dmesg.log

"
[  135.191347] Injecting memory failure for pfn 0x8ea0 at process virtual address 0x20e7f000
[  135.194964] Memory failure: 0x8ea0: recovery action for reserved kernel page: Ignored
[  135.195584] ------------[ cut here ]------------
[  135.195863] WARNING: CPU: 1 PID: 680 at ./include/linux/rwsem.h:203 madvise_unlock+0x17e/0x1a0
[  135.196395] Modules linked in:
[  135.196612] CPU: 1 UID: 0 PID: 680 Comm: repro Not tainted 6.14.0-rc2-next-20250210-df5d6180169a #1
[  135.197135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  135.197818] RIP: 0010:madvise_unlock+0x17e/0x1a0
[  135.198108] Code: a1 9f ff 31 f6 49 8d bc 24 e0 01 00 00 e8 fa 80 d5 03 31 ff 89 c3 89 c6 e8 9f 9b 9f ff 85 db 0f 85 1c ff ff ffe
[  135.199154] RSP: 0018:ffff888022647e88 EFLAGS: 00010293
[  135.199468] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81e878f1
[  135.199876] RDX: ffff88802264a540 RSI: ffffffff81e878fe RDI: 0000000000000005
[  135.200285] RBP: ffff888022647ea0 R08: 0000000000000001 R09: ffffed10044c8f25
[  135.200692] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888021116180
[  135.201100] R13: ffff8880211162f0 R14: 0000000000004000 R15: ffff888021116180
[  135.201531] FS:  00007f0327a07600(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
[  135.201996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.202329] CR2: 00007f032773e7f0 CR3: 0000000021bd4003 CR4: 0000000000770ef0
[  135.202737] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  135.203155] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  135.203563] PKRU: 55555554
[  135.203731] Call Trace:
[  135.203883]  <TASK>
[  135.204020]  ? show_regs+0x6d/0x80
[  135.204240]  ? __warn+0xf3/0x390
[  135.204447]  ? report_bug+0x25e/0x4b0
[  135.204692]  ? madvise_unlock+0x17e/0x1a0
[  135.204937]  ? report_bug+0x2cb/0x4b0
[  135.205167]  ? madvise_unlock+0x17e/0x1a0
[  135.205413]  ? madvise_unlock+0x17f/0x1a0
[  135.205706]  ? handle_bug+0xf1/0x190
[  135.206188]  ? exc_invalid_op+0x3c/0x80
[  135.206423]  ? asm_exc_invalid_op+0x1f/0x30
[  135.206678]  ? madvise_unlock+0x171/0x1a0
[  135.206919]  ? madvise_unlock+0x17e/0x1a0
[  135.207156]  ? madvise_unlock+0x17e/0x1a0
[  135.207388]  ? madvise_unlock+0x17e/0x1a0
[  135.207623]  do_madvise+0x14f/0x1a0
[  135.207836]  __x64_sys_madvise+0xb2/0x120
[  135.208067]  ? syscall_trace_enter+0x14f/0x280
[  135.208328]  x64_sys_call+0x19b1/0x2150
[  135.208552]  do_syscall_64+0x6d/0x140
[  135.208766]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  135.209050] RIP: 0033:0x7f032763ee5d
[  135.209261] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
[  135.210261] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
[  135.210678] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
[  135.211069] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
[  135.211458] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
[  135.211851] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
[  135.212248] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
[  135.212656]  </TASK>
[  135.212794] irq event stamp: 807
[  135.212987] hardirqs last  enabled at (815): [<ffffffff81663b55>] __up_console_sem+0x95/0xb0
[  135.213477] hardirqs last disabled at (824): [<ffffffff81663b3a>] __up_console_sem+0x7a/0xb0
[  135.213951] softirqs last  enabled at (628): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.214426] softirqs last disabled at (623): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.214910] ---[ end trace 0000000000000000 ]---
[  135.215182]
[  135.215283] =====================================
[  135.215550] WARNING: bad unlock balance detected!
[  135.215817] 6.14.0-rc2-next-20250210-df5d6180169a #1 Tainted: G        W
[  135.216234] -------------------------------------
[  135.216502] repro/680 is trying to release lock (&mm->mmap_lock) at:
[  135.216863] [<ffffffff81e87854>] madvise_unlock+0xd4/0x1a0
[  135.217179] but there are no more locks to release!
[  135.217457]
[  135.217457] other info that might help us debug this:
[  135.217819] no locks held by repro/680.
[  135.218046]
[  135.218046] stack backtrace:
[  135.218296] CPU: 1 UID: 0 PID: 680 Comm: repro Tainted: G        W          6.14.0-rc2-next-20250210-df5d6180169a #1
[  135.218308] Tainted: [W]=WARN
[  135.218310] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  135.218315] Call Trace:
[  135.218317]  <TASK>
[  135.218320]  dump_stack_lvl+0xea/0x150
[  135.218330]  ? madvise_unlock+0xd4/0x1a0
[  135.218341]  dump_stack+0x19/0x20
[  135.218349]  print_unlock_imbalance_bug+0x1b5/0x200
[  135.218368]  ? madvise_unlock+0xd4/0x1a0
[  135.218379]  lock_release+0x5bc/0x870
[  135.218386]  ? madvise_unlock+0x17f/0x1a0
[  135.218397]  ? handle_bug+0xf1/0x190
[  135.218407]  ? __pfx_lock_release+0x10/0x10
[  135.218415]  ? exc_invalid_op+0x3c/0x80
[  135.218426]  ? asm_exc_invalid_op+0x1f/0x30
[  135.218441]  up_write+0x31/0x550
[  135.218449]  ? madvise_unlock+0x17e/0x1a0
[  135.218462]  madvise_unlock+0xd4/0x1a0
[  135.218474]  do_madvise+0x14f/0x1a0
[  135.218487]  __x64_sys_madvise+0xb2/0x120
[  135.218500]  ? syscall_trace_enter+0x14f/0x280
[  135.218511]  x64_sys_call+0x19b1/0x2150
[  135.218522]  do_syscall_64+0x6d/0x140
[  135.218531]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  135.218542] RIP: 0033:0x7f032763ee5d
[  135.218548] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
[  135.218556] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
[  135.218562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
[  135.218569] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
[  135.218573] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
[  135.218578] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
[  135.218582] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
[  135.218594]  </TASK>
[  135.228272] ------------[ cut here ]------------
[  135.228541] DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) && !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE)): count = 0x0, magy
[  135.229541] WARNING: CPU: 1 PID: 680 at kernel/locking/rwsem.c:1367 up_write+0x451/0x550
[  135.229997] Modules linked in:
[  135.230182] CPU: 1 UID: 0 PID: 680 Comm: repro Tainted: G        W          6.14.0-rc2-next-20250210-df5d6180169a #1
[  135.230758] Tainted: [W]=WARN
[  135.230940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  135.231565] RIP: 0010:up_write+0x451/0x550
[  135.231806] Code: ea 03 80 3c 02 00 0f 85 d5 00 00 00 49 8b 14 24 53 4d 89 f9 4c 89 f1 48 c7 c6 40 bd eb 85 48 c7 c7 60 bc eb 858
[  135.232814] RSP: 0018:ffff888022647e30 EFLAGS: 00010282
[  135.233113] RAX: 0000000000000000 RBX: ffffffff85ebbba0 RCX: ffffffff8146cfd3
[  135.233528] RDX: ffff88802264a540 RSI: ffffffff8146cfe0 RDI: 0000000000000001
[  135.233929] RBP: ffff888022647e78 R08: 0000000000000001 R09: ffffed10044c8f66
[  135.234325] R10: 0000000000000001 R11: 57525f4755424544 R12: ffff8880211162f0
[  135.234722] R13: ffff8880211162f8 R14: ffff8880211162f0 R15: ffff88802264a540
[  135.235122] FS:  00007f0327a07600(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
[  135.235584] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.235912] CR2: 00007f032773e7f0 CR3: 0000000021bd4003 CR4: 0000000000770ef0
[  135.236314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  135.236713] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  135.237114] PKRU: 55555554
[  135.237276] Call Trace:
[  135.237445]  <TASK>
[  135.237580]  ? show_regs+0x6d/0x80
[  135.237789]  ? __warn+0xf3/0x390
[  135.237987]  ? find_bug+0x310/0x490
[  135.238199]  ? up_write+0x451/0x550
[  135.238412]  ? report_bug+0x2cb/0x4b0
[  135.238636]  ? up_write+0x451/0x550
[  135.238853]  ? up_write+0x452/0x550
[  135.239065]  ? handle_bug+0xf1/0x190
[  135.239285]  ? exc_invalid_op+0x3c/0x80
[  135.239515]  ? asm_exc_invalid_op+0x1f/0x30
[  135.239768]  ? __warn_printk+0x173/0x2e0
[  135.240001]  ? __warn_printk+0x180/0x2e0
[  135.240235]  ? up_write+0x451/0x550
[  135.240445]  ? madvise_unlock+0x17e/0x1a0
[  135.240686]  madvise_unlock+0xd4/0x1a0
[  135.240913]  do_madvise+0x14f/0x1a0
[  135.241126]  __x64_sys_madvise+0xb2/0x120
[  135.241364]  ? syscall_trace_enter+0x14f/0x280
[  135.241647]  x64_sys_call+0x19b1/0x2150
[  135.241887]  do_syscall_64+0x6d/0x140
[  135.242113]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  135.242407] RIP: 0033:0x7f032763ee5d
[  135.242619] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b8
[  135.243646] RSP: 002b:00007ffcaafd26c8 EFLAGS: 00000207 ORIG_RAX: 000000000000001c
[  135.244075] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f032763ee5d
[  135.244476] RDX: 0000000000000064 RSI: 0000000000004000 RDI: 0000000020e7f000
[  135.244877] RBP: 00007ffcaafd26d0 R08: 00007ffcaafd2140 R09: 00007ffcaafd2700
[  135.245279] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffcaafd2828
[  135.245696] R13: 00000000004016da R14: 0000000000403e08 R15: 00007f0327a4e000
[  135.246106]  </TASK>
[  135.246242] irq event stamp: 857
[  135.246434] hardirqs last  enabled at (857): [<ffffffff81663b55>] __up_console_sem+0x95/0xb0
[  135.246915] hardirqs last disabled at (856): [<ffffffff81663b3a>] __up_console_sem+0x7a/0xb0
[  135.247392] softirqs last  enabled at (628): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.247871] softirqs last disabled at (623): [<ffffffff8148c40e>] __irq_exit_rcu+0x10e/0x170
[  135.248347] ---[ end trace 0000000000000000 ]---
"
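
To make the splats above easier to read: "bad unlock balance" means a release was attempted on a lock that the task does not hold ("no locks held by repro/680"). In the user-space registers, ORIG_RAX 0x1c is madvise and RDX 0x64 (100) is the advice argument, which matches MADV_HWPOISON in the Linux UAPI headers and is consistent with the "Injecting memory failure" line. The sketch below only models the reported condition in user space. `struct tracked_lock`, `tracked_acquire()`, `tracked_release()` and `unbalanced_call()` are hypothetical names, and the `skip_lock` flag is an assumption for illustration, not a diagnosis of mm/madvise.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock with lockdep-style hold tracking. */
struct tracked_lock {
	int held;        /* how many times the lock is currently held */
	int bad_unlocks; /* releases attempted while nothing was held */
};

static void tracked_acquire(struct tracked_lock *l) { l->held++; }

static void tracked_release(struct tracked_lock *l)
{
	if (l->held == 0) {
		/* Corresponds to "there are no more locks to release!" */
		l->bad_unlocks++;
		return;
	}
	l->held--;
}

/*
 * Hypothetical caller: 'skip_lock' models any path on which the lock step
 * is bypassed while the unlock step still runs. Whether and where such a
 * path exists in the kernel is not established by this sketch.
 */
static int unbalanced_call(struct tracked_lock *l, bool skip_lock)
{
	if (!skip_lock)
		tracked_acquire(l);
	/* ... work that would normally run under the lock ... */
	tracked_release(l);
	return l->bad_unlocks;
}

static void balance_demo(void)
{
	struct tracked_lock l = { 0, 0 };

	assert(unbalanced_call(&l, false) == 0); /* balanced pair */
	assert(unbalanced_call(&l, true) == 1);  /* release without acquire */
}
```

The second call in `balance_demo()` is the modelled equivalent of the report: a release with no matching acquire.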

Hope this could be insightful to you.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@...alhost

After logging in to the vm (virtual machine) successfully, you can build the
reproducer binary, transfer it to the vm as below, and reproduce the problem in the vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@...alhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should be equal to or less than the number of CPUs your PC has

Put the bzImage file name into the start3.sh above to load the target kernel in the vm.


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install 

> -- 
> 2.39.5
