Message-ID: <Z/9aNcsuz0VlvgYz@ly-workstation>
Date: Wed, 16 Apr 2025 15:20:21 +0800
From: "Lai, Yi" <yi1.lai@...ux.intel.com>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, djwong@...nel.org, hch@...radead.org,
	brauner@...nel.org, david@...morbit.com, chandanbabu@...nel.org,
	tytso@....edu, jack@...e.cz, yi.zhang@...wei.com,
	chengzhihao1@...wei.com, yukuai3@...wei.com, yi1.lai@...el.com
Subject: Re: [PATCH v5 4/9] xfs: convert delayed extents to unwritten when
 zeroing post eof blocks

Hi Zhang Yi,

Greetings!

I used Syzkaller and found a WARNING in xfs_bmapi_convert_one_delalloc in Linux v6.15-rc2.

After bisection, the first bad commit is:
"
5ce5674187c3 xfs: convert delayed extents to unwritten when zeroing post eof blocks
"

All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/250413_025108_xfs_bmapi_convert_one_delalloc/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250413_025108_xfs_bmapi_convert_one_delalloc/bzImage_0af2f6be1b4281385b618cb86ad946eded089ac8
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/250413_025108_xfs_bmapi_convert_one_delalloc/0af2f6be1b4281385b618cb86ad946eded089ac8_dmesg.log

"
[   21.631771] ------------[ cut here ]------------
[   21.632173] WARNING: CPU: 1 PID: 760 at fs/xfs/libxfs/xfs_bmap.c:4401 xfs_bmapi_convert_one_delalloc+0x520/0xca0
[   21.633001] Modules linked in:
[   21.633292] CPU: 1 UID: 0 PID: 760 Comm: repro Tainted: G        W           6.15.0-rc2-v6.15-rc2 #1 PREEMPT(voluntary) 
[   21.634151] Tainted: [W]=WARN
[   21.634401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   21.635476] RIP: 0010:xfs_bmapi_convert_one_delalloc+0x520/0xca0
[   21.635970] Code: fc ff ff e8 32 76 eb fe 44 89 fe bf 02 00 00 00 41 bc f5 ff ff ff e8 cf 70 eb fe 41 83 ff 02 0f 84 03 ff ff ff e8 10 76 eb fe <0f> 0b e9 f7 fe ff ff e8 04 76 eb fe 48 c7 c0 c0 ca ff 89 48 ba 00
[   21.637426] RSP: 0018:ff1100001cc37228 EFLAGS: 00010293
[   21.637855] RAX: 0000000000000000 RBX: ff1100001cc373e8 RCX: ffffffff829c44b1
[   21.638430] RDX: ff110000108b2bc0 RSI: ffffffff829c44c0 RDI: 0000000000000005
[   21.639001] RBP: ff1100001cc37410 R08: ff1100001cc37318 R09: fffffbfff0fdc374
[   21.639614] R10: 0000000000000000 R11: ff110000108b3a18 R12: 00000000fffffff5
[   21.640180] R13: ff1100001d471680 R14: ff1100000ec74000 R15: 0000000000000000
[   21.640751] FS:  00007f4c1b99a640(0000) GS:ff110000e3a84000(0000) knlGS:0000000000000000
[   21.641394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.641862] CR2: 00007f4c14cfd000 CR3: 000000000d4f4006 CR4: 0000000000771ef0
[   21.642439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   21.643007] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   21.643620] PKRU: 55555554
[   21.643848] Call Trace:
[   21.644060]  <TASK>
[   21.644256]  ? __pfx_xfs_bmapi_convert_one_delalloc+0x10/0x10
[   21.644738]  ? __up_read+0x21e/0x810
[   21.645063]  ? unmap_mapping_range+0x118/0x2c0
[   21.645436]  ? lock_release+0x14f/0x2c0
[   21.645761]  ? __pfx_unmap_mapping_range+0x10/0x10
[   21.646164]  xfs_bmapi_convert_delalloc+0xb0/0x100
[   21.646568]  xfs_buffered_write_iomap_begin+0x153d/0x1890
[   21.647022]  ? __pfx_xfs_buffered_write_iomap_begin+0x10/0x10
[   21.647535]  ? filemap_range_has_writeback+0x318/0x460
[   21.647977]  ? filemap_range_has_writeback+0xf1/0x460
[   21.648395]  ? __pfx_xfs_buffered_write_iomap_begin+0x10/0x10
[   21.648870]  iomap_iter+0x58b/0xd30
[   21.649178]  iomap_zero_range+0x547/0x740
[   21.649526]  ? __pfx_iomap_zero_range+0x10/0x10
[   21.649926]  ? __this_cpu_preempt_check+0x21/0x30
[   21.650317]  ? lock_is_held_type+0xef/0x150
[   21.650673]  xfs_zero_range+0xaf/0x100
[   21.650990]  xfs_file_write_checks+0x639/0xa00
[   21.651385]  xfs_file_buffered_write+0x183/0x9a0
[   21.651806]  ? __pfx_xfs_file_buffered_write+0x10/0x10
[   21.652225]  ? __lock_acquire+0x410/0x2260
[   21.652580]  ? __lruvec_stat_mod_folio+0x1aa/0x3e0
[   21.652985]  xfs_file_write_iter+0x52a/0xc00
[   21.653350]  do_iter_readv_writev+0x6b1/0x9c0
[   21.653723]  ? __pfx_do_iter_readv_writev+0x10/0x10
[   21.654127]  ? __this_cpu_preempt_check+0x21/0x30
[   21.654526]  ? lock_is_held_type+0xef/0x150
[   21.654879]  vfs_writev+0x311/0xd70
[   21.655204]  ? __lock_acquire+0x410/0x2260
[   21.655574]  ? __pfx_vfs_writev+0x10/0x10
[   21.655922]  ? __fget_files+0x1fa/0x3b0
[   21.656251]  ? __this_cpu_preempt_check+0x21/0x30
[   21.656643]  ? lock_release+0x14f/0x2c0
[   21.656973]  ? __fget_files+0x204/0x3b0
[   21.657306]  do_pwritev+0x1c9/0x280
[   21.657612]  ? do_pwritev+0x1c9/0x280
[   21.657927]  ? __pfx_do_pwritev+0x10/0x10
[   21.658269]  ? __audit_syscall_entry+0x39c/0x500
[   21.658661]  __x64_sys_pwritev+0xa3/0x100
[   21.659000]  ? syscall_trace_enter+0x14d/0x280
[   21.659387]  x64_sys_call+0x1bbc/0x2150
[   21.659741]  do_syscall_64+0x6d/0x150
[   21.660055]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   21.660473] RIP: 0033:0x7f4c1b63ee5d
[   21.660774] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
[   21.662222] RSP: 002b:00007f4c1b999df8 EFLAGS: 00000202 ORIG_RAX: 0000000000000128
[   21.662830] RAX: ffffffffffffffda RBX: 00007f4c1b99a640 RCX: 00007f4c1b63ee5d
[   21.663432] RDX: 0000000000000001 RSI: 0000000020000300 RDI: 0000000000000004
[   21.664001] RBP: 00007f4c1b999e20 R08: 0000000000000000 R09: 0000000000000000
[   21.664569] R10: 0000000002fffffd R11: 0000000000000202 R12: 00007f4c1b99a640
[   21.665137] R13: 000000000000006e R14: 00007f4c1b69f560 R15: 0000000000000000
[   21.665726]  </TASK>
"

I hope this is helpful to you.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You can change bzImage_xxx to the image you want
  // You may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for a different qemu version
Use the command below to log in; there is no password for root.
ssh -p 10023 root@...alhost

After logging in to the VM (virtual machine) successfully, you can build the
reproducer, transfer the binary to the VM as shown below, and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@...alhost:/root/

Build the bzImage for the target kernel:
Use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           // x should be less than or equal to the number of CPUs your PC has

Put the bzImage file name into the start3.sh above to load the target kernel in the VM.


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install 

On Thu, Apr 25, 2024 at 09:13:30PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
> 
> Current clone operation could be non-atomic if the destination of a file
> is beyond EOF, user could get a file with corrupted (zeroed) data on
> crash.
> 
> The problem is about preallocations. If you write some data into a file:
> 
> 	[A...B)
> 
> and XFS decides to preallocate some post-eof blocks, then it can create
> a delayed allocation reservation:
> 
> 	[A.........D)
> 
> The writeback path tries to convert delayed extents to real ones by
> allocating blocks. If there aren't enough contiguous free space, we can
> end up with two extents, the first real and the second still delalloc:
> 
> 	[A....C)[C.D)
> 
> After that, both the in-memory and the on-disk file sizes are still B.
> If we clone into the range [E...F) from another file:
> 
> 	[A....C)[C.D)      [E...F)
> 
> then xfs_reflink_zero_posteof() calls iomap_zero_range() to zero out the
> range [B, E) beyond EOF and flush it. Since [C, D) is still a delalloc
> extent, its pagecache will be zeroed and both the in-memory and on-disk
> size will be updated to D after flushing but before cloning. This is
> wrong, because the user can see the size change and read the zeroes
> while the clone operation is ongoing.
> 
> We need to keep the in-memory and on-disk size before the clone
> operation starts, so instead of writing zeroes through the page cache
> for delayed ranges beyond EOF, we convert these ranges to unwritten and
> invalidate any cached data over that range beyond EOF.
> 
> Suggested-by: Dave Chinner <david@...morbit.com>
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> ---
> Changes since v4:
> 
> Move the delalloc converting hunk before searching the COW fork. Because
> if the file has been reflinked and copied on write,
> xfs_bmap_extsize_align() aligned the range of COW delalloc extent, after
> the writeback, there might be some unwritten extents left over in the
> COW fork that overlaps the delalloc extent we found in data fork.
> 
>   data fork  ...wwww|dddddddddd...
>   cow fork          |uuuuuuuuuu...
>                     ^
>                   i_size
> 
> In my v4, we search the COW fork before checking the delalloc extent,
> goto found_cow tag and return unconverted delalloc srcmap in the above
> case, so the delayed extent in the data fork will have no chance to
> convert to unwritten, it will lead to delalloc extent residue and break
> generic/522 after merging patch 6.
> 
>  fs/xfs/xfs_iomap.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 236ee78aa75b..2857ef1b0272 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1022,6 +1022,24 @@ xfs_buffered_write_iomap_begin(
>  		goto out_unlock;
>  	}
>  
> +	/*
> +	 * For zeroing, trim a delalloc extent that extends beyond the EOF
> +	 * block.  If it starts beyond the EOF block, convert it to an
> +	 * unwritten extent.
> +	 */
> +	if ((flags & IOMAP_ZERO) && imap.br_startoff <= offset_fsb &&
> +	    isnullstartblock(imap.br_startblock)) {
> +		xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
> +
> +		if (offset_fsb >= eof_fsb)
> +			goto convert_delay;
> +		if (end_fsb > eof_fsb) {
> +			end_fsb = eof_fsb;
> +			xfs_trim_extent(&imap, offset_fsb,
> +					end_fsb - offset_fsb);
> +		}
> +	}
> +
>  	/*
>  	 * Search the COW fork extent list even if we did not find a data fork
>  	 * extent.  This serves two purposes: first this implements the
> @@ -1167,6 +1185,17 @@ xfs_buffered_write_iomap_begin(
>  	xfs_iunlock(ip, lockmode);
>  	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
>  
> +convert_delay:
> +	xfs_iunlock(ip, lockmode);
> +	truncate_pagecache(inode, offset);
> +	error = xfs_bmapi_convert_delalloc(ip, XFS_DATA_FORK, offset,
> +					   iomap, NULL);
> +	if (error)
> +		return error;
> +
> +	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &imap);
> +	return 0;
> +
>  found_cow:
>  	seq = xfs_iomap_inode_sequence(ip, 0);
>  	if (imap.br_startoff <= offset_fsb) {
> -- 
> 2.39.2
> 
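
For readers following the first hunk of the quoted patch, here is a minimal
user-space sketch of the decision it makes for a zeroing request. It is not
kernel code: struct extent_model, enum zero_action and classify_zeroing() are
hypothetical names introduced only for illustration, the IOMAP_ZERO flag check
is assumed to have already passed, and the xfs_trim_extent() call is reduced
to clamping end_fsb.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the xfs_bmbt_irec fields the hunk reads. */
struct extent_model {
	uint64_t startoff;	/* first file block covered (br_startoff) */
	bool	 delalloc;	/* models isnullstartblock(br_startblock) */
};

/* Hypothetical labels for the three outcomes of the hunk. */
enum zero_action {
	ZERO_UNCHANGED,		/* hunk does not apply, or nothing to trim */
	ZERO_TRIM_TO_EOF,	/* range straddles EOF: clamp end to EOF */
	ZERO_CONVERT_UNWRITTEN,	/* range starts at or after EOF: convert_delay */
};

/*
 * Mirror of the conditions in the quoted hunk, assuming IOMAP_ZERO is set.
 * All offsets are in filesystem blocks. end_fsb is updated in place when
 * the range is trimmed, as in the patch.
 */
static enum zero_action
classify_zeroing(const struct extent_model *imap, uint64_t offset_fsb,
		 uint64_t *end_fsb, uint64_t eof_fsb)
{
	/* Only a delalloc extent that covers offset_fsb is of interest. */
	if (!imap->delalloc || imap->startoff > offset_fsb)
		return ZERO_UNCHANGED;

	/* Entirely beyond EOF: the patch jumps to convert_delay. */
	if (offset_fsb >= eof_fsb)
		return ZERO_CONVERT_UNWRITTEN;

	/* Starts before EOF but extends past it: stop at EOF. */
	if (*end_fsb > eof_fsb) {
		*end_fsb = eof_fsb;
		return ZERO_TRIM_TO_EOF;
	}
	return ZERO_UNCHANGED;
}
```

With a delalloc extent starting at block 10 and EOF at block 12, a zeroing
request starting at block 12 would take the conversion path, while a request
for blocks [10, 20) would be trimmed to [10, 12).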
