Message-ID: <fff990bc-6ff9-7b34-8e80-57078de40928@redhat.com>
Date: Mon, 17 Nov 2025 21:48:49 +0100 (CET)
From: Mikulas Patocka <mpatocka@...hat.com>
To: "Uladzislau Rezki (Sony)" <urezki@...il.com>
cc: Alasdair Kergon <agk@...hat.com>, DMML <dm-devel@...ts.linux.dev>, 
    Andrew Morton <akpm@...ux-foundation.org>, 
    Mike Snitzer <snitzer@...hat.com>, Christoph Hellwig <hch@....de>, 
    LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RESEND PATCH] dm-ebs: Mark full buffer dirty even on partial
 write

Hi

What is the logical_block_size of the underlying NVMe device? That is,
what does /sys/block/nvme0n1/queue/logical_block_size contain in the
virtual machine?
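
For reference, a minimal sketch of reading that value, assuming the device appears as nvme0n1 in the guest. The helper names queue_attr_path and show_queue_attr and the DEV variable are made up for illustration; only the sysfs paths are real.

```shell
#!/bin/sh
# queue_attr_path and show_queue_attr are hypothetical helper names.

# Build the sysfs path of a block-queue attribute: <device> <attribute>.
queue_attr_path() {
    printf '/sys/block/%s/queue/%s\n' "$1" "$2"
}

# Print the attribute, or a note on stderr if the file is not there.
show_queue_attr() {
    path=$(queue_attr_path "$1" "$2")
    if [ -r "$path" ]; then
        printf '%s: %s\n' "$2" "$(cat "$path")"
    else
        printf '%s is not readable\n' "$path" >&2
    fi
}

# DEV is an assumed override; it defaults to the device named in this thread.
dev=${DEV:-nvme0n1}
show_queue_attr "$dev" logical_block_size
show_queue_attr "$dev" physical_block_size
```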

Mikulas

On Mon, 17 Nov 2025, Uladzislau Rezki (Sony) wrote:

> When performing a read-modify-write (RMW) operation, any modification
> to a buffered block must cause the entire buffer to be marked dirty.
> 
> Marking only a subrange as dirty is incorrect because the underlying
> device block size (ubs) defines the minimum read/write granularity. The
> lower device can perform I/O only on regions that are aligned to ubs
> and whose size is a multiple of ubs.
> 
> This change ensures that write-back operations always occur in full
> ubs-sized chunks, matching the intended emulation semantics of the
> EBS target.
> 
> As for the user-space-visible impact: a device that accepts only
> ubs-aligned, ubs-sized I/O rejects sub-ubs and misaligned requests,
> which can lead to data loss. Example:
> 
> 1) Create an 8K NVMe device in QEMU by adding
> 
> -device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192
> 
> 2) Set up dm-ebs to emulate a 512B-to-8K mapping
> 
> urezki@...38:~/bin$ cat dmsetup.sh
> 
> lower=/dev/nvme0n1
> len=$(blockdev --getsz "$lower")
> 
> echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
> urezki@...38:~/bin$
> 
> Here the offset is 0, ebs=1 and ubs=16 (both in sectors).
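
To make the numbers in that table concrete, here is a small sketch of the arithmetic. The helper names sectors_to_bytes and is_ubs_aligned are made up for illustration, and the 4096-byte write stands in for one default-sized ext4 block; nothing here talks to the real device.

```shell
#!/bin/sh
# Hypothetical helpers; they only model the alignment rule described above.
SECTOR_SIZE=512   # dm table values are counted in 512-byte sectors

sectors_to_bytes() {
    echo $(( $1 * $SECTOR_SIZE ))
}

# Succeeds only if offset ($1) and length ($2), both in bytes,
# are multiples of the underlying block size ($3).
is_ubs_aligned() {
    [ $(( $1 % $3 )) -eq 0 ] && [ $(( $2 % $3 )) -eq 0 ]
}

ubs_bytes=$(sectors_to_bytes 16)
echo "ubs = $ubs_bytes bytes"

for len in 4096 8192; do
    if is_ubs_aligned 0 "$len" "$ubs_bytes"; then
        echo "$len-byte write at offset 0: ubs-aligned"
    else
        echo "$len-byte write at offset 0: not ubs-aligned"
    fi
done
```

So ubs=16 sectors is 8192 bytes, and a lone 4096-byte write is the kind of sub-ubs request the lower device would reject, while a full 8192-byte write at an 8192-byte boundary passes the check.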
> 
> 3) Create an ext4 filesystem (default 4K block size)
> 
> urezki@...38:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
> mke2fs 1.47.0 (5-Feb-2023)
> Discarding device blocks: done
> Creating filesystem with 2072576 4k blocks and 518144 inodes
> Filesystem UUID: bd0b6ca6-0506-4e31-86da-8d22c9d50b63
> Superblock backups stored on blocks:
>         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
> 
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
> urezki@...38:~/bin$ dmesg
> 
> <snip>
> [ 1618.875449] buffer_io_error: 1028 callbacks suppressed
> [ 1618.875456] Buffer I/O error on dev dm-0, logical block 0, lost async page write
> [ 1618.875527] Buffer I/O error on dev dm-0, logical block 1, lost async page write
> [ 1618.875602] Buffer I/O error on dev dm-0, logical block 2, lost async page write
> [ 1618.875620] Buffer I/O error on dev dm-0, logical block 3, lost async page write
> [ 1618.875639] Buffer I/O error on dev dm-0, logical block 4, lost async page write
> [ 1618.894316] Buffer I/O error on dev dm-0, logical block 5, lost async page write
> [ 1618.894358] Buffer I/O error on dev dm-0, logical block 6, lost async page write
> [ 1618.894380] Buffer I/O error on dev dm-0, logical block 7, lost async page write
> [ 1618.894405] Buffer I/O error on dev dm-0, logical block 8, lost async page write
> [ 1618.894427] Buffer I/O error on dev dm-0, logical block 9, lost async page write
> <snip>
> 
> There are many I/O errors because the lower 8K device rejects
> sub-ubs/misaligned requests.
> 
> With the patch applied:
> 
> urezki@...38:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
> mke2fs 1.47.0 (5-Feb-2023)
> Discarding device blocks: done
> Creating filesystem with 2072576 4k blocks and 518144 inodes
> Filesystem UUID: 9b54f44f-ef55-4bd4-9e40-c8b775a616ac
> Superblock backups stored on blocks:
>         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
> 
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
> 
> urezki@...38:~/bin$ sudo mount /dev/dm-0 /mnt/
> urezki@...38:~/bin$ ls -al /mnt/
> total 24
> drwxr-xr-x  3 root root  4096 Oct 17 15:13 .
> drwxr-xr-x 19 root root  4096 Jul 10 19:42 ..
> drwx------  2 root root 16384 Oct 17 15:13 lost+found
> urezki@...38:~/bin$
> 
> After this change: mkfs completes; mount succeeds.
> 
> v1 -> v2:
>  - describe the user-space-visible impact in the commit message.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@...il.com>
> ---
>  drivers/md/dm-ebs-target.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-ebs-target.c b/drivers/md/dm-ebs-target.c
> index 6abb31ca9662..b354e74a670e 100644
> --- a/drivers/md/dm-ebs-target.c
> +++ b/drivers/md/dm-ebs-target.c
> @@ -103,7 +103,7 @@ static int __ebs_rw_bvec(struct ebs_c *ec, enum req_op op, struct bio_vec *bv,
>  			} else {
>  				flush_dcache_page(bv->bv_page);
>  				memcpy(ba, pa, cur_len);
> -				dm_bufio_mark_partial_buffer_dirty(b, buf_off, buf_off + cur_len);
> +				dm_bufio_mark_buffer_dirty(b);
>  			}
>  
>  			dm_bufio_release(b);
> -- 
> 2.47.3
> 

