Message-ID: <7c5b852d-4f29-41c7-a171-c0069771a5e0@kernel.org>
Date: Tue, 1 Jul 2025 16:32:36 +0900
From: Damien Le Moal <dlemoal@...nel.org>
To: Richard Weinberger <richard@...ma-star.at>,
 Richard Weinberger <richard@....at>, linux-nvme@...ts.infradead.org,
 upstream@...ma-star.at
Cc: linux-kernel@...r.kernel.org, kch@...dia.com, sagi@...mberg.me,
 hch@....de, upstream+nvme@...ma-star.at
Subject: Re: [PATCH v2] nvmet: Make blksize_shift configurable

On 7/1/25 4:09 PM, Richard Weinberger wrote:
> On Tuesday, 1 July 2025 02:34, 'Damien Le Moal' via upstream wrote:
>> On 7/1/25 4:13 AM, Richard Weinberger wrote:
>>> Currently, the block size is automatically configured, and for
>>> file-backed namespaces it is likely to be 4K.
>>> While this is a reasonable default for modern storage, it can
>>> cause confusion if someone wants to export a pre-created disk image
>>> that uses a 512-byte block size.
>>> As a result, partition parsing will fail.
>>>
>>> So, just like we already do for the loop block device, let the user
>>> configure the block size if they know better.
>>
>> Hmm... That's fine with me, but this explanation does not match what the patch
>> does: you allow configuring the block size bit shift, not the size itself. That
>> is not super user friendly :)
>>
>> Even if internally you use the block size bit shift, I think it would be better
>> if the user-facing interface is the block size, as that is much easier to
>> manipulate without having to remember the exponents for powers-of-2 values :)
> 
> The initial intention of this patch was exposing the blksize_shift property.
> If we want to expose this in a more user-friendly way, I'm fine with it.
> Maybe "minimum_io_size"?

That will likely be confused with the existing device limit io_min. I think
block_size is clear.

>> Nit: to avoid the indented if, may be write this like this: ?
>>
>> 	if (!ns->blksize_shift)
>> 		ns->blksize_shift = bdev_blksize_shift;
>>
>> 	if (ns->blksize_shift < bdev_blksize_shift) {
>> 		pr_err("Configured blksize needs to be at least %u for device %s\n",
>> 			bdev_logical_block_size(ns->bdev),
>> 			ns->device_path);
>> 		return -EINVAL;
>> 	}
> 
> It's a matter of taste, but yes...

Absolutely :)

> 
>> Also, if the backend is an HDD, do we want to allow the user to configure a
>> block size that is less than the *physical* block size ? Performance will
>> suffer on regular HDDs and writes may fail with SMR HDDs.
> 
> I'm not sure whether it's worth putting more smartness into this logic.

This may be nice to avoid users shooting themselves in the foot with a bad
setup and us having to deal with bad performance complaints...
If we do not do anything special, we will be stuck with it, since making the
setup more restrictive later may break some (bad) user setups. That is why I
raised the point :)

>>>  	ns->pi_type = 0;
>>>  	ns->metadata_size = 0;
>>> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
>>> index 2d068439b129c..a4066b5a1dc97 100644
>>> --- a/drivers/nvme/target/io-cmd-file.c
>>> +++ b/drivers/nvme/target/io-cmd-file.c
>>> @@ -49,12 +49,28 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>>>  
>>>  	nvmet_file_ns_revalidate(ns);
>>>  
>>> -	/*
>>> -	 * i_blkbits can be greater than the universally accepted upper bound,
>>> -	 * so make sure we export a sane namespace lba_shift.
>>> -	 */
>>> -	ns->blksize_shift = min_t(u8,
>>> -			file_inode(ns->file)->i_blkbits, 12);
>>> +	if (ns->blksize_shift) {
>>> +		if (!ns->buffered_io) {
>>> +			struct block_device *sb_bdev = ns->file->f_mapping->host->i_sb->s_bdev;
>>> +			struct kstat st;
>>> +
>>> +			if (!vfs_getattr(&ns->file->f_path, &st, STATX_DIOALIGN, 0) &&
>>> +			    (st.result_mask & STATX_DIOALIGN) &&
>>> +			    (1 << ns->blksize_shift) < st.dio_offset_align)
>>> +				return -EINVAL;
>>> +
>>> +			if (sb_bdev && (1 << ns->blksize_shift < bdev_logical_block_size(sb_bdev)))
>>> +				return -EINVAL;
>>
>> I am confused... This is going to check both... But if you got STATX_DIOALIGN
>> and it is OK, you do not need (and probably should not) do the second if, no ?
> 
> I was not sure about that.
> Is it guaranteed that STATX_DIOALIGN returns something sane?

If it is defined by the FS, yes. But it may not be defined, so in that case,
you have to use the fallback of the bdev block size.


-- 
Damien Le Moal
Western Digital Research
