lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54a30f2a-99f5-486a-9d9d-d8e2b1007c6c@kernel.org>
Date: Fri, 5 Jul 2024 09:13:05 +0900
From: Damien Le Moal <dlemoal@...nel.org>
To: Mikulas Patocka <mpatocka@...hat.com>, Li Dong <lidong@...o.com>,
 Naohiro Aota <naohiro.aota@....com>, Johannes Thumshirn <jth@...nel.org>,
 Jens Axboe <axboe@...nel.dk>
Cc: Alasdair Kergon <agk@...hat.com>, Mike Snitzer <snitzer@...nel.org>,
 "open list:DEVICE-MAPPER (LVM)" <dm-devel@...ts.linux.dev>,
 open list <linux-kernel@...r.kernel.org>, opensource.kernel@...o.com,
 linux-fsdevel@...r.kernel.org, linux-block@...r.kernel.org,
 "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>
Subject: Re: Non-power-of-2 zone size

On 7/5/24 04:00, Mikulas Patocka wrote:
> 
> 
>> On Thu, 4 Jul 2024, Li Dong wrote:
>>
>>> For zone block devices, device_area_is_invalid may return an incorrect 
>>> value.
>>>
>>> Failure log:
>>> [   19.337657]: device-mapper: table: 254:56: len=836960256 not aligned to
>>> h/w zone size 3244032 of sde
>>> [   19.337665]: device-mapper: core: Cannot calculate initial queue limits
>>> [   19.337667]: device-mapper: ioctl: unable to set up device queue for 
>>> new table.
>>>
>>> Actually, the device's zone length is aligned to the zonesize.
>>>
>>> Fixes: 5dea271b6d87 ("dm table: pass correct dev area size to device_area_is_valid")
>>> Signed-off-by: Li Dong <lidong@...o.com>
>>> ---
>>>  drivers/md/dm-table.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>>> index 33b7a1844ed4..0bddae0bee3c 100644
>>> --- a/drivers/md/dm-table.c
>>> +++ b/drivers/md/dm-table.c
>>> @@ -257,7 +257,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>>>  	if (bdev_is_zoned(bdev)) {
>>>  		unsigned int zone_sectors = bdev_zone_sectors(bdev);
>>>  
>>> -		if (start & (zone_sectors - 1)) {
>>> +		if (start % zone_sectors) {
>>>  			DMERR("%s: start=%llu not aligned to h/w zone size %u of %pg",
>>>  			      dm_device_name(ti->table->md),
>>>  			      (unsigned long long)start,
>>> @@ -274,7 +274,7 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
>>>  		 * devices do not end up with a smaller zone in the middle of
>>>  		 * the sector range.
>>>  		 */
>>> -		if (len & (zone_sectors - 1)) {
>>> +		if (len % zone_sectors) {
>>>  			DMERR("%s: len=%llu not aligned to h/w zone size %u of %pg",
>>>  			      dm_device_name(ti->table->md),
>>>  			      (unsigned long long)len,
>>> -- 
>>> 2.31.1.windows.1
> 
> I grepped the kernel for bdev_zone_sectors and there are more assumptions 
> that bdev_zone_sectors is a power of 2.
> 
> drivers/md/dm-zone.c:           sector_t mask = bdev_zone_sectors(disk->part0) - 1
> drivers/nvme/target/zns.c:      if (get_capacity(bd_disk) & (bdev_zone_sectors(ns->bdev) - 1))
> drivers/nvme/target/zns.c:      if (sect & (bdev_zone_sectors(req->ns->bdev) - 1)) {
> fs/zonefs/super.c:      sbi->s_zone_sectors_shift = ilog2(bdev_zone_sectors(sb->s_bdev));
> fs/btrfs/zoned.c:       return (sector_t)zone_number << ilog2(bdev_zone_sectors(bdev));
> fs/btrfs/zoned.c:	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
> include/linux/blkdev.h: return sector & (bdev_zone_sectors(bdev) - 1);
> fs/f2fs/super.c:	if (nr_sectors & (zone_sectors - 1))
> 
> So, if we want to support non-power-of-2 zone size, we need some 
> systematic fix. Now it appears that Linux doesn't even attempt to support 
> disks non-power-of-2 zone size.

Correct. We currently do not support zoned devices with a zone size that is not
a power of 2 number of LBAs. Such drives are rejected by the block layer. So I
am surprised that Li could even reach a DM error. It means that his kernel was
already patched to accept zoned drives with a zone size that is not a power of 2.

> I added Damien Le Moal so that he can help with testing disks with 
> non-power-of-2 zone size (if WD is actually making them).

No, I do not have such drive. The vast majority of zoned device users today are
SMR HDD users, and in that space, no one wants a non power of 2 zone size drive.
As far as I know, the main push is from zoned-UFS users for Android, as Pankaj
mentioned.

As you rightly guessed, and as Pankaj confirmed, supporting non power of 2 zone
size drives requires a lot more changes than just the fix Li sent for DM. Pankaj
initial patch set that did not make it upstream (but is in Android) would need
to be reworked given the extensive changes that happened since then (zone write
locking replaced with zone write plugging, queue limits changes, etc).

-- 
Damien Le Moal
Western Digital Research


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ