lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d0f8315b-e006-498a-b3e8-77542f352d40@oracle.com>
Date: Tue, 28 Jan 2025 16:46:58 +0000
From: John Garry <john.g.garry@...cle.com>
To: Zhang Yi <yi.zhang@...weicloud.com>, linux-fsdevel@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-block@...r.kernel.org,
        dm-devel@...ts.linux.dev, linux-nvme@...ts.infradead.org,
        linux-scsi@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, hch@....de, tytso@....edu, djwong@...nel.org,
        yi.zhang@...wei.com, chengzhihao1@...wei.com, yukuai3@...wei.com,
        yangerkun@...wei.com
Subject: Re: [RFC PATCH v2 1/8] block: introduce BLK_FEAT_WRITE_ZEROES_UNMAP
 to queue limits features

On 15/01/2025 11:46, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
> 
> Currently, it's hard to know whether the storage device supports unmap
> write zeroes. We cannot determine it only by checking if the disk
> supports the write zeroes command, as for some HDDs that do submit
> actual zeros to the disk media even if they claim to support the write
> zeroes command, but that should be very slow.

This second sentence is too long, such that your meaning is hard to 
understand.

> 
> Therefor, add a new queue limit feature, BLK_FEAT_WRITE_ZEROES_UNMAP and

Therefore?

> the corresponding sysfs entry, to indicate whether the block device
> explicitly supports the unmapped write zeroes command. Each device
> driver should set this bit if it is certain that the attached disk
> supports this command. 

How can they be certain? You already wrote that some claim to support 
it, yet don't really. Well, I think that is what you meant.

> If the bit is not set, the disk either does not
> support it, or its support status is unknown.
> 
> For the stacked devices cases, the BLK_FEAT_WRITE_ZEROES_UNMAP should be
> supported both by the stacking driver and all underlying devices.
> 
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> ---
>   Documentation/ABI/stable/sysfs-block | 14 ++++++++++++++
>   block/blk-settings.c                 |  6 ++++++
>   block/blk-sysfs.c                    |  3 +++
>   include/linux/blkdev.h               |  3 +++
>   4 files changed, 26 insertions(+)
> 
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 0cceb2badc83..ab4117cefd9a 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -722,6 +722,20 @@ Description:
>   		0, write zeroes is not supported by the device.
>   
>   
> +What:		/sys/block/<disk>/queue/write_zeroes_unmap
> +Date:		January 2025
> +Contact:	Zhang Yi <yi.zhang@...wei.com>
> +Description:
> +		[RO] Devices that explicitly support the unmap write zeroes
> +		operation in which a single write zeroes request with the unmap
> +		bit set to zero out the range of contiguous blocks on storage

which bit are you referring to?

> +		by freeing blocks, rather than writing physical zeroes to the
> +		media. If write_zeroes_unmap is 1, this indicates that the
> +		device explicitly supports the write zero command. Otherwise,
> +		the device either does not support it, or its support status is
> +		unknown.

I am struggling to understand the full meaning of a value of '0'.

Does it means that either:
a. it does not support write zero
b. it does support write zero, yet just did not set 
BLK_FEAT_WRITE_ZEROES_UNMAP


> +
> +
>   What:		/sys/block/<disk>/queue/zone_append_max_bytes
>   Date:		May 2020
>   Contact:	linux-block@...r.kernel.org
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 8f09e33f41f6..a8bf2f8f0634 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -652,6 +652,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>   		t->features &= ~BLK_FEAT_NOWAIT;
>   	if (!(b->features & BLK_FEAT_POLL))
>   		t->features &= ~BLK_FEAT_POLL;
> +	if (!(b->features & BLK_FEAT_WRITE_ZEROES_UNMAP))
> +		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;

Why not just set this in BLK_FEAT_INHERIT_MASK? It's seems like a 
sensible thing to do...

>   
>   	t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
>   
> @@ -774,6 +776,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>   		t->zone_write_granularity = 0;
>   		t->max_zone_append_sectors = 0;
>   	}
> +
> +	if (!t->max_write_zeroes_sectors)
> +		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
> +
>   	blk_stack_atomic_writes_limits(t, b);
>   
>   	return ret;
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 767598e719ab..13f22bee19d2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -248,6 +248,7 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page)	\
>   QUEUE_SYSFS_FEATURE_SHOW(poll, BLK_FEAT_POLL);
>   QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
>   QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
> +QUEUE_SYSFS_FEATURE_SHOW(write_zeroes_unmap, BLK_FEAT_WRITE_ZEROES_UNMAP);
>   
>   static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
>   {
> @@ -468,6 +469,7 @@ QUEUE_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
>   
>   QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
>   QUEUE_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
> +QUEUE_RO_ENTRY(queue_write_zeroes_unmap, "write_zeroes_unmap");
>   QUEUE_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
>   QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
>   
> @@ -615,6 +617,7 @@ static struct attribute *queue_attrs[] = {
>   	&queue_poll_delay_entry.attr,
>   	&queue_virt_boundary_mask_entry.attr,
>   	&queue_dma_alignment_entry.attr,
> +	&queue_write_zeroes_unmap_entry.attr,
>   	NULL,
>   };
>   
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 378d3a1a22fc..14ba1e2709bb 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -335,6 +335,9 @@ typedef unsigned int __bitwise blk_features_t;
>   #define BLK_FEAT_ATOMIC_WRITES_STACKED \
>   	((__force blk_features_t)(1u << 16))
>   
> +/* supports unmap write zeroes command */
> +#define BLK_FEAT_WRITE_ZEROES_UNMAP	((__force blk_features_t)(1u << 17))

Is this flag ever checked within the kernel?

If not, I assume your idea is that the user checks this flag via sysfs 
for the block device which the fs is mounted on just to know if 
FALLOC_FL_WRITE_ZEROES is definitely fast, right?

> +
>   /*
>    * Flags automatically inherited when stacking limits.
>    */


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ