[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d0f8315b-e006-498a-b3e8-77542f352d40@oracle.com>
Date: Tue, 28 Jan 2025 16:46:58 +0000
From: John Garry <john.g.garry@...cle.com>
To: Zhang Yi <yi.zhang@...weicloud.com>, linux-fsdevel@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-block@...r.kernel.org,
dm-devel@...ts.linux.dev, linux-nvme@...ts.infradead.org,
linux-scsi@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, hch@....de, tytso@....edu, djwong@...nel.org,
yi.zhang@...wei.com, chengzhihao1@...wei.com, yukuai3@...wei.com,
yangerkun@...wei.com
Subject: Re: [RFC PATCH v2 1/8] block: introduce BLK_FEAT_WRITE_ZEROES_UNMAP
to queue limits features
On 15/01/2025 11:46, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> Currently, it's hard to know whether the storage device supports unmap
> write zeroes. We cannot determine it only by checking if the disk
> supports the write zeroes command, as for some HDDs that do submit
> actual zeros to the disk media even if they claim to support the write
> zeroes command, but that should be very slow.
This second sentence is too long, such that your meaning is hard to
understand.
>
> Therefor, add a new queue limit feature, BLK_FEAT_WRITE_ZEROES_UNMAP and
Therefore?
> the corresponding sysfs entry, to indicate whether the block device
> explicitly supports the unmapped write zeroes command. Each device
> driver should set this bit if it is certain that the attached disk
> supports this command.
How can they be certain? You already wrote that some claim to support
it, yet don't really. Well, I think that is what you meant.
> If the bit is not set, the disk either does not
> support it, or its support status is unknown.
>
> For the stacked devices cases, the BLK_FEAT_WRITE_ZEROES_UNMAP should be
> supported both by the stacking driver and all underlying devices.
>
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> ---
> Documentation/ABI/stable/sysfs-block | 14 ++++++++++++++
> block/blk-settings.c | 6 ++++++
> block/blk-sysfs.c | 3 +++
> include/linux/blkdev.h | 3 +++
> 4 files changed, 26 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 0cceb2badc83..ab4117cefd9a 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -722,6 +722,20 @@ Description:
> 0, write zeroes is not supported by the device.
>
>
> +What: /sys/block/<disk>/queue/write_zeroes_unmap
> +Date: January 2025
> +Contact: Zhang Yi <yi.zhang@...wei.com>
> +Description:
> + [RO] Devices that explicitly support the unmap write zeroes
> + operation in which a single write zeroes request with the unmap
> + bit set to zero out the range of contiguous blocks on storage
which bit are you referring to?
> + by freeing blocks, rather than writing physical zeroes to the
> + media. If write_zeroes_unmap is 1, this indicates that the
> + device explicitly supports the write zero command. Otherwise,
> + the device either does not support it, or its support status is
> + unknown.
I am struggling to understand the full meaning of a value of '0'.
Does it means that either:
a. it does not support write zero
b. it does support write zero, yet just did not set
BLK_FEAT_WRITE_ZEROES_UNMAP
> +
> +
> What: /sys/block/<disk>/queue/zone_append_max_bytes
> Date: May 2020
> Contact: linux-block@...r.kernel.org
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 8f09e33f41f6..a8bf2f8f0634 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -652,6 +652,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> t->features &= ~BLK_FEAT_NOWAIT;
> if (!(b->features & BLK_FEAT_POLL))
> t->features &= ~BLK_FEAT_POLL;
> + if (!(b->features & BLK_FEAT_WRITE_ZEROES_UNMAP))
> + t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
Why not just set this in BLK_FEAT_INHERIT_MASK? It's seems like a
sensible thing to do...
>
> t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
>
> @@ -774,6 +776,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> t->zone_write_granularity = 0;
> t->max_zone_append_sectors = 0;
> }
> +
> + if (!t->max_write_zeroes_sectors)
> + t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
> +
> blk_stack_atomic_writes_limits(t, b);
>
> return ret;
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 767598e719ab..13f22bee19d2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -248,6 +248,7 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
> QUEUE_SYSFS_FEATURE_SHOW(poll, BLK_FEAT_POLL);
> QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
> QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
> +QUEUE_SYSFS_FEATURE_SHOW(write_zeroes_unmap, BLK_FEAT_WRITE_ZEROES_UNMAP);
>
> static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
> {
> @@ -468,6 +469,7 @@ QUEUE_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
>
> QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
> QUEUE_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
> +QUEUE_RO_ENTRY(queue_write_zeroes_unmap, "write_zeroes_unmap");
> QUEUE_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
> QUEUE_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
>
> @@ -615,6 +617,7 @@ static struct attribute *queue_attrs[] = {
> &queue_poll_delay_entry.attr,
> &queue_virt_boundary_mask_entry.attr,
> &queue_dma_alignment_entry.attr,
> + &queue_write_zeroes_unmap_entry.attr,
> NULL,
> };
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 378d3a1a22fc..14ba1e2709bb 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -335,6 +335,9 @@ typedef unsigned int __bitwise blk_features_t;
> #define BLK_FEAT_ATOMIC_WRITES_STACKED \
> ((__force blk_features_t)(1u << 16))
>
> +/* supports unmap write zeroes command */
> +#define BLK_FEAT_WRITE_ZEROES_UNMAP ((__force blk_features_t)(1u << 17))
Is this flag ever checked within the kernel?
If not, I assume your idea is that the user checks this flag via sysfs
for the block device which the fs is mounted on just to know if
FALLOC_FL_WRITE_ZEROES is definitely fast, right?
> +
> /*
> * Flags automatically inherited when stacking limits.
> */
Powered by blists - more mailing lists