Message-ID: <43a34aa8-3f2f-4d86-be53-8a832be8532f@huaweicloud.com>
Date: Thu, 10 Apr 2025 11:52:17 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Christoph Hellwig <hch@....de>
Cc: linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
 linux-block@...r.kernel.org, dm-devel@...ts.linux.dev,
 linux-nvme@...ts.infradead.org, linux-scsi@...r.kernel.org,
 linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org, tytso@....edu,
 djwong@...nel.org, john.g.garry@...cle.com, bmarzins@...hat.com,
 chaitanyak@...dia.com, shinichiro.kawasaki@....com, yi.zhang@...wei.com,
 chengzhihao1@...wei.com, yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [RFC PATCH -next v3 01/10] block: introduce
 BLK_FEAT_WRITE_ZEROES_UNMAP to queue limits features

On 2025/4/9 18:31, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 03:35:36PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> Currently, disks primarily implement the write zeroes command (aka
>> REQ_OP_WRITE_ZEROES) through two mechanisms: the first involves
>> physically writing zeros to the disk media (e.g., HDDs), while the
>> second performs an unmap operation on the logical blocks, effectively
>> putting them into a deallocated state (e.g., SSDs). The first method is
>> generally slow, while the second method is typically very fast.
>>
>> For example, on certain NVMe SSDs that support NVME_NS_DEAC, submitting
>> REQ_OP_WRITE_ZEROES requests with the NVME_WZ_DEAC bit can accelerate
>> the write zeros operation by placing disk blocks into
> 
> Note that this is a can, not a must.  The NVMe definition of Write
> Zeroes is unfortunately pretty stupid.
> 
>> +		[RO] Devices that explicitly support the unmap write zeroes
>> +		operation in which a single write zeroes request with the unmap
>> +		bit set to zero out the range of contiguous blocks on storage
>> +		by freeing blocks, rather than writing physical zeroes to the
>> +		media.
> 
> This is not actually guaranteed for nvme or scsi.

Thank you for your review and comments. However, I'm not sure I fully
understand your points. Could you please provide more details?

AFAIK, the NVMe protocol has the following description in the latest
NVM Command Set Specification Figure 82 and Figure 114:

===
Deallocate (DEAC): If this bit is set to ‘1’, then the host is
requesting that the controller deallocate the specified logical blocks.
If this bit is cleared to ‘0’, then the host is not requesting that
the controller deallocate the specified logical blocks...

DLFEAT:
Write Zeroes Deallocation Support (WZDS): If this bit is set to ‘1’,
then the controller supports the Deallocate bit in the Write Zeroes
command for this namespace...
Deallocation Read Behavior (DRB): This field indicates the deallocated
logical block read behavior. For a logical block that is deallocated,
this field indicates the values read from that deallocated logical block
and its metadata (excluding protection information)...

  Value  Definition
  001b   A deallocated logical block returns all bytes cleared to 0h
===
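
As a rough illustration of the check these fields imply, here is a minimal
sketch in C. The macro and function names are hypothetical; the bit
positions (DRB in bits 2:0, WZDS in bit 3 of the DLFEAT byte) are assumed
from the Identify Namespace definition in the specification quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; layout assumed from the DLFEAT definition. */
#define DLFEAT_DRB_MASK   0x07u      /* bits 2:0: Deallocation Read Behavior */
#define DLFEAT_DRB_ZEROES 0x01u      /* 001b: deallocated block reads as all 0h */
#define DLFEAT_WZDS       (1u << 3)  /* bit 3: Write Zeroes Deallocation Support */

/*
 * Returns true only when the namespace both honors the Deallocate bit in
 * Write Zeroes (WZDS=1) and reports that deallocated blocks read back as
 * zeroes (DRB=001b). This reflects what the device advertises, not what
 * it physically does with the media.
 */
static bool dlfeat_unmap_reads_zeroes(uint8_t dlfeat)
{
	return (dlfeat & DLFEAT_WZDS) &&
	       (dlfeat & DLFEAT_DRB_MASK) == DLFEAT_DRB_ZEROES;
}
```

With this sketch, a DLFEAT value of 0x09 (WZDS=1, DRB=001b) qualifies, while
0x01 (no WZDS) and 0x0A (DRB=010b) do not.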

At the same time, the current kernel determines whether to set the
unmap bit when submitting the write zeroes command based on the above
protocol. So I think these rules should be clear now.
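
To make the submission-side decision concrete, here is a simplified
userspace sketch in C. It is only loosely modeled on the driver behavior
described above and is not the kernel's actual code: the function and
parameter names are hypothetical, and the Deallocate bit is assumed to sit
at bit 9 of the 16-bit Write Zeroes control field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the Deallocate (DEAC) bit in the control field. */
#define WZ_DEAC (1u << 9)

/*
 * Hypothetical helper: build the Write Zeroes control field.
 * ns_supports_deac      - namespace advertised WZDS=1 and DRB=001b
 * caller_forbids_unmap  - the submitter asked for no unmapping
 * DEAC is requested only when the namespace supports it and the caller
 * has not forbidden unmapping. Setting it is a request to the controller,
 * not a guarantee that blocks are deallocated.
 */
static uint16_t write_zeroes_control(bool ns_supports_deac,
				     bool caller_forbids_unmap)
{
	uint16_t control = 0;

	if (ns_supports_deac && !caller_forbids_unmap)
		control |= WZ_DEAC;
	return control;
}
```

In this sketch the bit is set only in the one case where both conditions
hold; every other combination leaves the control field at zero.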

Were you saying that what is described in this protocol is not a
mandatory requirement? That is, disks that claim to support the UNMAP
write zeroes command (WZDS=1, DRB=1) may in fact still write actual
zeroes to the storage media? Or were you referring to some irregular
disks that do not obey the protocol and mislead users?

Thanks,
Yi.

