linux-kernel - Re: [PATCH v6 0/6] block/md/dm: set chunk

Message-ID: <c71ce330-d7b5-45ea-ba46-97598516e9fc@kernel.org>
Date: Mon, 14 Jul 2025 15:00:57 +0900
From: Damien Le Moal <dlemoal@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: John Garry <john.g.garry@...cle.com>, agk@...hat.com, snitzer@...nel.org,
 mpatocka@...hat.com, song@...nel.org, yukuai3@...wei.com,
 nilay@...ux.ibm.com, axboe@...nel.dk, cem@...nel.org,
 dm-devel@...ts.linux.dev, linux-kernel@...r.kernel.org,
 linux-raid@...r.kernel.org, linux-block@...r.kernel.org,
 ojaswin@...ux.ibm.com, martin.petersen@...cle.com,
 akpm@...ux-foundation.org, linux-xfs@...r.kernel.org, djwong@...nel.org
Subject: Re: [PATCH v6 0/6] block/md/dm: set chunk_sectors from stacked dev
 stripe size

On 2025/07/14 14:53, Christoph Hellwig wrote:
> On Fri, Jul 11, 2025 at 05:44:26PM +0900, Damien Le Moal wrote:
>> On 7/11/25 5:09 PM, John Garry wrote:
>>> This value in io_min is used to configure any atomic write limit for the
>>> stacked device. The idea is that the atomic write unit max is a
>>> power-of-2 factor of the stripe size, and the stripe size is available
>>> in io_min.
>>>
>>> Using io_min causes issues, as:
>>> a. it may be mutated
>>> b. the check for io_min being set for determining if we are dealing with
>>> a striped device is hard to get right, as reported in [0].
>>>
>>> This series now sets chunk_sectors limit to share stripe size.
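
[Illustration of the "power-of-2 factor" rule quoted above. This is a minimal
sketch and not the kernel code: the function names and the bottom-device cap
are assumptions made for this example.]

```python
def largest_pow2_factor(n: int) -> int:
    """Return the largest power of two that divides n (n must be > 0)."""
    if n <= 0:
        raise ValueError("stripe size must be a positive integer")
    # n & -n isolates the lowest set bit, which is the largest
    # power-of-two divisor of n.
    return n & -n


def stacked_atomic_write_unit_max(stripe_sectors: int, bottom_unit_max_sectors: int) -> int:
    """Hypothetical helper: cap the bottom device's atomic write unit max
    by the largest power-of-two factor of the stripe size."""
    return min(bottom_unit_max_sectors, largest_pow2_factor(stripe_sectors))


if __name__ == "__main__":
    # A 768-sector stripe is 3 * 256, so the power-of-2 factor is 256.
    print(largest_pow2_factor(768))
    print(stacked_atomic_write_unit_max(768, 1024))
```

With a stripe size that is itself a power of two, the factor is the stripe
size, so only the bottom device's limit matters.
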
>>
>> Hmm... chunk_sectors for a zoned device is the zone size. So is this all safe
>> if we are dealing with a zoned block device that also supports atomic writes?
> 
> Btw, I wonder if it's time to decouple the zone size from the chunk
> size eventually.  It seems like a nice little hack, but with things
> like parity raid for zoned devices now showing up at least in academia,
> and nvme devices reporting chunk sizes the overload might not be that
> good any more.

Agreed, it would be nice to clean that up. However, the chunk_sectors sysfs
attribute file reports the zone size today, and changing that may break
applications. So I am not sure we can actually do that, unless the sysfs
interface is considered "unstable"?
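
To illustrate the kind of application dependency at stake, here is a minimal
sketch of user space reading the queue/chunk_sectors and queue/zoned sysfs
attributes and interpreting the value. The helper name, the sysfs_root
parameter and the returned labels are made up for this example.

```python
from pathlib import Path


def read_chunk_info(disk: str, sysfs_root: str = "/sys/block"):
    """Return (chunk_sectors, zoned_model, meaning) for a block device.

    Hypothetical helper: sysfs_root is a parameter only so that the
    function can be pointed at a fake directory tree for testing.
    """
    queue = Path(sysfs_root) / disk / "queue"
    chunk_sectors = int((queue / "chunk_sectors").read_text().strip())

    # The zoned attribute reports "none" for regular block devices.
    zoned_file = queue / "zoned"
    zoned = zoned_file.read_text().strip() if zoned_file.exists() else "none"

    if chunk_sectors == 0:
        meaning = "not set"
    elif zoned != "none":
        meaning = "zone size"
    else:
        meaning = "chunk/stripe size"
    return chunk_sectors, zoned, meaning
```

An application that treats chunk_sectors as the zone size without checking the
zoned attribute is the case that would break if the two were decoupled.
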

> 
>> Not that I know of any such device, but better to be safe, so maybe for now
>> do not enable atomic write support on zoned devices?
> 
> How would atomic writes make sense for zoned devices?  Because all writes
> up to the reported write pointer must be valid, the usual checks for
> partial updates are lacking, so the only use would be to figure out if a
> write got truncated.  At least for file systems we detect this using the
> fs metadata that must be written on I/O completion anyway, so the only
> user would be an application with some sort of speculative writes that
> can't detect partial writes. Which sounds rather fringe and dangerous.

The only thing I can think of which would make sense is to avoid torn writes
with SAS drives. But in itself, that is extremely niche.

> 
> Now we should be able to implement the software atomic writes pretty
> easily for zoned XFS, and funnily they might actually be slightly faster
> than normal writes due to the transaction batching.  Now that we're
> getting reasonable test coverage we should be able to give it a spin, but
> I have a few too many things on my plate at the moment.


-- 
Damien Le Moal
Western Digital Research