linux-kernel - Re: [PATCH v3 10/15] block: Add fops atomic write support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d34ef016-88d2-4ae9-9a7b-f7431429acc7@linux.ibm.com>
Date: Wed, 14 Feb 2024 17:17:47 +0530
From: Nilay Shroff <nilay@...ux.ibm.com>
To: John Garry <john.g.garry@...cle.com>
Cc: axboe@...nel.dk, brauner@...nel.org, bvanassche@....org,
        dchinner@...hat.com, djwong@...nel.org, hch@....de, jack@...e.cz,
        jbongio@...gle.com, jejb@...ux.ibm.com, kbusch@...nel.org,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
        linux-scsi@...r.kernel.org, linux-xfs@...r.kernel.org,
        martin.petersen@...cle.com, ming.lei@...hat.com, ojaswin@...ux.ibm.com,
        sagi@...mberg.me, tytso@....edu, viro@...iv.linux.org.uk
Subject: Re: [PATCH v3 10/15] block: Add fops atomic write support



On 2/14/24 16:59, John Garry wrote:
> On 14/02/2024 09:38, Nilay Shroff wrote:
>>
>>
>> On 2/13/24 17:22, John Garry wrote:
>>> On 13/02/2024 11:08, Nilay Shroff wrote:
>>>>> It's relied that atomic_write_unit_max is <= atomic_write_boundary and both are a power-of-2. Please see the NVMe patch, which this is checked. Indeed, it would not make sense if atomic_write_unit_max > atomic_write_boundary (when non-zero).
>>>>>
>>>>> So if the write is naturally aligned and its size is <= atomic_write_unit_max, then it cannot be straddling a boundary.
>>>> Ok fine but in case the device doesn't support namespace atomic boundary size (i.e. NABSPF is zero) then still do we need
>>>> to restrict IO which crosses the atomic boundary?
>>>
>>> Is there a boundary if NABSPF is zero?
>> If NABSPF is zero then there's no boundary and so we may not need to worry about IO crossing boundary.
>>
>> Even though, the atomic boundary is not defined, this function doesn't allow atomic write crossing atomic_write_unit_max_bytes.
>> For instance, if AWUPF is 63 and an IO starts atomic write from logical block #32 and the number of logical blocks to be written
> 
> When you say "IO", you need to be clearer. Do you mean a write from userspace or a merged atomic write?
Yes I meant write from the userspace. Sorry for the confusion here.
> 
> If userspace issues an atomic write which is 64 blocks at offset 32, then it will be rejected.
> 
> It will be rejected as it is not naturally aligned, e.g. a 64 block writes can only be at offset 0, 64, 128,
So it means that even though h/w may support atomic-write crossing natural alignment boundary, the kernel would still reject it.
> 
>> in this IO equals to #64 then it's not allowed.
>>  However if this same IO starts from logical block #0 then it's allowed.
>> So my point here's that can this restriction be avoided when atomic boundary is zero (or not defined)?
> 
> We want a consistent set of rules for userspace to follow, whether the atomic boundary is zero or non-zero.
> 
> Currently the atomic boundary only comes into play for merging writes, i.e. we cannot merge a write in which the resultant IO straddles a boundary.
> 
>>
>> Also, it seems that the restriction implemented for atomic write to succeed are very strict. For example, atomic-write can't
>> succeed if an IO starts from logical block #8 and the number of logical blocks to be written in this IO equals to #16.
>> In this particular case, IO is well within atomic-boundary (if it's defined) and atomic-size-limit, so why do we NOT want to
>> allow it? Is it intentional? I think, the spec doesn't mention about such limitation.
> 
> According to the NVMe spec, this is ok. However we don't want the user to have to deal with things like NVMe boundaries. Indeed, for FSes, we do not have a direct linear map from FS blocks to physical blocks, so it would be impossible for the user to know about a boundary condition in this context.
> 
> We are trying to formulate rules which work for the somewhat orthogonal HW features of both SCSI and NVMe for both block devices and FSes, while also dealing with alignment concerns of extent-based FSes, like XFS.
Hmm OK, thanks for that explanation. 

Thanks,
--Nilay