linux-kernel - Re: [PATCH v3 10/15] block: Add fops atomic write support


Message-ID: <445a05e7-f912-4fb8-b66e-204a05a1524f@oracle.com>
Date: Wed, 14 Feb 2024 11:29:10 +0000
From: John Garry <john.g.garry@...cle.com>
To: Nilay Shroff <nilay@...ux.ibm.com>
Cc: axboe@...nel.dk, brauner@...nel.org, bvanassche@....org,
        dchinner@...hat.com, djwong@...nel.org, hch@....de, jack@...e.cz,
        jbongio@...gle.com, jejb@...ux.ibm.com, kbusch@...nel.org,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
        linux-scsi@...r.kernel.org, linux-xfs@...r.kernel.org,
        martin.petersen@...cle.com, ming.lei@...hat.com, ojaswin@...ux.ibm.com,
        sagi@...mberg.me, tytso@....edu, viro@...iv.linux.org.uk
Subject: Re: [PATCH v3 10/15] block: Add fops atomic write support

On 14/02/2024 09:38, Nilay Shroff wrote:
> 
> 
> On 2/13/24 17:22, John Garry wrote:
>> On 13/02/2024 11:08, Nilay Shroff wrote:
>>>> It is relied upon that atomic_write_unit_max is <= atomic_write_boundary and that both are a power-of-2. Please see the NVMe patch, in which this is checked. Indeed, it would not make sense if atomic_write_unit_max > atomic_write_boundary (when non-zero).
>>>>
>>>> So if the write is naturally aligned and its size is <= atomic_write_unit_max, then it cannot be straddling a boundary.
>>> Ok fine but in case the device doesn't support namespace atomic boundary size (i.e. NABSPF is zero) then still do we need
>>> to restrict IO which crosses the atomic boundary?
>>
>> Is there a boundary if NABSPF is zero?
> If NABSPF is zero then there's no boundary and so we may not need to worry about IO crossing boundary.
> 
> Even though the atomic boundary is not defined, this function doesn't allow an atomic write to cross atomic_write_unit_max_bytes.
> For instance, if AWUPF is 63 and an IO starts atomic write from logical block #32 and the number of logical blocks to be written

When you say "IO", you need to be clearer. Do you mean a write from 
userspace or a merged atomic write?

If userspace issues an atomic write which is 64 blocks at offset 32, 
then it will be rejected.

It will be rejected as it is not naturally aligned, e.g. a 64-block 
write can only be at offset 0, 64, 128, and so on.
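
As a rough illustration of the rule described above, here is a minimal sketch 
in C. It is not the code from the patch: atomic_write_ok() and its parameters 
are hypothetical names, all values are in logical blocks, and the requirement 
that the length itself is a power-of-2 is an assumption made so that "naturally 
aligned" has a simple meaning here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a non-zero power of two. */
static bool is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper (not from the patch series): decide whether a
 * userspace atomic write of 'len' blocks starting at block 'start' obeys
 * the rules discussed in this thread. 'unit_max' stands in for
 * atomic_write_unit_max and is assumed to be a power-of-2.
 */
static bool atomic_write_ok(uint64_t start, uint64_t len, uint64_t unit_max)
{
	assert(is_power_of_2(unit_max));

	/* Assumption: the length must itself be a power-of-2. */
	if (!is_power_of_2(len))
		return false;

	/* The write must not exceed the maximum atomic write unit. */
	if (len > unit_max)
		return false;

	/* Natural alignment: the start must be a multiple of the length. */
	return (start & (len - 1)) == 0;
}
```

With unit_max set to 64, atomic_write_ok(32, 64, 64) is false while 
atomic_write_ok(0, 64, 64) and atomic_write_ok(64, 64, 64) are true, matching 
the 64-block example above.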

> in this IO equals to #64 then it's not allowed.
>  However if this same IO starts from logical block #0 then it's allowed.
> So my point here's that can this restriction be avoided when atomic boundary is zero (or not defined)?

We want a consistent set of rules for userspace to follow, whether the 
atomic boundary is zero or non-zero.

Currently the atomic boundary only comes into play for merging writes, 
i.e. we cannot merge a write in which the resultant IO straddles a boundary.
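
To make the merging rule concrete, here is a minimal sketch in C of a boundary 
check. The function name straddles_boundary() and its parameters are 
hypothetical and are not taken from the patch. Values are in logical blocks, 
len is assumed to be non-zero, and a boundary of zero is taken to mean that 
there is no boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch series): true if an IO of 'len'
 * blocks starting at block 'start' would cross a multiple of 'boundary'.
 * Assumes len > 0. A boundary of 0 means no boundary is defined.
 */
static bool straddles_boundary(uint64_t start, uint64_t len, uint64_t boundary)
{
	if (boundary == 0)
		return false;

	/* Compare the boundary window of the first and the last block. */
	return (start / boundary) != ((start + len - 1) / boundary);
}
```

For example, with a boundary of 64 blocks, merging two writes into a single IO 
of 32 blocks at offset 48 would give straddles_boundary(48, 32, 64) == true, so 
under this sketch that merge would not be allowed. A naturally aligned 64-block 
write at offset 64 stays within one window and gives false.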

> 
> Also, it seems that the restrictions implemented for an atomic write to succeed are very strict. For example, an atomic write can't
> succeed if an IO starts from logical block #8 and the number of logical blocks to be written in this IO equals 16.
> In this particular case, the IO is well within the atomic boundary (if it's defined) and the atomic size limit, so why do we NOT want to
> allow it? Is it intentional? I think the spec doesn't mention such a limitation.

According to the NVMe spec, this is ok. However, we don't want the user 
to have to deal with things like NVMe boundaries. Indeed, for FSes, we 
do not have a direct linear map from FS blocks to physical blocks, so it 
would be impossible for the user to know about a boundary condition in 
this context.

We are trying to formulate rules which work for the somewhat orthogonal 
HW features of both SCSI and NVMe for both block devices and FSes, while 
also dealing with alignment concerns of extent-based FSes, like XFS.

> 
>>
>>>
>>> I am quoting this from NVMe spec (Command Set Specification, revision 1.0a, Section 2.1.4.3) :
>>> "To ensure backwards compatibility, the values reported for AWUN, AWUPF, and ACWU shall be set such that
>>> they  are  supported  even  if  a  write  crosses  an  atomic  boundary.  If  a  controller  does  not
>>> guarantee atomicity across atomic boundaries, the controller shall set AWUN, AWUPF, and ACWU to 0h (1 LBA)."
>>
>> How about responding to the NVMe patch in this series, asking this question?
>>
> Yes I will send this query to the NVMe patch in this series.

Thanks,
John