Message-ID: <445a05e7-f912-4fb8-b66e-204a05a1524f@oracle.com>
Date: Wed, 14 Feb 2024 11:29:10 +0000
From: John Garry <john.g.garry@...cle.com>
To: Nilay Shroff <nilay@...ux.ibm.com>
Cc: axboe@...nel.dk, brauner@...nel.org, bvanassche@....org,
dchinner@...hat.com, djwong@...nel.org, hch@....de, jack@...e.cz,
jbongio@...gle.com, jejb@...ux.ibm.com, kbusch@...nel.org,
linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-scsi@...r.kernel.org, linux-xfs@...r.kernel.org,
martin.petersen@...cle.com, ming.lei@...hat.com, ojaswin@...ux.ibm.com,
sagi@...mberg.me, tytso@....edu, viro@...iv.linux.org.uk
Subject: Re: [PATCH v3 10/15] block: Add fops atomic write support
On 14/02/2024 09:38, Nilay Shroff wrote:
>
>
> On 2/13/24 17:22, John Garry wrote:
>> On 13/02/2024 11:08, Nilay Shroff wrote:
>>>> It is relied upon that atomic_write_unit_max is <= atomic_write_boundary and that both are a power-of-2. Please see the NVMe patch, where this is checked. Indeed, it would not make sense if atomic_write_unit_max > atomic_write_boundary (when non-zero).
>>>>
>>>> So if the write is naturally aligned and its size is <= atomic_write_unit_max, then it cannot be straddling a boundary.
>>> Ok, fine. But in case the device doesn't support a namespace atomic boundary size (i.e. NABSPF is zero), do we still need
>>> to restrict IO which crosses the atomic boundary?
>>
>> Is there a boundary if NABSPF is zero?
> If NABSPF is zero, then there's no boundary, so we may not need to worry about IO crossing a boundary.
>
> Even though the atomic boundary is not defined, this function doesn't allow an atomic write crossing atomic_write_unit_max_bytes.
> For instance, if AWUPF is 63 and an IO starts an atomic write at logical block #32 and the number of logical blocks to be written
When you say "IO", you need to be clearer. Do you mean a write from
userspace or a merged atomic write?

If userspace issues an atomic write which is 64 blocks at offset 32,
then it will be rejected.

It will be rejected as it is not naturally aligned, e.g. a 64-block
write can only be at offset 0, 64, 128, and so on.
> in this IO is 64, then it's not allowed.
> However, if this same IO starts at logical block #0, then it's allowed.
> So my point is: can this restriction be avoided when the atomic boundary is zero (or not defined)?
We want a consistent set of rules for userspace to follow, whether the
atomic boundary is zero or non-zero.

Currently the atomic boundary only comes into play for merging writes,
i.e. we cannot merge a write in which the resultant IO straddles a boundary.
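
A minimal sketch of the boundary rule described above, in Python for brevity, with all sizes in logical blocks. `straddles_boundary()`, `can_merge()` and `aligned_writes_never_straddle()` are hypothetical helpers, not block layer code, and the merge check models only the boundary condition, not any other limit a real merge must respect. The last helper checks the claim quoted at the top of the thread: with power-of-2 sizes and atomic_write_unit_max <= atomic_write_boundary, a naturally aligned write no larger than the unit max never straddles a boundary.

```python
def straddles_boundary(offset: int, length: int, boundary: int) -> bool:
    """True if the range [offset, offset + length) crosses a boundary.

    boundary == 0 means "no boundary" (as with NABSPF == 0),
    so nothing can straddle one.
    """
    if boundary == 0 or length == 0:
        return False
    # Compare the boundary-sized chunk holding the first and last block.
    return offset // boundary != (offset + length - 1) // boundary


def can_merge(a_off: int, a_len: int, b_off: int, b_len: int, boundary: int) -> bool:
    """Hypothetical merge check: write b must directly follow write a,
    and the merged IO must not straddle an atomic write boundary.
    Other merge limits are deliberately ignored in this sketch."""
    if a_off + a_len != b_off:
        return False
    return not straddles_boundary(a_off, a_len + b_len, boundary)


def aligned_writes_never_straddle(unit_max: int, boundary: int, span: int = 1024) -> bool:
    """Exhaustively check every naturally aligned, power-of-2-sized write
    of at most unit_max blocks that starts inside [0, span)."""
    size = 1
    while size <= unit_max:
        for off in range(0, span, size):
            if straddles_boundary(off, size, boundary):
                return False
        size *= 2
    return True


# 32 + 32 blocks starting at 0 stay inside one 64-block chunk: mergeable.
print(can_merge(0, 32, 32, 32, 64))            # True
# 32 + 32 blocks starting at 32 would straddle block 64: not mergeable.
print(can_merge(32, 32, 64, 32, 64))           # False
# unit_max <= boundary (both power-of-2): aligned writes never straddle.
print(aligned_writes_never_straddle(16, 64))   # True
# unit_max > boundary: a 128-block aligned write at 0 straddles block 64.
print(aligned_writes_never_straddle(128, 64))  # False
```

The exhaustive loop is only a sanity check; the underlying reason is that a power-of-2 size no larger than a power-of-2 boundary divides it, so a size-aligned range always sits inside one boundary-aligned chunk.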
>
> Also, it seems that the restrictions an atomic write must satisfy to succeed are very strict. For example, an atomic write can't
> succeed if an IO starts at logical block #8 and the number of logical blocks to be written in this IO is 16.
> In this particular case, the IO is well within the atomic boundary (if it's defined) and the atomic size limit, so why do we NOT want to
> allow it? Is it intentional? I think the spec doesn't mention such a limitation.
According to the NVMe spec, this is OK. However, we don't want the user
to have to deal with things like NVMe boundaries. Indeed, for FSes, we
do not have a direct linear map from FS blocks to physical blocks, so it
would be impossible for the user to know about a boundary condition in
this context.

We are trying to formulate rules which work for the somewhat orthogonal
HW features of both SCSI and NVMe for both block devices and FSes, while
also dealing with alignment concerns of extent-based FSes, like XFS.
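
A minimal sketch contrasting the two rule sets discussed here, with sizes in logical blocks. `atomic_write_valid()` models the userspace-facing rule from this thread (naturally aligned and no larger than atomic_write_unit_max); requiring a power-of-2 length is an assumption of the sketch. `nvme_only_allows()` models only what the NVMe limits would permit on their own, with `max_blocks` standing in for a limit derived from AWUPF. Both helpers are hypothetical, not kernel code.

```python
def is_power_of_2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def atomic_write_valid(offset: int, length: int, unit_max: int) -> bool:
    """Hypothetical userspace-facing rule: power-of-2 length (assumed),
    no larger than unit_max, and naturally aligned (offset % length == 0)."""
    if not is_power_of_2(length) or length > unit_max:
        return False
    return offset % length == 0


def nvme_only_allows(offset: int, length: int, max_blocks: int, boundary: int) -> bool:
    """Hypothetical model of the NVMe limits alone: at most max_blocks
    blocks, and no boundary crossing if a boundary is defined
    (boundary == 0 means there is none)."""
    if length > max_blocks:
        return False
    if boundary == 0:
        return True
    return offset // boundary == (offset + length - 1) // boundary


# 16 blocks at block 8: fine for NVMe, rejected as not naturally aligned.
print(nvme_only_allows(8, 16, 64, 64), atomic_write_valid(8, 16, 64))  # True False
# 64 blocks at block 32 with no boundary: same split.
print(nvme_only_allows(32, 64, 64, 0), atomic_write_valid(32, 64, 64))  # True False
# 64 blocks at block 0 or 64 are accepted.
print(atomic_write_valid(0, 64, 64), atomic_write_valid(64, 64, 64))    # True True
```

The stricter rule gives up some writes the device could handle, in exchange for a check that does not depend on device-specific boundaries and can also apply to filesystems, where the user cannot see physical block placement.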
>
>>
>>>
>>> I am quoting this from NVMe spec (Command Set Specification, revision 1.0a, Section 2.1.4.3) :
>>> "To ensure backwards compatibility, the values reported for AWUN, AWUPF, and ACWU shall be set such that
>>> they are supported even if a write crosses an atomic boundary. If a controller does not
>>> guarantee atomicity across atomic boundaries, the controller shall set AWUN, AWUPF, and ACWU to 0h (1 LBA)."
>>
>> How about responding to the NVMe patch in this series with this question?
>>
> Yes, I will send this query in reply to the NVMe patch in this series.
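
The spec text quoted above implies that AWUN, AWUPF and ACWU are 0's based counts of logical blocks (0h means 1 LBA), which is why an AWUPF of 63 in the earlier example means a 64-block write. Below is a minimal Python sketch of that conversion, plus the unit_max/boundary relationship relied on at the top of the thread. The helper names are hypothetical, and rounding a non-power-of-2 block count down to a power of 2 is an assumption about how a power-of-2 unit max could be derived. It does not describe what the NVMe patch actually does.

```python
def is_power_of_2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def awupf_to_blocks(awupf: int) -> int:
    """AWUPF is a 0's based count of logical blocks: 0h means 1 LBA."""
    return awupf + 1


def rounddown_pow2(x: int) -> int:
    """Largest power of 2 <= x (x must be >= 1). Assumed derivation step."""
    return 1 << (x.bit_length() - 1)


def limits_consistent(unit_max: int, boundary: int) -> bool:
    """Relationship relied on in this thread: unit_max is a power of 2
    and, when a boundary is defined (non-zero), the boundary is also a
    power of 2 and unit_max <= boundary."""
    if not is_power_of_2(unit_max):
        return False
    if boundary == 0:
        return True
    return is_power_of_2(boundary) and unit_max <= boundary


blocks = awupf_to_blocks(63)
print(blocks, blocks * 512)                 # 64 32768 (assuming 512-byte LBAs)
print(rounddown_pow2(awupf_to_blocks(62)))  # 32
print(limits_consistent(64, 0), limits_consistent(64, 128), limits_consistent(128, 64))  # True True False
```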
Thanks,
John