[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <yq1ttr6qoqp.fsf@ca-mkp.ca.oracle.com>
Date: Wed, 04 Oct 2023 14:17:02 -0400
From: "Martin K. Petersen" <martin.petersen@...cle.com>
To: Bart Van Assche <bvanassche@....org>
Cc: "Martin K. Petersen" <martin.petersen@...cle.com>,
John Garry <john.g.garry@...cle.com>, axboe@...nel.dk,
kbusch@...nel.org, hch@....de, sagi@...mberg.me,
jejb@...ux.ibm.com, djwong@...nel.org, viro@...iv.linux.org.uk,
brauner@...nel.org, chandan.babu@...cle.com, dchinner@...hat.com,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-nvme@...ts.infradead.org, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, tytso@....edu, jbongio@...gle.com,
linux-api@...r.kernel.org
Subject: Re: [PATCH 10/21] block: Add fops atomic write support
Hi Bart!
> In other words, also for the above example it is guaranteed that
> writes of a single logical block (512 bytes) are atomic, no matter
> what value is reported as the ATOMIC TRANSFER LENGTH GRANULARITY.
There is no formal guarantee that a disk drive sector read-modify-write
operation results in a readable sector after a power failure. We have
definitely seen blocks being mangled in the field.
Wrt. supporting SCSI atomic block operations, the device rejects the
WRITE ATOMIC(16) command if you attempt to use a transfer length smaller
than the reported granularity. If we want to support WRITE ATOMIC(16) we
have to abide by the values reported by the device. It is not optional.
Besides, the whole point of this patch set is to increase the
"observable atomic block size" beyond the physical block size to
facilitate applications that prefer to use blocks in the 8-64KB range.
IOW, using the logical block size is not particularly interesting. The
objective is to prevent tearing of much larger blocks.
> How about aligning the features of the two protocols as much as
> possible? My understanding is that all long-term T10 contributors are
> all in favor of this.
That is exactly what this patch set does. Out of the 5-6 different
"atomic" modes of operation permitted by SCSI and NVMe, our exposed
semantics are carefully chosen to permit all compliant devices to be
used. Based on only two reported queue limits (FWIW, we started with way
more than that. I believe that complexity was part of the first RFC we
posted). Whereas this series hides most of the complexity in the various
unfortunate protocol quirks behind a simple interface: Your tear-proof
writes can't be smaller than X bytes and larger than Y bytes and they
must be naturally aligned. This simplified things substantially from an
application perspective.
--
Martin K. Petersen Oracle Linux Engineering
Powered by blists - more mailing lists