Message-ID: <6135eab3-50ce-4669-a692-b4221773bb20@oracle.com>
Date: Tue, 16 Jan 2024 11:35:47 +0000
From: John Garry <john.g.garry@...cle.com>
To: Christoph Hellwig <hch@....de>
Cc: "Darrick J. Wong" <djwong@...nel.org>, axboe@...nel.dk, kbusch@...nel.org,
sagi@...mberg.me, jejb@...ux.ibm.com, martin.petersen@...cle.com,
viro@...iv.linux.org.uk, brauner@...nel.org, dchinner@...hat.com,
jack@...e.cz, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
tytso@....edu, jbongio@...gle.com, linux-scsi@...r.kernel.org,
ming.lei@...hat.com, jaswin@...ux.ibm.com, bvanassche@....org
Subject: Re: [PATCH v2 00/16] block atomic writes
On 21/12/2023 13:22, Christoph Hellwig wrote:
> On Thu, Dec 21, 2023 at 01:18:33PM +0000, John Garry wrote:
>>> For SGL-capable devices that would be
>>> BIO_MAX_VECS, otherwise 1.
>> ok, but we would need to advertise that or whatever segment limit. A statx
>> field just for that seems a bit inefficient in terms of space.
> I'd rather not hard code BIO_MAX_VECS in the ABI, which suggests we
> want to export it as a field. Network file systems also might have
> their own limits for one reason or another.
Hi Christoph,
I have been looking at this issue again and I am not sure that telling
the user the max number of segments allowed is the best option. I’m
worried that the resulting atomic write unit max will be too small.
The background again is that we want to tell the user what the maximum
atomic write unit size is, such that we can always guarantee to fit the
write in a single bio. And there would be no iovec length or alignment
rules.
The max segments value advertised would be min(queue max segments,
BIO_MAX_VECS), so it would be 256 when the request queue is not limiting.
The worst-case (most inefficient) iovec layout which the user could
provide would be something like .iov_base = 0x...0E00 and .iov_len =
0x400, which straddles a 4K page boundary, so each 1024B-length iovec
would require 2x pages and 2x DMA sg elements. I am assuming that we
will still use the direct IO rule of LBS length and alignment.
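To make the worst case above concrete, here is a minimal sketch that
counts how many pages a single iovec touches. It assumes 4K pages, and
EX_PAGE_SIZE and ex_pages_spanned() are hypothetical names used only for
illustration, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SIZE depends on the arch. */
#define EX_PAGE_SIZE 4096u

/*
 * Hypothetical helper: number of pages touched by the byte range
 * [base, base + len). Each touched page needs its own bio_vec, and in
 * the worst case its own DMA sg element.
 */
static size_t ex_pages_spanned(uintptr_t base, size_t len)
{
	assert(len > 0);

	uintptr_t first = base / EX_PAGE_SIZE;
	uintptr_t last = (base + len - 1) / EX_PAGE_SIZE;

	return (size_t)(last - first + 1);
}
```

With a base ending in 0xE00 and a length of 0x400, this returns 2; the
same 0x400 length starting on a page boundary returns 1.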
As such, we would then need to set atomic write unit max = min(queue max
segments, BIO_MAX_VECS) * LBS. That would give an atomic write unit max
of 256 * 512 = 128K (for 512B LBS). For a DMA controller with max
segments of 64, for example, we would have only 32K. These seem too low.
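The arithmetic above can be sketched as follows. This is only an
illustration of the formula in this mail, not code from the series:
EX_BIO_MAX_VECS mirrors the 256 value quoted above, and the function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the BIO_MAX_VECS value of 256 quoted above. */
#define EX_BIO_MAX_VECS 256u

static unsigned int ex_min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: atomic write unit max in bytes when any iovec
 * layout is allowed, so only one LBS of data per segment can be
 * guaranteed in the worst case.
 */
static size_t ex_awu_max_multi_iovec(unsigned int queue_max_segments,
				     unsigned int lbs)
{
	assert(lbs > 0);

	return (size_t)ex_min_uint(queue_max_segments, EX_BIO_MAX_VECS) * lbs;
}
```

For 512B LBS this gives 131072 (128K) when the queue is not limiting
and 32768 (32K) for a queue limited to 64 segments.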
Alternatively, I'm thinking that we should just limit to 1x iovec always,
and then atomic write unit max = (min(queue max segments, BIO_MAX_VECS)
- 1) * PAGE_SIZE [ignoring first/last iovec contents]. It also makes
support for non-enterprise NVMe drives more straightforward. If someone
wants, they can introduce support for multi-iovec later, but it would
probably require some more iovec length/alignment rules.
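For comparison, a minimal sketch of the single-iovec formula, assuming
4K pages. My reading of the "- 1" is that it leaves room for a buffer
which does not start on a page boundary and so spills into one extra
page. All names here are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 4 KiB pages and the BIO_MAX_VECS value of 256. */
#define EX_PAGE_SIZE    4096u
#define EX_BIO_MAX_VECS 256u

/*
 * Hypothetical helper: atomic write unit max in bytes when the write is
 * limited to a single iovec. One segment is held back for a buffer that
 * is not page-aligned.
 */
static size_t ex_awu_max_single_iovec(unsigned int queue_max_segments)
{
	assert(queue_max_segments > 0);

	unsigned int segs = queue_max_segments < EX_BIO_MAX_VECS ?
			    queue_max_segments : EX_BIO_MAX_VECS;

	return (size_t)(segs - 1) * EX_PAGE_SIZE;
}
```

With 256 segments this gives 1044480 bytes (1020K), and with 64 segments
258048 bytes (252K), against 128K and 32K for the multi-iovec case with
512B LBS.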
Please let me know your thoughts.
Thanks,
John