Message-ID: <Zhg0_Pvlh9zy4zzG@bombadil.infradead.org>
Date: Thu, 11 Apr 2024 12:07:40 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: John Garry <john.g.garry@...cle.com>
Cc: Matthew Wilcox <willy@...radead.org>,
Pankaj Raghav <p.raghav@...sung.com>,
Daniel Gomez <da.gomez@...sung.com>,
Javier González <javier.gonz@...sung.com>,
axboe@...nel.dk, kbusch@...nel.org, hch@....de, sagi@...mberg.me,
jejb@...ux.ibm.com, martin.petersen@...cle.com, djwong@...nel.org,
viro@...iv.linux.org.uk, brauner@...nel.org, dchinner@...hat.com,
jack@...e.cz, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-fsdevel@...r.kernel.org, tytso@....edu, jbongio@...gle.com,
linux-scsi@...r.kernel.org, ojaswin@...ux.ibm.com,
linux-aio@...ck.org, linux-btrfs@...r.kernel.org,
io-uring@...r.kernel.org, nilay@...ux.ibm.com,
ritesh.list@...il.com
Subject: Re: [PATCH v6 00/10] block atomic writes
On Wed, Apr 10, 2024 at 09:34:36AM +0100, John Garry wrote:
> On 08/04/2024 18:50, Luis Chamberlain wrote:
> > I agree that when you don't set the sector size to 16k you are not forcing the
> > filesystem to use 16k IOs, the metadata can still be 4k. But when you
> > use a 16k sector size, the 16k IOs should be respected by the
> > filesystem.
> >
> > Do we break BIOs to below a min order if the sector size is also set to
> > 16k? I haven't seen that and it's unclear when or how that could happen.
>
> AFAICS, the only guarantee is to not split below LBS.
It would be odd to split a BIO given an inode requirement size spelled
out, but indeed I don't recall verifying this guarantee.
> > At least for NVMe we don't need to yell to a device to inform it we want
> > a 16k IO issued to it to be atomic, if we read that it has the
> > capability for it, it just does it. The IO verification can be done with
> > blkalgn [0].
> >
> > Does SCSI *require* a 16k atomic prep work, or can it be done implicitly?
> > Does it need WRITE_ATOMIC_16?
>
> physical block size is what we can implicitly write atomically.
Yes, and also on flash to avoid read modify writes.
> So if you
> have a 4K PBS and 512B LBS, then WRITE_ATOMIC_16 would be required to write
> 16KB atomically.
Ugh. Why does SCSI require a special command for this?
Now we know what would be needed to bump the physical block size, it is
certainly a different feature, however I think it would be good to
evaluate that world too. For NVMe we don't have such special write
requirements.
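
To make the physical block size rule above concrete, here is a minimal
sketch in C. It assumes the model stated in this thread: a write is
implicitly atomic only if it lies entirely within one physical block.
The helper names are hypothetical and are not kernel APIs, and a real
device may advertise additional limits that this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v is a non-zero power of two. */
static bool is_pow2(uint32_t v)
{
	return v && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper, not a kernel API.
 *
 * Assumption (from the thread): the physical block size (pbs) is what
 * the device can write atomically without a special command. So a
 * write is treated as implicitly atomic only when it is a valid
 * logical-block-aligned I/O and does not cross a physical block
 * boundary.
 */
static bool write_is_implicitly_atomic(uint32_t lbs, uint32_t pbs,
				       uint64_t off, uint32_t len)
{
	assert(is_pow2(lbs) && is_pow2(pbs) && pbs >= lbs);

	/* Reject anything that is not a whole number of logical blocks. */
	if (len == 0 || len % lbs || off % lbs)
		return false;

	/* First and last byte must land in the same physical block. */
	return off / pbs == (off + len - 1) / pbs;
}
```

With a 512B LBS and 4K PBS, write_is_implicitly_atomic(512, 4096, 0, 16384)
is false, which matches the case where WRITE_ATOMIC_16 would be needed.
With both sizes at 16k the same 16k write is reported as implicitly
atomic.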
I put together this kludge with the latest patch series of LBS + the
bdev cache aops stuff (which as I said before needs an alternative
solution) and just the scsi atomics topology + physical block size
change to easily experiment to see what would break:
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20240408-lbs-scsi-kludge
Using a larger sector size works but it does not use the special scsi
atomic write.
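
When experimenting with a larger sector size, it can help to confirm
what the block device actually reports. The sketch below uses the
existing Linux BLKSSZGET and BLKPBSZGET ioctls to read the logical and
physical block sizes. The struct and function names are hypothetical,
and the device path is whatever you are testing with.

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical container for the two sizes, in bytes. */
struct blk_sizes {
	int lbs;		/* logical block size (BLKSSZGET) */
	unsigned int pbs;	/* physical block size (BLKPBSZGET) */
};

/*
 * Hypothetical helper: fill *out with the sizes reported by the block
 * device at path. Returns 0 on success, -1 if the device cannot be
 * opened or does not support the ioctls.
 */
static int query_block_sizes(const char *path, struct blk_sizes *out)
{
	int rc = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	if (ioctl(fd, BLKSSZGET, &out->lbs) < 0 ||
	    ioctl(fd, BLKPBSZGET, &out->pbs) < 0)
		rc = -1;

	close(fd);
	return rc;
}
```

On a device set up with a 16k sector size, one would expect both values
to come back as 16384, but that depends on the device and the patches
applied.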
> > > To me, O_ATOMIC would be required for buffered atomic writes IO, as we want
> > > a fixed-sized IO, so that would mean no mixing of atomic and non-atomic IO.
> > Would using the same min and max order for the inode work instead?
>
> Maybe, I would need to check further.
I'd be happy to help review too.
Luis