Message-ID: <aRchGBJA1ExoGi8W@redhat.com>
Date: Fri, 14 Nov 2025 13:31:20 +0100
From: Kevin Wolf <kwolf@...hat.com>
To: Christoph Hellwig <hch@....de>
Cc: Jan Kara <jack@...e.cz>, Keith Busch <kbusch@...nel.org>,
Dave Chinner <david@...morbit.com>,
Carlos Maiolino <cem@...nel.org>,
Christian Brauner <brauner@...nel.org>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
linux-kernel@...r.kernel.org, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-raid@...r.kernel.org,
linux-block@...r.kernel.org
Subject: Re: fall back from direct to buffered I/O when stable writes are required

On 14.11.2025 at 13:01, Christoph Hellwig wrote:
> On Fri, Nov 14, 2025 at 10:29:39AM +0100, Kevin Wolf wrote:
> > Right, but since this is direct I/O and the approach with only declaring
> > I/O from the page cache safe without a bounce buffer means that RAID has
> > to use a bounce buffer here anyway (with or without PI), doesn't this
> > automatically solve it?
> >
> > So if it's only PI, it's the problem of userspace, and if you add RAID
> > on top, then the normal rules for RAID apply. (And that the buffer
> > doesn't get modified and PI doesn't become invalid until RAID does its
> > thing is still a userspace problem.)
>
> Well, only if we have different levels of I/O stability guarantees:
>
> Level 0:
> - trusted caller guarantees pages are stable (buffered I/O,
> in-kernel direct I/O callers that control the buffer)
>
> Level 1:
> - untrusted caller declares the pages are stable
> (direct I/O with PI)
>
> Level 2:
> - no one guarantees nothing
> (other direct I/O directly or indirectly fed from userspace)
>
> PI formatted devices would only bounce for 1, parity would bounce for
> 1 and 2. Software checksums could probably get away with only 1,
> although 2 would feel safer.

My main point above was that RAID and (potentially passed through) PI
are independent of each other and I think that's still true with or
without multiple stability levels.

If you don't have these levels, you just have to treat levels 1 and 2 the
same, i.e. bounce whenever the kernel itself needs the guarantee (which
is not the case for userspace PI, unless the same request needs the
bounce buffer for another reason in a different place, like RAID). That
might be less optimal, but it is still correct, and better than what
happens today because at least you don't bounce for level 0 any more.
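
To make the collapsed rule concrete, here is a minimal sketch in C. It is
only an illustration of the rule described above, not code from any patch:
the struct, its fields and the function names are all hypothetical, and
"kernel_checksums" is my assumption for "the kernel itself computes PI or
a checksum over the data".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one write; not a kernel structure. */
struct write_req {
	/* Level 0: page cache, or an in-kernel caller that owns the buffer. */
	bool trusted_stable;
	/* Assumption: the kernel computes PI or a checksum over the data. */
	bool kernel_checksums;
	/* The request passes through parity RAID somewhere in the stack. */
	bool parity_raid;
};

/* Does anything in the kernel depend on the data not changing in flight? */
bool kernel_needs_stable_data(const struct write_req *req)
{
	return req->kernel_checksums || req->parity_raid;
}

/* Collapsed policy: levels 1 and 2 are treated the same. */
bool must_bounce(const struct write_req *req)
{
	if (req->trusted_stable)
		return false;	/* level 0 never bounces */
	return kernel_needs_stable_data(req);
}

/* The three cases discussed above, written as checks. */
void policy_examples(void)
{
	struct write_req buffered = { .trusted_stable = true, .parity_raid = true };
	struct write_req user_pi = { 0 };	/* PI comes from userspace, no RAID */
	struct write_req user_pi_raid = { .parity_raid = true };

	assert(!must_bounce(&buffered));	/* trusted caller, even on RAID */
	assert(!must_bounce(&user_pi));		/* userspace's own problem */
	assert(must_bounce(&user_pi_raid));	/* normal RAID rules apply */
}
```

Note that userspace PI never appears as an input here: it simply does not
set kernel_checksums, so it only bounces when something else in the stack,
like RAID, asks for it.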

If there is something you can optimise by delegating the responsibility
to userspace in some cases - like you can prove that only the
application itself would be harmed by doing things wrong - then having
level 1 separate could certainly be interesting. In this case, I'd
consider adding an RWF_* flag for userspace to make the promise even
outside PI passthrough. But while potentially worthwhile, it feels like
this is a separate optimisation from what you tried to address here.

Kevin