Message-ID: <ad50a249-008d-4d1d-b6d5-cc09f815bf31@oracle.com>
Date: Fri, 1 Dec 2023 19:06:30 +0000
From: John Garry <john.g.garry@...cle.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Dave Chinner <david@...morbit.com>, Ojaswin Mujoo
<ojaswin@...ux.ibm.com>,
linux-ext4@...r.kernel.org, Theodore Ts'o <tytso@....edu>,
Ritesh Harjani <ritesh.list@...il.com>, linux-kernel@...r.kernel.org,
"Darrick J . Wong" <djwong@...nel.org>, linux-block@...r.kernel.org,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
dchinner@...hat.com
Subject: Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write
is atomic
On 01/12/2023 13:27, Matthew Wilcox wrote:
>> Sure, and I think that we need a better story for supporting buffered IO for
>> atomic writes.
>>
>> Currently we have:
>> - man pages tell us RWF_ATOMIC is only supported for direct IO
>> - statx gives atomic write unit min/max, not explicitly telling us it's for
>> direct IO
>> - RWF_ATOMIC is ignored for !O_DIRECT
>>
>> So I am thinking of expanding statx support to enable querying of atomic
>> write capabilities for buffered IO and direct IO separately.
> Or ... we could support RWF_ATOMIC in the page cache?
>
> I haven't particularly been following the atomic writes patchset,
Some background: we are focused on direct IO because the database
applications we're interested in use direct IO, but there are other DBs
which do not support direct IO and still want atomic write support.
> but
> for filesystems which support large folios, we now create large folios
> in the write path. I see four problems to solve:
>
> 1. We might already have a smaller folio in the page cache from an
> earlier access. We'd have to kick it out before creating a new folio
> that is the appropriate size.
Understood. Although we allow atomic writes of variable size, we expect
applications to mostly use a fixed size. In addition, we would typically
expect either only atomic or only non-atomic writes, not a mix. But the
case you describe is possible.
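To make the size constraint concrete, here is a minimal sketch in C of the
check an application might run before issuing a write with RWF_ATOMIC. The
helper name atomic_write_ok() is hypothetical, and the rules it encodes
(power-of-two length within the statx-reported unit min/max, offset
naturally aligned to the length) are assumptions about the direct IO
interface, not something stated in this thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is a non-zero power of two. */
static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper (not from the patchset): check one write against
 * the atomic write unit limits reported by statx.
 *
 * Assumed rules: the length is a power of two within
 * [unit_min, unit_max], and the file offset is naturally aligned to
 * the length. unit_min == 0 is treated as "atomic writes unsupported".
 */
static bool atomic_write_ok(uint64_t offset, uint64_t len,
			    uint32_t unit_min, uint32_t unit_max)
{
	if (unit_min == 0 || unit_max < unit_min)
		return false;
	if (!is_pow2(len) || len < unit_min || len > unit_max)
		return false;
	/* len is a power of two, so a mask test checks natural alignment. */
	return (offset & (len - 1)) == 0;
}
```

An application that sticks to one fixed size, as expected above, would
only need to run this check once per size and then keep its offsets
aligned to that size.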
>
> 2. We currently believe it's always OK to fall back to allocating smaller
> folios if memory allocation fails. We'd need to change that policy
> (which we need to modify anyway for the bs>PS support).
ok
>
> 3. We need to somewhere keep the information that writeback of this
> folio has to use the atomic commands. Maybe it becomes a per-inode
> flag so that all writeback from this inode now uses the atomic
> commands?
I'm not sure. Currently atomic writes are simply flagged per IO, and
per-inode atomic flags are something which we have avoided so far.
>
> 4. If somebody does a weird thing like truncate/holepunch into the
> middle of the folio, we need to define what we do. It's conceptually
> a bizarre thing to do, so I can't see any user actually wanting to
> do that ... but we need to define the semantics.
ok
>
> Maybe there are things I haven't thought of. And of course, some
> filesystems don't support large folios yet.
I may consider a PoC...
Thanks,
John