Message-ID: <20230405222646.GR3223426@dread.disaster.area>
Date: Thu, 6 Apr 2023 08:26:46 +1000
From: Dave Chinner <david@...morbit.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: "Darrick J. Wong" <djwong@...nel.org>,
Andrey Albershteyn <aalbersh@...hat.com>, dchinner@...hat.com,
hch@...radead.org, linux-xfs@...r.kernel.org,
fsverity@...ts.linux.dev, rpeterso@...hat.com, agruenba@...hat.com,
xiang@...nel.org, chao@...nel.org,
damien.lemoal@...nsource.wdc.com, jth@...nel.org,
linux-erofs@...ts.ozlabs.org, linux-btrfs@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-f2fs-devel@...ts.sourceforge.net,
cluster-devel@...hat.com
Subject: Re: [PATCH v2 21/23] xfs: handle merkle tree block size != fs
blocksize != PAGE_SIZE
On Wed, Apr 05, 2023 at 06:16:00PM +0000, Eric Biggers wrote:
> On Wed, Apr 05, 2023 at 09:38:47AM -0700, Darrick J. Wong wrote:
> > > The merkle tree pages are dropped after verification. When page is
> > > dropped xfs_buf is marked as verified. If fs-verity wants to
> > > verify again it will get the same verified buffer. If buffer is
> > > evicted it won't have verified state.
> > >
> > > So, with enough memory pressure buffers will be dropped and need to
> > > be reverified.
> >
> > Please excuse me if this was discussed and rejected long ago, but
> > perhaps fsverity should try to hang on to the merkle tree pages that
> > this function returns for as long as possible until reclaim comes for
> > them?
> >
> > With the merkle tree page lifetimes extended, you then don't need to
> > attach the xfs_buf to page->private, nor does xfs have to extend the
> > buffer cache to stash XBF_VERITY_CHECKED.
>
> Well, all the other filesystems that support fsverity (ext4, f2fs, and btrfs)
> just cache the Merkle tree pages in the inode's page cache. It's an approach
> that I know some people aren't a fan of, but it's efficient and it works.
Which puts pages beyond EOF in the page cache. Given that XFS also
allows persistent block allocation beyond EOF, having both data in the page
cache and blocks beyond EOF that contain unrelated information is a
Real Bad Idea.
Just because putting metadata in the file data address space works
for one filesystem, it doesn't mean it's a good idea or that it works
for every filesystem.
> We could certainly think about moving to a design where fs/verity/ asks the
> filesystem to just *read* a Merkle tree block, without adding it to a cache, and
> then fs/verity/ implements the caching itself. That would require some large
> changes to each filesystem, though, unless we were to double-cache the Merkle
> tree blocks which would be inefficient.
No, that's unnecessary.
All we need is for fsverity to require filesystems to pass it byte
addressable data buffers that are externally reference counted. The
filesystem can take a page reference before mapping the page and
passing the kaddr to fsverity, then unmap and drop the reference
when the merkle tree walk is done as per Andrey's new drop callout.
fsverity doesn't need to care what the buffer is made from, how it
is cached, what its life cycle is, etc. The caching mechanism and
reference counting are entirely controlled by the filesystem callout
implementations, and fsverity only needs to deal with memory buffers
that are guaranteed to live for the entire walk of the merkle
tree....
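
As a rough illustration of that contract, here is a minimal userspace
sketch, not kernel code. Every name in it (merkle_buf, merkle_ops,
read_block, drop_block, toy_fs, walk_tree) is hypothetical and made up
for this example; the byte sum merely stands in for hashing. The point
is only that the walker sees an address, a length and an opaque cookie,
while the reference counting lives entirely on the filesystem side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64	/* illustrative Merkle tree block size */
#define NR_BLOCKS  4	/* illustrative number of cached blocks */

/* What the verifier sees: plain memory plus an opaque cookie. */
struct merkle_buf {
	const void *kaddr;	/* byte addressable block contents */
	size_t size;
	void *fs_priv;		/* owned by the filesystem, opaque to the walker */
};

/* Hypothetical callouts; these names are assumptions, not a kernel API. */
struct merkle_ops {
	int (*read_block)(void *fs, unsigned int index, struct merkle_buf *buf);
	void (*drop_block)(struct merkle_buf *buf);
};

/* Toy filesystem-side cache entry carrying its own reference count. */
struct toy_block {
	int refcount;
	char data[BLOCK_SIZE];
};

struct toy_fs {
	struct toy_block blocks[NR_BLOCKS];
};

/* Filesystem side: take a reference, then hand out the address. */
static int toy_read_block(void *fs, unsigned int index, struct merkle_buf *buf)
{
	struct toy_fs *tfs = fs;
	struct toy_block *b;

	if (index >= NR_BLOCKS)
		return -1;
	b = &tfs->blocks[index];
	b->refcount++;
	buf->kaddr = b->data;
	buf->size = BLOCK_SIZE;
	buf->fs_priv = b;
	return 0;
}

/* Filesystem side: release the reference taken in toy_read_block(). */
static void toy_drop_block(struct merkle_buf *buf)
{
	struct toy_block *b = buf->fs_priv;

	assert(b->refcount > 0);
	b->refcount--;
	buf->kaddr = NULL;
	buf->fs_priv = NULL;
}

static const struct merkle_ops toy_ops = {
	.read_block = toy_read_block,
	.drop_block = toy_drop_block,
};

/*
 * Verifier side: hold every block on the path for the whole walk and
 * drop them all at the end. Summing bytes is a stand-in for hashing.
 */
static int walk_tree(const struct merkle_ops *ops, void *fs,
		     const unsigned int *path, size_t depth,
		     unsigned long *sum)
{
	struct merkle_buf held[NR_BLOCKS];
	size_t n, i;
	int err = 0;

	if (depth > NR_BLOCKS)
		return -1;
	*sum = 0;
	for (n = 0; n < depth; n++) {
		const unsigned char *p;

		err = ops->read_block(fs, path[n], &held[n]);
		if (err)
			break;
		p = held[n].kaddr;
		for (i = 0; i < held[n].size; i++)
			*sum += p[i];
	}
	/* Drop only the blocks that were successfully taken. */
	while (n > 0)
		ops->drop_block(&held[--n]);
	return err;
}
```

In this sketch the walker never touches the refcount or learns what
backs kaddr, so the same walk_tree() would work whether the filesystem
side hands out page cache pages, buffer cache memory, or anything else
it can pin for the duration of the walk.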
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com