Message-ID: <20260119195816.GA15583@frogsfrogsfrogs>
Date: Mon, 19 Jan 2026 11:58:16 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Eric Biggers <ebiggers@...nel.org>
Cc: Christoph Hellwig <hch@....de>,
Andrey Albershteyn <aalbersh@...hat.com>,
Matthew Wilcox <willy@...radead.org>, fsverity@...ts.linux.dev,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
aalbersh@...nel.org, david@...morbit.com, tytso@....edu,
linux-ext4@...r.kernel.org, jaegeuk@...nel.org, chao@...nel.org,
linux-f2fs-devel@...ts.sourceforge.net
Subject: Re: fsverity metadata offset, was: Re: [PATCH v2 0/23] fs-verity
support for XFS with post EOF merkle tree
On Mon, Jan 19, 2026 at 11:32:42AM -0800, Eric Biggers wrote:
> On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote:
> > While looking at fsverity I'd like to understand the choice of offset
> > in ext4 and f2fs, and raise a possible issue.
> >
> > Both ext4 and f2fs round up the inode size to the next 64k boundary
> > and place the metadata there. Unfortunately, both use the magic number
> > 65536 for that instead of a well-documented constant.
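
For reference, that placement boils down to roughly the helper below (a
simplified sketch, not the verbatim ext4/f2fs code):

static loff_t verity_metadata_pos(struct inode *inode)
{
        /* The Merkle tree starts at i_size rounded up to the 64k boundary. */
        return round_up(i_size_read(inode), 65536);
}
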
> >
> > I assume this was picked to align up to the largest reasonable page
> > size? Unfortunately for that:
> >
> > a) not all architectures are reasonable. As Darrick pointed out,
> > hexagon seems to support page sizes up to 1MiB. While I don't know
> > if those exist in real life, powerpc supports up to 256kiB pages,
> > and I know those are used for real in various embedded settings
They *did* way back in the day; I worked with some seekrit PPC440s early
in my career. I don't know that any of them still exist, but the code
is still there...
> > b) with large folio support in the page cache, the folios used to
> > map files can be much larger than the base page size, with all
> > the same issues as a larger page size
> >
> > So assuming that fsverity is trying to avoid the issue of a page/folio
> > that covers both data and fsverity metadata, how does it cope with that?
> > Do we need to disable fsverity on > 64k page size and disable large
> > folios on fsverity files? The latter would mean writing back all cached
> > data first as well.
> >
> > And going forward, should we have a v2 format that fixes this? For that
> > we'd still need a maximum folio size, of course. And I'd like to get all
> > these things right from the start in XFS, while still being as similar
> > as possible to ext4/f2fs.
>
> Yes, if I recall correctly it was intended to be the "largest reasonable
> page size". It looks like PAGE_SIZE > 65536 can't work as-is, so indeed
> we should disable fsverity support in that configuration.
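
Concretely, that check could be as simple as the sketch below at
verity-enable time (a sketch of the idea, not existing fs/verity code):

        /*
         * Sketch only: the ext4/f2fs layout assumes the Merkle tree never
         * shares a base page with file data, which only holds while the
         * base page size is at most the 64k metadata alignment.
         */
        if (PAGE_SIZE > SZ_64K)
                return -EOPNOTSUPP;
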
>
> I don't think large folios are quite as problematic.
> ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a
> folio and return the appropriate page in it, and fs/verity/verify.c
> operates on the page. If the page is in a folio that spans EOF, I
> think everything will actually still work, except that userspace will
> be able to see Merkle tree data beyond the 64K boundary past EOF if the
> file is mmapped using huge pages.
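
For anyone following along, that read path boils down to roughly the
sketch below (readahead and most error handling trimmed, helper name
illustrative):

static struct page *read_merkle_tree_page(struct inode *inode, pgoff_t index)
{
        struct folio *folio;

        /* Index past the 64k-aligned start of the Merkle tree. */
        index += round_up(i_size_read(inode), 65536) >> PAGE_SHIFT;

        folio = read_mapping_folio(inode->i_mapping, index, NULL);
        if (IS_ERR(folio))
                return ERR_CAST(folio);

        /* The folio may be large; hand back the exact page for this index. */
        return folio_file_page(folio, index);
}

Back to the mmap question: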
We don't allow mmapping file data beyond the EOF basepage, even if the
underlying folio is a large folio. See generic/749. Recently Kiryl
Shutsemau tried to remove that restriction[1], but dchinner and willy
told him no.
> The mmap issue isn't great, but I'm not sure how much it matters,
> especially when the zeroes do still go up to a 64K boundary.
I'm concerned that post-EOF zeroing of a 256k folio could accidentally
obliterate Merkle tree content that was somehow previously loaded.
Though afaict from the existing codebases, none of them actually make
that mistake.
> If we do need to fix this, there are a couple things we could consider
> doing without changing the on-disk format in ext4 or f2fs: putting the
> data in the page cache at a different offset than it exists on-disk, or
> using "small" pages for EOF specifically.
I'd leave the ondisk offset as-is, but change the pagecache offset to
roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep
file data and fsverity metadata completely separate.
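
Something like this, in other words (hypothetical helper name, just to
show the placement I mean):

static loff_t verity_pagecache_pos(struct inode *inode)
{
        /*
         * Leave the Merkle tree where it is on disk, but map it into the
         * pagecache past the largest folio the pagecache can ever use, so
         * that no folio spans both file data and fsverity metadata.
         */
        return round_up(i_size_read(inode),
                        mapping_max_folio_size_supported());
}
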
> But yes, XFS should choose a larger alignment than 64K.
The roundup() formula above is what I'd choose for the pagecache offset
for xfs. The ondisk offset of 1<<53 is ok with me.
--D
[1] https://lore.kernel.org/linux-fsdevel/20251014175214.GW6188@frogsfrogsfrogs/