Message-ID: <20230405151234.sgkuasb7lwmgetzz@aalbersh.remote.csb>
Date: Wed, 5 Apr 2023 17:12:34 +0200
From: Andrey Albershteyn <aalbersh@...hat.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: djwong@...nel.org, dchinner@...hat.com, hch@...radead.org,
linux-xfs@...r.kernel.org, fsverity@...ts.linux.dev,
rpeterso@...hat.com, agruenba@...hat.com, xiang@...nel.org,
chao@...nel.org, damien.lemoal@...nsource.wdc.com, jth@...nel.org,
linux-erofs@...ts.ozlabs.org, linux-btrfs@...r.kernel.org,
linux-ext4@...r.kernel.org, linux-f2fs-devel@...ts.sourceforge.net,
cluster-devel@...hat.com
Subject: Re: [PATCH v2 21/23] xfs: handle merkle tree block size != fs
blocksize != PAGE_SIZE

Hi Eric,

On Tue, Apr 04, 2023 at 04:32:24PM -0700, Eric Biggers wrote:
> Hi Andrey,
>
> On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> > When the Merkle tree block size differs from the page size,
> > fs-verity expects ->read_merkle_tree_page() to return a page filled
> > with Merkle tree blocks. XFS stores each Merkle tree block in an
> > extended attribute, and those attributes are addressed by the
> > block's byte offset into the Merkle tree.
> >
> > This patch makes ->read_merkle_tree_page() fetch multiple Merkle
> > tree blocks based on the page size to block size ratio. The
> > reference to each xfs_buf is also passed to ->drop_page() via
> > page->private.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@...hat.com>
> > ---
> > fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
> > fs/xfs/xfs_verity.h | 8 +++++
> > 2 files changed, 66 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> > index a9874ff4efcd..ef0aff216f06 100644
> > --- a/fs/xfs/xfs_verity.c
> > +++ b/fs/xfs/xfs_verity.c
> > @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> > struct page *page = NULL;
> > __be64 name = cpu_to_be64(index << PAGE_SHIFT);
> > uint32_t bs = 1 << log_blocksize;
> > + int blocks_per_page =
> > + (1 << (PAGE_SHIFT - log_blocksize));
> > + int n = 0;
> > + int offset = 0;
> > struct xfs_da_args args = {
> > .dp = ip,
> > .attr_filter = XFS_ATTR_VERITY,
> > @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> > .valuelen = bs,
> > };
> > int error = 0;
> > + bool is_checked = true;
> > + struct xfs_verity_buf_list *buf_list;
> >
> > page = alloc_page(GFP_KERNEL);
> > if (!page)
> > return ERR_PTR(-ENOMEM);
> >
> > - error = xfs_attr_get(&args);
> > - if (error) {
> > - kmem_free(args.value);
> > - xfs_buf_rele(args.bp);
> > + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> > + if (!buf_list) {
> > put_page(page);
> > - return ERR_PTR(-EFAULT);
> > + return ERR_PTR(-ENOMEM);
> > }
> >
> > - if (args.bp->b_flags & XBF_VERITY_CHECKED)
> > + /*
> > +	 * Fill the page with Merkle tree blocks. blocks_per_page is higher
> > +	 * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> > +	 * PAGE_SIZE.
> > + */
> > + for (n = 0; n < blocks_per_page; n++) {
> > + offset = bs * n;
> > + name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
> > + args.name = (const uint8_t *)&name;
> > +
> > + error = xfs_attr_get(&args);
> > + if (error) {
> > + kmem_free(args.value);
> > + /*
> > + * No more Merkle tree blocks (e.g. this was the last
> > + * block of the tree)
> > + */
> > + if (error == -ENOATTR)
> > + break;
> > + xfs_buf_rele(args.bp);
> > + put_page(page);
> > + kmem_free(buf_list);
> > + return ERR_PTR(-EFAULT);
> > + }
> > +
> > + buf_list->bufs[buf_list->buf_count++] = args.bp;
> > +
> > +		/* A buffer was dropped earlier, so the page can't be marked verified */
> > + if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> > + is_checked = false;
> > +
> > + memcpy(page_address(page) + offset, args.value, args.valuelen);
> > + kmem_free(args.value);
> > + args.value = NULL;
> > + }
>
> I was really hoping for a solution where the cached data can be used directly,
> instead of allocating a temporary page and copying the cached data into it every
> time the cache is accessed. The problem with what you have now is that every
> time a single 32-byte hash is accessed, a full page (potentially 64KB!) will be
> allocated and filled. That's not very efficient. The need to allocate the
> temporary page can also cause ENOMEM (which will get reported as EIO).
>
> Did you consider alternatives that would work more efficiently? I think it
> would be worth designing something that works properly with how XFS is planned
> to cache the Merkle tree, instead of designing a workaround.
> ->read_merkle_tree_page was not really designed for what you are doing here.
>
> How about replacing ->read_merkle_tree_page with a function that takes in a
> Merkle tree block index (not a page index!) and hands back a (page, offset) pair
> that identifies where the Merkle tree block's data is located? Or (folio,
> offset), I suppose.
>
> With that, would it be possible to directly return the XFS cache?
>
> - Eric
>

Yeah, I also don't like it; I didn't want to change fs-verity much,
so I went with this workaround. But as changing fs-verity is ok, I
will look into passing xfs buffers to fs-verity without going through
->read_merkle_tree_page(). I think it's possible with (folio,
offset); note that the xattr values are not aligned in the xfs
buffers, so a 4k Merkle tree block can be stored across two pages.
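
To make the idea concrete, here is roughly the interface I would
prototype (a rough sketch only; all of the names below are
hypothetical, not an existing fs-verity API):

/*
 * Hypothetical replacement for ->read_merkle_tree_page(): map a
 * Merkle tree *block* index (not a page index) to the location of
 * the block's cached data.
 */
struct merkle_tree_block {
	struct folio	*folio;		/* folio holding the block's data */
	unsigned int	offset;		/* offset of the block in the folio */
	void		*fs_private;	/* e.g. xfs_buf to release later */
};

int (*read_merkle_tree_block)(struct inode *inode, u64 block_index,
			      struct merkle_tree_block *block);

void (*drop_merkle_tree_block)(struct inode *inode,
			       struct merkle_tree_block *block);

With something like that, the XFS implementation could hand back the
folio that already backs the xfs_buf holding the xattr value, and do
the xfs_buf_rele() in drop_merkle_tree_block(), so no extra page
allocation or memcpy would be needed. Whether the buffer pages can
actually be exposed that way is what I still need to check.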
Thanks for the suggestion!
--
- Andrey