Message-ID: <CAPjX3FdP0k8YRR7vA+g_deUqEt=1wnRZXV-9gaTHwur43HVJ6w@mail.gmail.com>
Date: Thu, 12 Jun 2025 10:34:20 +0200
From: Daniel Vacek <neelx@...e.com>
To: dsterba@...e.cz
Cc: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs: index buffer_tree using node size
On Fri, 6 Jun 2025 at 19:28, David Sterba <dsterba@...e.cz> wrote:
>
> On Mon, May 12, 2025 at 07:23:20PM +0200, Daniel Vacek wrote:
> > So far we derive the buffer tree index from the sector size, but each extent
> > buffer covers multiple sectors. This makes the buffer tree rather sparse.
> >
> > For example, the typical and quite common configuration uses a sector size of
> > 4KiB and a node size of 16KiB. In this case the buffer tree uses at most 25%
> > of its slots. Or in other words, at least 75% of the tree slots are wasted,
> > never used.
> >
> > We can achieve significant memory savings on the required tree nodes by
> > indexing the tree using the node size instead. As a result far fewer slots
> > are wasted and the tree can now use up to 100% of its slots.
> >
> > Signed-off-by: Daniel Vacek <neelx@...e.com>
>
> This is a useful improvement, so we should continue and merge it. The
> performance improvements should be done so we get some idea. Runtime and
> slab savings.
Performance-wise this fix is not that significant. In my testing I
did not notice any change in runtime performance.
Slab usage is _relatively_ significant though: about 2/3 of the btrfs
radix_tree_node objects are saved. In _absolute_ numbers system-wide
it is still within the level of noise, however. Without this fix
btrfs uses a bit over 1% of the radix_tree_node slab cache, while
with the fix it falls below half a percent, from what I was able to
measure.
> One coding comment, please rename node_bits to nodesize_bits so it's
> consistent with sectorsize and sectorsize_bits.
Right.
> > fs/btrfs/disk-io.c | 1 +
> > fs/btrfs/extent_io.c | 30 +++++++++++++++---------------
> > fs/btrfs/fs.h | 3 ++-
> > 3 files changed, 18 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -4294,9 +4294,9 @@ static int try_release_subpage_extent_buffer(struct folio *folio)
> > {
> > struct btrfs_fs_info *fs_info = folio_to_fs_info(folio);
> > struct extent_buffer *eb;
> > - unsigned long start = folio_pos(folio) >> fs_info->sectorsize_bits;
> > + unsigned long start = folio_pos(folio) >> fs_info->node_bits;
> > unsigned long index = start;
> > - unsigned long end = index + (PAGE_SIZE >> fs_info->sectorsize_bits) - 1;
> > + unsigned long end = index + (PAGE_SIZE >> fs_info->node_bits) - 1;
>
> This looks a bit suspicious: page size is 4k, node size can be 4k .. 64k.
> It's in subpage code, so sector < page and that shift is always >= 0. A
> node can be larger than the page, though, in which case the shift result
> would be 0 (4k shifted by "16k" worth of bits).
This comes from commit 19d7f65f032f ("btrfs: convert the buffer_radix
to an xarray").
You can still have a 64 KiB page with a 16 KiB node size, and this
subpage path only runs when the node size is smaller than the page
size. So this still looks sound to me.
> > int ret;
> >
> > xa_lock_irq(&fs_info->buffer_tree);
> > diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> > index cf805b4032af3..8c9113304fabe 100644
> > --- a/fs/btrfs/fs.h
> > +++ b/fs/btrfs/fs.h
> > @@ -778,8 +778,9 @@ struct btrfs_fs_info {
> >
> > struct btrfs_delayed_root *delayed_root;
> >
> > - /* Entries are eb->start / sectorsize */
> > + /* Entries are eb->start >> node_bits */
> > struct xarray buffer_tree;
> > + int node_bits;
>
> u32, and please move it next to nodesize.
Sure.