Message-ID: <CAPjX3FfBoU9-wYP-JC63K6y8Pzocu0z8cKvXEbjD_NjdxWO+Og@mail.gmail.com>
Date: Wed, 30 Apr 2025 10:21:18 +0200
From: Daniel Vacek <neelx@...e.com>
To: dsterba@...e.cz
Cc: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>, 
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs: remove extent buffer's redundant `len` member field

On Wed, 30 Apr 2025 at 10:03, David Sterba <dsterba@...e.cz> wrote:
>
> On Tue, Apr 29, 2025 at 05:17:57PM +0200, Daniel Vacek wrote:
> > Even the super block nowadays uses nodesize for eb->len. This has been
> > the case since commits
> >
> > 551561c34663 ("btrfs: don't pass nodesize to __alloc_extent_buffer()")
> > da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into fs_info")
> > ce3e69847e3e ("btrfs: sink parameter len to alloc_extent_buffer")
> > a83fffb75d09 ("btrfs: sink blocksize parameter to btrfs_find_create_tree_block")
> >
> > With these, eb->len is not really useful anymore. Let's use the nodesize
> > directly where applicable.
>
> I've had this patch in my local branch for some years, from the time we
> were optimizing the extent buffer size. The size on a release config is
> 240 bytes. The goal was to get it under 256 and keep it aligned.
>
> Removing eb->len does not change the structure size and leaves a hole
>
>  struct extent_buffer {
>         u64                        start;                /*     0     8 */
> -       u32                        len;                  /*     8     4 */
> -       u32                        folio_size;           /*    12     4 */
> +       u32                        folio_size;           /*     8     4 */
> +
> +       /* XXX 4 bytes hole, try to pack */
> +
>         long unsigned int          bflags;               /*    16     8 */
>         struct btrfs_fs_info *     fs_info;              /*    24     8 */
>         void *                     addr;                 /*    32     8 */
> @@ -5554,8 +5556,8 @@ struct extent_buffer {
>         struct rw_semaphore        lock;                 /*    72    40 */
>         struct folio *             folios[16];           /*   112   128 */
>
> -       /* size: 240, cachelines: 4, members: 14 */
> -       /* sum members: 238, holes: 1, sum holes: 2 */
> +       /* size: 240, cachelines: 4, members: 13 */
> +       /* sum members: 234, holes: 2, sum holes: 6 */
>         /* forced alignments: 1, forced holes: 1, sum forced holes: 2 */
>         /* last cacheline: 48 bytes */
>  } __attribute__((__aligned__(8)));
>
> The benefit of duplicating the length in each eb is that it's in the
> same cacheline as the other members that are used for offset
> calculations or bit manipulations.
>
> Going through fs_info->nodesize may or may not hit the cache, and it
> also needs two pointer dereferences, so from that perspective I think
> it makes things worse.

I was considering that. Since fs_info is shared by all ebs and other
structures like transactions, I think the cache is hot most of the
time and hardly any performance difference will be observable. Though
without benchmarks this is just speculation (on both sides).

> I don't think we need to do the optimization right now, but maybe in the
> future if there's a need to add something to eb. Still we can use the
> remaining 16 bytes up to 256 without making things worse.

This really depends on the configuration. On my laptop (Debian -rt
kernel) the eb struct is actually 272 bytes, as the rt_mutex is
significantly heavier than a raw spinlock. And -rt is a first-class
citizen nowadays, often used in Kubernetes deployments such as 5G RAN
telco, DPDK and the like. I think it would be nice to slim the struct
below 256 bytes even there, if that's your aim.
