Message-ID: <CAL3q7H5EA=NUOPwtgUNUMhOhGd85pSgsgy17KwBOWgWV9594+Q@mail.gmail.com>
Date: Wed, 3 Dec 2025 10:45:17 +0000
From: Filipe Manana <fdmanana@...nel.org>
To: Leo Martins <loemra.dev@...il.com>
Cc: Oliver Sang <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev, lkp@...el.com,
linux-kernel@...r.kernel.org, David Sterba <dsterba@...e.com>,
linux-btrfs@...r.kernel.org
Subject: Re: [linus:master] [btrfs] e8513c012d: addition_on#;use-after-free
On Tue, Dec 2, 2025 at 9:03 PM Leo Martins <loemra.dev@...il.com> wrote:
>
> On Tue, 2 Dec 2025 19:19:05 +0000 Filipe Manana <fdmanana@...nel.org> wrote:
>
> > On Tue, Dec 2, 2025 at 5:17 PM Leo Martins <loemra.dev@...il.com> wrote:
> > >
> > > On Tue, 2 Dec 2025 15:04:51 +0000 Filipe Manana <fdmanana@...nel.org> wrote:
> > >
> > > > On Tue, Dec 2, 2025 at 8:40 AM Oliver Sang <oliver.sang@...el.com> wrote:
> > > > >
> > > > > hi, Leo Martins,
> > > > >
> > > > > On Mon, Dec 01, 2025 at 04:51:41PM -0800, Leo Martins wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I believe I have identified the root cause of the warning.
> > > > > > > However, I'm having some trouble running the reproducer as I
> > > > > > > haven't set up lkp-tests yet. Could you test the patch below
> > > > > > against your reproducer to see if it fixes the issue?
> > > > >
> > > > > > We confirmed that your patch fixes the issues we reported in the original report. Thanks!
> > > > >
> > > > > Tested-by: kernel test robot <oliver.sang@...el.com>
> > > > >
> > > > > >
> > > > > > ---8<---
> > > > > >
> > > > > > [PATCH] btrfs: fix use-after-free in btrfs_get_or_create_delayed_node
> > > > > >
> > > > > > > Previously, btrfs_get_or_create_delayed_node set the delayed_node's
> > > > > > > refcount before acquiring the root->delayed_nodes lock.
> > > > > > Commit e8513c012de7 ("btrfs: implement ref_tracker for delayed_nodes")
> > > > > > moves refcount_set inside the critical section which means
> > > > > > there is no longer a memory barrier between setting the refcount and
> > > > > > setting btrfs_inode->delayed_node = node.
> > > > > >
> > > > > > This allows btrfs_get_or_create_delayed_node to set
> > > > > > btrfs_inode->delayed_node before setting the refcount.
> > > > > > A different thread is then able to read and increase the refcount
> > > > > > of btrfs_inode->delayed_node leading to a refcounting bug and
> > > > > > a use-after-free warning.
> > > > > >
> > > > > > The fix is to move refcount_set back to where it was to take
> > > > > > advantage of the implicit memory barrier provided by lock
> > > > > > acquisition.
> > > > > >
> > > > > > Fixes: e8513c012de7 ("btrfs: implement ref_tracker for delayed_nodes")
> > > > > > Reported-by: kernel test robot <oliver.sang@...el.com>
> > > > > > Closes: https://lore.kernel.org/oe-lkp/202511262228.6dda231e-lkp@intel.com
> > > > > > Signed-off-by: Leo Martins <loemra.dev@...il.com>
> > > > > > ---
> > > > > > fs/btrfs/delayed-inode.c | 34 ++++++++++++++++++----------------
> > > > > > 1 file changed, 18 insertions(+), 16 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> > > > > > index 364814642a91..f61f10000e33 100644
> > > > > > --- a/fs/btrfs/delayed-inode.c
> > > > > > +++ b/fs/btrfs/delayed-inode.c
> > > > > > @@ -152,37 +152,39 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
> > > > > > return ERR_PTR(-ENOMEM);
> > > > > > btrfs_init_delayed_node(node, root, ino);
> > > > > >
> > > > > > + /* Cached in the inode and can be accessed. */
> > > > > > + refcount_set(&node->refs, 2);
> > > > > > + btrfs_delayed_node_ref_tracker_alloc(node, tracker, GFP_ATOMIC);
> > > > > > + btrfs_delayed_node_ref_tracker_alloc(node, &node->inode_cache_tracker, GFP_ATOMIC);
> > > > > > +
> > > > > > /* Allocate and reserve the slot, from now it can return a NULL from xa_load(). */
> > > > > > ret = xa_reserve(&root->delayed_nodes, ino, GFP_NOFS);
> > > > > > - if (ret == -ENOMEM) {
> > > > > > - btrfs_delayed_node_ref_tracker_dir_exit(node);
> > > > > > - kmem_cache_free(delayed_node_cache, node);
> > > > > > - return ERR_PTR(-ENOMEM);
> > > > > > - }
> > > > > > + if (ret == -ENOMEM)
> > > > > > + goto cleanup;
> > > > > > +
> > > > > > xa_lock(&root->delayed_nodes);
> > > > > > ptr = xa_load(&root->delayed_nodes, ino);
> > > > > > if (ptr) {
> > > > > > /* Somebody inserted it, go back and read it. */
> > > > > > xa_unlock(&root->delayed_nodes);
> > > > > > - btrfs_delayed_node_ref_tracker_dir_exit(node);
> > > > > > - kmem_cache_free(delayed_node_cache, node);
> > > > > > - node = NULL;
> > > > > > - goto again;
> > > > > > + goto cleanup;
> > > > > > }
> > > > > > ptr = __xa_store(&root->delayed_nodes, ino, node, GFP_ATOMIC);
> > > > > > ASSERT(xa_err(ptr) != -EINVAL);
> > > > > > ASSERT(xa_err(ptr) != -ENOMEM);
> > > > > > ASSERT(ptr == NULL);
> > > > > > -
> > > > > > - /* Cached in the inode and can be accessed. */
> > > > > > - refcount_set(&node->refs, 2);
> > > > > > - btrfs_delayed_node_ref_tracker_alloc(node, tracker, GFP_ATOMIC);
> > > > > > - btrfs_delayed_node_ref_tracker_alloc(node, &node->inode_cache_tracker, GFP_ATOMIC);
> > > > > > -
> > > > > > - btrfs_inode->delayed_node = node;
> > > > > > + WRITE_ONCE(btrfs_inode->delayed_node, node);
> > > >
> > > > Why the WRITE_ONCE() change?
> > >
> > > Since there are lockless readers of btrfs_inode->delayed_node, all writers
> > > should be marked with WRITE_ONCE() to force the compiler to store atomically.
> >
> > If by atomically you mean to avoid store/load tearing, then using the
> > _ONCE() macros won't do anything because we are dealing with pointers.
> > This has been discussed in the past, see:
> >
> > https://lore.kernel.org/linux-btrfs/cover.1715951291.git.fdmanana@suse.com/
>
> That is what I meant. I missed that discussion, thanks for the link.
> I do still see some value in using WRITE_ONCE(): it lets the human
> reader realize that there are lockless readers, but that's pretty
> minor.
No, and that's mentioned in the thread: using _ONCE() when it offers no
protection, only to signal to someone reading the source code that the
field can be accessed in a lockless way, just confuses people and
encourages them to repeat the pattern.

To make it clear that it's accessed in a lockless way, make the
reader side use data_race() if the race is safe, or smp_load_acquire()
otherwise, with proper comments right above the read access. These
also keep KCSAN and other tools from reporting possible races.
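
For illustration only, the acquire/release variant could look roughly
like the userspace sketch below. It uses C11 atomics as stand-ins for
smp_store_release()/smp_load_acquire(), and the struct and function
names (node, inode_cache, publish_node, get_node) are hypothetical, not
the btrfs code. It also ignores node lifetime and freeing.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures, not the btrfs definitions. */
struct node {
	atomic_int refs;
	unsigned long ino;
};

struct inode_cache {
	_Atomic(struct node *) delayed_node;
};

/* Writer: fully initialize the node, then publish the pointer with release semantics. */
static struct node *publish_node(struct inode_cache *ic, unsigned long ino)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->ino = ino;
	/* One reference for the cache, one for the caller. */
	atomic_init(&n->refs, 2);
	/* Release: the stores above are visible to whoever observes the pointer. */
	atomic_store_explicit(&ic->delayed_node, n, memory_order_release);
	return n;
}

/* Lockless reader: the acquire load pairs with the release store in publish_node(). */
static struct node *get_node(struct inode_cache *ic)
{
	struct node *n = atomic_load_explicit(&ic->delayed_node,
					      memory_order_acquire);

	if (!n)
		return NULL;
	/* Safe only because refs was set before the pointer was published. */
	atomic_fetch_add_explicit(&n->refs, 1, memory_order_relaxed);
	return n;
}
```

The point is only the ordering: the refcount is set before the pointer
is published, so a reader that sees the pointer also sees refs == 2.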
> >
> > >
> > > >
> > > > Can you explain in the changelog why it's being introduced?
> > > > This seems unrelated and it was not there before the commit mentioned
> > > > in the Fixes tag.
> > >
> > > I'll send out a v2 without the WRITE_ONCE since it is not directly related
> > > to this bug and send out a separate patch updating writes to use WRITE_ONCE.
> > >
> > > Thanks.
> > >
> > > >
> > > > Thanks.
> > > >
> > > > > > >  	xa_unlock(&root->delayed_nodes);
> > > > > > >
> > > > > > >  	return node;
> > > > > > > +cleanup:
> > > > > > > +	btrfs_delayed_node_ref_tracker_free(node, tracker);
> > > > > > > +	btrfs_delayed_node_ref_tracker_free(node, &node->inode_cache_tracker);
> > > > > > > +	btrfs_delayed_node_ref_tracker_dir_exit(node);
> > > > > > > +	kmem_cache_free(delayed_node_cache, node);
> > > > > > > +	if (ret)
> > > > > > > +		return ERR_PTR(ret);
> > > > > > > +	goto again;
> > > > > > >  }
> > > > > >
> > > > > > /*
> > > > > > --
> > > > > > 2.47.3
> > > > >
> > >