lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL3q7H4ijVSVZXahhN85D0LBSAJZPhP1-sqqFQXasdQwfjnyuQ@mail.gmail.com>
Date: Tue, 30 Sep 2025 17:12:11 +0100
From: Filipe Manana <fdmanana@...nel.org>
To: Miquel Sabaté Solà <mssola@...ola.com>
Cc: linux-btrfs@...r.kernel.org, clm@...com, dsterba@...e.com, 
	fdmanana@...e.com, wqu@...e.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fs: btrfs: prevent a double kfree on delayed-ref

On Tue, Sep 30, 2025 at 5:07 PM Filipe Manana <fdmanana@...nel.org> wrote:
>
> On Tue, Sep 30, 2025 at 2:05 PM Miquel Sabaté Solà <mssola@...ola.com> wrote:
> >
> > In the previous code it was possible to incur into a double kfree()
> > scenario when calling 'add_delayed_ref_head'. This could happen if the
> > record was reported to already exist in the
> > 'btrfs_qgroup_trace_extent_nolock' call, but then there was an error
> > later on 'add_delayed_ref_head'. In this case, since
> > 'add_delayed_ref_head' returned an error, the caller went to free the
> > record. Since 'add_delayed_ref_head' couldn't set this kfree'd pointer
> > to NULL, then kfree() would have acted on a non-NULL 'record' object
> > which was pointing to memory already freed by the callee.
> >
> > The problem comes from the fact that the responsibility to kfree the
> > object is on both the caller and the callee at the same time. Hence, the
> > fix for this is to shift the ownership of the 'qrecord' object out of
> > the 'add_delayed_ref_head'. That is, we will never attempt to kfree()
> > the given object inside of this function, and will expect the caller to
> > act on the 'qrecord' object on its own. The only exception where the
> > 'qrecord' object cannot be kfree'd is if it was inserted into the
> > tracing logic, for which we already have the 'qrecord_inserted_ret'
> > boolean to account for this. Hence, the caller has to kfree the object
> > only if 'add_delayed_ref_head' reports not to have inserted it on the
> > tracing logic.
> >
> > As a side-effect of the above, we must guarantee that
> > 'qrecord_inserted_ret' is properly initialized at the start of the
> > function, not at the end, and then set when an actual insert
> > happens. This way we avoid 'qrecord_inserted_ret' having an invalid
> > value on an early exit.
> >
> > The documentation from the 'add_delayed_ref_head' has also been updated
> > to reflect on the exact ownership of the 'qrecord' object.
> >
> > Fixes: 6ef8fbce0104 ("btrfs: fix missing error handling when adding delayed ref with qgroups enabled")
> > Signed-off-by: Miquel Sabaté Solà <mssola@...ola.com>
> > ---
> >  fs/btrfs/delayed-ref.c | 39 +++++++++++++++++++++++++++++++--------
> >  1 file changed, 31 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> > index 481802efaa14..bc61e0eacc69 100644
> > --- a/fs/btrfs/delayed-ref.c
> > +++ b/fs/btrfs/delayed-ref.c
> > @@ -798,10 +798,14 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
> >  }
> >
> >  /*
> > - * helper function to actually insert a head node into the rbtree.
> > + * Helper function to actually insert a head node into the rbtree.
>
> Since you are updating this line just to capitalize the first word,
> you might as well replace rbtree with xarray as we don't use rbtree
> anymore.
>
> >   * this does all the dirty work in terms of maintaining the correct
> >   * overall modification count.
> >   *
> > + * The caller is responsible for calling kfree() on @qrecord. More specifically,
> > + * if this function reports that it did not insert it as noted in
> > + * @qrecord_inserted_ret, then it's safe to call kfree() on it.
> > + *
> >   * Returns an error pointer in case of an error.
> >   */
> >  static noinline struct btrfs_delayed_ref_head *
> > @@ -814,7 +818,14 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> >         struct btrfs_delayed_ref_head *existing;
> >         struct btrfs_delayed_ref_root *delayed_refs;
> >         const unsigned long index = (head_ref->bytenr >> fs_info->sectorsize_bits);
> > -       bool qrecord_inserted = false;
> > +
> > +       /*
> > +        * If 'qrecord_inserted_ret' is provided, then the first thing we need
> > +        * to do is to initialize it to false just in case we have an exit
> > +        * before trying to insert the record.
> > +        */
> > +       if (qrecord_inserted_ret)
> > +               *qrecord_inserted_ret = false;
> >
> >         delayed_refs = &trans->transaction->delayed_refs;
> >         lockdep_assert_held(&delayed_refs->lock);
> > @@ -833,6 +844,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> >
> >         /* Record qgroup extent info if provided */
> >         if (qrecord) {
> > +               /*
> > +                * Setting 'qrecord' but not 'qrecord_inserted_ret' will likely
> > +                * result in a memory leakage.
> > +                */
> > +               WARN_ON(!qrecord_inserted_ret);
>
> For this sort of mandatory stuff, we use assertions, not warnings:
>
> ASSERT(qrecord_insert_ret != NULL);
>
>
> > +
> >                 int ret;
> >
> >                 ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord,
> > @@ -840,12 +857,10 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> >                 if (ret) {
> >                         /* Clean up if insertion fails or item exists. */
> >                         xa_release(&delayed_refs->dirty_extents, index);
> > -                       /* Caller responsible for freeing qrecord on error. */
> >                         if (ret < 0)
> >                                 return ERR_PTR(ret);
> > -                       kfree(qrecord);
> > -               } else {
> > -                       qrecord_inserted = true;
> > +               } else if (qrecord_inserted_ret) {
> > +                       *qrecord_inserted_ret = true;
> >                 }
> >         }
> >
> > @@ -888,8 +903,6 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> >                 delayed_refs->num_heads++;
> >                 delayed_refs->num_heads_ready++;
> >         }
> > -       if (qrecord_inserted_ret)
> > -               *qrecord_inserted_ret = qrecord_inserted;
> >
> >         return head_ref;
> >  }
> > @@ -1049,6 +1062,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> >                 xa_release(&delayed_refs->head_refs, index);
> >                 spin_unlock(&delayed_refs->lock);
> >                 ret = PTR_ERR(new_head_ref);
> > +
> > +               /*
> > +                * It's only safe to call kfree() on 'qrecord' if
> > +                * 'add_delayed_ref_head' has _not_ inserted it for
>
> The notation we use for function names is  function_name(), not 'function_name'.
>
> Otherwise it looks good, thanks.

Also, the subject should just be "btrfs: ....", no need to add extra
"fs: " prefix - we never do that.
>
> > +                * tracing. Otherwise we need to handle this here.
> > +                */
> > +               if (!qrecord_reserved || qrecord_inserted)
> > +                       goto free_head_ref;
> >                 goto free_record;
> >         }
> >         head_ref = new_head_ref;
> > @@ -1071,6 +1092,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> >
> >         if (qrecord_inserted)
> >                 return btrfs_qgroup_trace_extent_post(trans, record, generic_ref->bytenr);
> > +
> > +       kfree(record);
> >         return 0;
> >
> >  free_record:
> > --
> > 2.51.0
> >
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ