[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL3q7H66AOc_hbXX_PN-DGP5fT36NnxE7p4j2LqjPXyRaOu=iA@mail.gmail.com>
Date: Tue, 30 Sep 2025 17:07:47 +0100
From: Filipe Manana <fdmanana@...nel.org>
To: Miquel Sabaté Solà <mssola@...ola.com>
Cc: linux-btrfs@...r.kernel.org, clm@...com, dsterba@...e.com,
fdmanana@...e.com, wqu@...e.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fs: btrfs: prevent a double kfree on delayed-ref
On Tue, Sep 30, 2025 at 2:05 PM Miquel Sabaté Solà <mssola@...ola.com> wrote:
>
> In the previous code it was possible to incur into a double kfree()
> scenario when calling 'add_delayed_ref_head'. This could happen if the
> record was reported to already exist in the
> 'btrfs_qgroup_trace_extent_nolock' call, but then there was an error
> later on 'add_delayed_ref_head'. In this case, since
> 'add_delayed_ref_head' returned an error, the caller went to free the
> record. Since 'add_delayed_ref_head' couldn't set this kfree'd pointer
> to NULL, then kfree() would have acted on a non-NULL 'record' object
> which was pointing to memory already freed by the callee.
>
> The problem comes from the fact that the responsibility to kfree the
> object is on both the caller and the callee at the same time. Hence, the
> fix for this is to shift the ownership of the 'qrecord' object out of
> the 'add_delayed_ref_head'. That is, we will never attempt to kfree()
> the given object inside of this function, and will expect the caller to
> act on the 'qrecord' object on its own. The only exception where the
> 'qrecord' object cannot be kfree'd is if it was inserted into the
> tracing logic, for which we already have the 'qrecord_inserted_ret'
> boolean to account for this. Hence, the caller has to kfree the object
> only if 'add_delayed_ref_head' reports not to have inserted it on the
> tracing logic.
>
> As a side-effect of the above, we must guarantee that
> 'qrecord_inserted_ret' is properly initialized at the start of the
> function, not at the end, and then set when an actual insert
> happens. This way we avoid 'qrecord_inserted_ret' having an invalid
> value on an early exit.
>
> The documentation from the 'add_delayed_ref_head' has also been updated
> to reflect on the exact ownership of the 'qrecord' object.
>
> Fixes: 6ef8fbce0104 ("btrfs: fix missing error handling when adding delayed ref with qgroups enabled")
> Signed-off-by: Miquel Sabaté Solà <mssola@...ola.com>
> ---
> fs/btrfs/delayed-ref.c | 39 +++++++++++++++++++++++++++++++--------
> 1 file changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 481802efaa14..bc61e0eacc69 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -798,10 +798,14 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
> }
>
> /*
> - * helper function to actually insert a head node into the rbtree.
> + * Helper function to actually insert a head node into the rbtree.
Since you are updating this line just to capitalize the first word,
you might as well replace rbtree with xarray as we don't use rbtree
anymore.
> * this does all the dirty work in terms of maintaining the correct
> * overall modification count.
> *
> + * The caller is responsible for calling kfree() on @qrecord. More specifically,
> + * if this function reports that it did not insert it as noted in
> + * @qrecord_inserted_ret, then it's safe to call kfree() on it.
> + *
> * Returns an error pointer in case of an error.
> */
> static noinline struct btrfs_delayed_ref_head *
> @@ -814,7 +818,14 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> struct btrfs_delayed_ref_head *existing;
> struct btrfs_delayed_ref_root *delayed_refs;
> const unsigned long index = (head_ref->bytenr >> fs_info->sectorsize_bits);
> - bool qrecord_inserted = false;
> +
> + /*
> + * If 'qrecord_inserted_ret' is provided, then the first thing we need
> + * to do is to initialize it to false just in case we have an exit
> + * before trying to insert the record.
> + */
> + if (qrecord_inserted_ret)
> + *qrecord_inserted_ret = false;
>
> delayed_refs = &trans->transaction->delayed_refs;
> lockdep_assert_held(&delayed_refs->lock);
> @@ -833,6 +844,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>
> /* Record qgroup extent info if provided */
> if (qrecord) {
> + /*
> + * Setting 'qrecord' but not 'qrecord_inserted_ret' will likely
> + * result in a memory leakage.
> + */
> + WARN_ON(!qrecord_inserted_ret);
For this sort of mandatory stuff, we use assertions, not warnings:
ASSERT(qrecord_insert_ret != NULL);
> +
> int ret;
>
> ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord,
> @@ -840,12 +857,10 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> if (ret) {
> /* Clean up if insertion fails or item exists. */
> xa_release(&delayed_refs->dirty_extents, index);
> - /* Caller responsible for freeing qrecord on error. */
> if (ret < 0)
> return ERR_PTR(ret);
> - kfree(qrecord);
> - } else {
> - qrecord_inserted = true;
> + } else if (qrecord_inserted_ret) {
> + *qrecord_inserted_ret = true;
> }
> }
>
> @@ -888,8 +903,6 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> delayed_refs->num_heads++;
> delayed_refs->num_heads_ready++;
> }
> - if (qrecord_inserted_ret)
> - *qrecord_inserted_ret = qrecord_inserted;
>
> return head_ref;
> }
> @@ -1049,6 +1062,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> xa_release(&delayed_refs->head_refs, index);
> spin_unlock(&delayed_refs->lock);
> ret = PTR_ERR(new_head_ref);
> +
> + /*
> + * It's only safe to call kfree() on 'qrecord' if
> + * 'add_delayed_ref_head' has _not_ inserted it for
The notation we use for function names is function_name(), not 'function_name'.
Otherwise it looks good, thanks.
> + * tracing. Otherwise we need to handle this here.
> + */
> + if (!qrecord_reserved || qrecord_inserted)
> + goto free_head_ref;
> goto free_record;
> }
> head_ref = new_head_ref;
> @@ -1071,6 +1092,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>
> if (qrecord_inserted)
> return btrfs_qgroup_trace_extent_post(trans, record, generic_ref->bytenr);
> +
> + kfree(record);
> return 0;
>
> free_record:
> --
> 2.51.0
>
>
Powered by blists - more mailing lists