lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87bjmraiov.fsf@>
Date: Tue, 30 Sep 2025 20:39:28 +0200
From: Miquel Sabaté Solà <mssola@...ola.com>
To: Filipe Manana <fdmanana@...nel.org>
Cc: linux-btrfs@...r.kernel.org,  clm@...com,  dsterba@...e.com,
  fdmanana@...e.com,  wqu@...e.com,  linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fs: btrfs: prevent a double kfree on delayed-ref

Filipe Manana @ 2025-09-30 17:07 +01:

> On Tue, Sep 30, 2025 at 2:05 PM Miquel Sabaté Solà <mssola@...ola.com> wrote:
>>
>> In the previous code it was possible to incur into a double kfree()
>> scenario when calling 'add_delayed_ref_head'. This could happen if the
>> record was reported to already exist in the
>> 'btrfs_qgroup_trace_extent_nolock' call, but then there was an error
>> later on 'add_delayed_ref_head'. In this case, since
>> 'add_delayed_ref_head' returned an error, the caller went to free the
>> record. Since 'add_delayed_ref_head' couldn't set this kfree'd pointer
>> to NULL, then kfree() would have acted on a non-NULL 'record' object
>> which was pointing to memory already freed by the callee.
>>
>> The problem comes from the fact that the responsibility to kfree the
>> object is on both the caller and the callee at the same time. Hence, the
>> fix for this is to shift the ownership of the 'qrecord' object out of
>> the 'add_delayed_ref_head'. That is, we will never attempt to kfree()
>> the given object inside of this function, and will expect the caller to
>> act on the 'qrecord' object on its own. The only exception where the
>> 'qrecord' object cannot be kfree'd is if it was inserted into the
>> tracing logic, for which we already have the 'qrecord_inserted_ret'
>> boolean to account for this. Hence, the caller has to kfree the object
>> only if 'add_delayed_ref_head' reports not to have inserted it on the
>> tracing logic.
>>
>> As a side-effect of the above, we must guarantee that
>> 'qrecord_inserted_ret' is properly initialized at the start of the
>> function, not at the end, and then set when an actual insert
>> happens. This way we avoid 'qrecord_inserted_ret' having an invalid
>> value on an early exit.
>>
>> The documentation from the 'add_delayed_ref_head' has also been updated
>> to reflect on the exact ownership of the 'qrecord' object.
>>
>> Fixes: 6ef8fbce0104 ("btrfs: fix missing error handling when adding delayed ref with qgroups enabled")
>> Signed-off-by: Miquel Sabaté Solà <mssola@...ola.com>
>> ---
>>  fs/btrfs/delayed-ref.c | 39 +++++++++++++++++++++++++++++++--------
>>  1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>> index 481802efaa14..bc61e0eacc69 100644
>> --- a/fs/btrfs/delayed-ref.c
>> +++ b/fs/btrfs/delayed-ref.c
>> @@ -798,10 +798,14 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
>>  }
>>
>>  /*
>> - * helper function to actually insert a head node into the rbtree.
>> + * Helper function to actually insert a head node into the rbtree.
>
> Since you are updating this line just to capitalize the first word,
> you might as well replace rbtree with xarray as we don't use rbtree
> anymore.
>
>>   * this does all the dirty work in terms of maintaining the correct
>>   * overall modification count.
>>   *
>> + * The caller is responsible for calling kfree() on @qrecord. More specifically,
>> + * if this function reports that it did not insert it as noted in
>> + * @qrecord_inserted_ret, then it's safe to call kfree() on it.
>> + *
>>   * Returns an error pointer in case of an error.
>>   */
>>  static noinline struct btrfs_delayed_ref_head *
>> @@ -814,7 +818,14 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>>         struct btrfs_delayed_ref_head *existing;
>>         struct btrfs_delayed_ref_root *delayed_refs;
>>         const unsigned long index = (head_ref->bytenr >> fs_info->sectorsize_bits);
>> -       bool qrecord_inserted = false;
>> +
>> +       /*
>> +        * If 'qrecord_inserted_ret' is provided, then the first thing we need
>> +        * to do is to initialize it to false just in case we have an exit
>> +        * before trying to insert the record.
>> +        */
>> +       if (qrecord_inserted_ret)
>> +               *qrecord_inserted_ret = false;
>>
>>         delayed_refs = &trans->transaction->delayed_refs;
>>         lockdep_assert_held(&delayed_refs->lock);
>> @@ -833,6 +844,12 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>>
>>         /* Record qgroup extent info if provided */
>>         if (qrecord) {
>> +               /*
>> +                * Setting 'qrecord' but not 'qrecord_inserted_ret' will likely
>> +                * result in a memory leakage.
>> +                */
>> +               WARN_ON(!qrecord_inserted_ret);
>
> For this sort of mandatory stuff, we use assertions, not warnings:
>
> ASSERT(qrecord_insert_ret != NULL);
>
>
>> +
>>                 int ret;
>>
>>                 ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord,
>> @@ -840,12 +857,10 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>>                 if (ret) {
>>                         /* Clean up if insertion fails or item exists. */
>>                         xa_release(&delayed_refs->dirty_extents, index);
>> -                       /* Caller responsible for freeing qrecord on error. */
>>                         if (ret < 0)
>>                                 return ERR_PTR(ret);
>> -                       kfree(qrecord);
>> -               } else {
>> -                       qrecord_inserted = true;
>> +               } else if (qrecord_inserted_ret) {
>> +                       *qrecord_inserted_ret = true;
>>                 }
>>         }
>>
>> @@ -888,8 +903,6 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>>                 delayed_refs->num_heads++;
>>                 delayed_refs->num_heads_ready++;
>>         }
>> -       if (qrecord_inserted_ret)
>> -               *qrecord_inserted_ret = qrecord_inserted;
>>
>>         return head_ref;
>>  }
>> @@ -1049,6 +1062,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>>                 xa_release(&delayed_refs->head_refs, index);
>>                 spin_unlock(&delayed_refs->lock);
>>                 ret = PTR_ERR(new_head_ref);
>> +
>> +               /*
>> +                * It's only safe to call kfree() on 'qrecord' if
>> +                * 'add_delayed_ref_head' has _not_ inserted it for
>
> The notation we use for function names is  function_name(), not 'function_name'.

I've included all your comments into a v2 patch set. I'll wait a bit
before sending it in case I get more comments.

>
> Otherwise it looks good, thanks.
>

Thanks to you for the review!

>> +                * tracing. Otherwise we need to handle this here.
>> +                */
>> +               if (!qrecord_reserved || qrecord_inserted)
>> +                       goto free_head_ref;
>>                 goto free_record;
>>         }
>>         head_ref = new_head_ref;
>> @@ -1071,6 +1092,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>>
>>         if (qrecord_inserted)
>>                 return btrfs_qgroup_trace_extent_post(trans, record, generic_ref->bytenr);
>> +
>> +       kfree(record);
>>         return 0;
>>
>>  free_record:
>> --
>> 2.51.0
>>
>>

Download attachment "signature.asc" of type "application/pgp-signature" (898 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ