Message-ID: <CAL3q7H5JgQkFavwrjOsvxDt9mMjVUK_nPOha-WYU-muLW=Orug@mail.gmail.com>
Date: Mon, 10 Feb 2025 11:09:43 +0000
From: Filipe Manana <fdmanana@...nel.org>
To: Hao-ran Zheng <zhenghaoran154@...il.com>
Cc: clm@...com, josef@...icpanda.com, dsterba@...e.com, 
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	baijiaju1990@...il.com, 21371365@...a.edu.cn
Subject: Re: [PATCH] btrfs: fix data race when accessing the block_group's
 used field

On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@...il.com> wrote:
>
> A data race may occur when `btrfs_discard_queue_work()` and
> `btrfs_update_block_group()` are executed concurrently. Specifically,
> when `btrfs_update_block_group()` reaches the line
> `cache->used = old_val;` while `btrfs_discard_queue_work()` is
> evaluating `if (block_group->used == 0)`, the two access
> `block_group->used` in a data race, which may cause the block group
> to be placed unexpectedly in discard_list or discard_unused_list.
> The call stacks involved are as follows:
>
> ============DATA_RACE============
>  btrfs_discard_queue_work+0x245/0x500 [btrfs]
>  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
>  btrfs_add_free_space+0x19a/0x200 [btrfs]
>  unpin_extent_range+0x847/0x2120 [btrfs]
>  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
>  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
>  transaction_kthread+0x764/0xc20 [btrfs]
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> ============OTHER_INFO============
>  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
>  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
>  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
>  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
>  flush_space+0x388/0x1440 [btrfs]
>  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
>  process_scheduled_works+0x716/0xf10
>  worker_thread+0xb6a/0x1190
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> =================================
>
> Although `block_group->used` is checked again in the
> `peek_discard_list()` function, `block_group->used` is a 64-bit
> variable, so we still consider the data race here to be unexpected
> behavior. It is recommended to annotate the accesses with
> `READ_ONCE()` and `WRITE_ONCE()`.
>
> Signed-off-by: Hao-ran Zheng <zhenghaoran154@...il.com>
> ---
>  fs/btrfs/block-group.c | 4 ++--
>  fs/btrfs/discard.c     | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index c0a8f7d92acc..c681b97f6835 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
>         old_val = cache->used;
>         if (alloc) {
>                 old_val += num_bytes;
> -               cache->used = old_val;
> +               WRITE_ONCE(cache->used, old_val);
>                 cache->reserved -= num_bytes;
>                 cache->reclaim_mark = 0;
>                 space_info->bytes_reserved -= num_bytes;
> @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
>                 spin_unlock(&space_info->lock);
>         } else {
>                 old_val -= num_bytes;
> -               cache->used = old_val;
> +               WRITE_ONCE(cache->used, old_val);
>                 cache->pinned += num_bytes;
>                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
>                 space_info->bytes_used -= num_bytes;
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index e815d165cccc..71c57b571d50 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
>         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
>                 return;
>
> -       if (block_group->used == 0)
> +       if (READ_ONCE(block_group->used) == 0)

There are at least 3 more places in discard.c where we access ->used
without being under the protection of the block group's spinlock.
So let's fix this for all places and not just a single one...

Also, spreading READ_ONCE()/WRITE_ONCE() all over the place is quite ugly.
What we typically do in btrfs is to add helpers that hide them, see
block-rsv.h for example.

Also, I don't think we need READ_ONCE()/WRITE_ONCE() here.
We could use data_race(), though I think that could still be subject to
load/store tearing, so we can just take the lock instead.
So adding a helper like this to block-group.h:

static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
{
	u64 ret;

	spin_lock(&bg->lock);
	ret = bg->used;
	spin_unlock(&bg->lock);

	return ret;
}

And then use btrfs_block_group_used() everywhere in discard.c where we
aren't holding a block group's lock.

Thanks.


>                 add_to_discard_unused_list(discard_ctl, block_group);
>         else
>                 add_to_discard_list(discard_ctl, block_group);
> --
> 2.34.1
>
>
