Message-ID: <CAPjX3FeaL2+oRz81OEkLKjWwr1XuOOa3t-kgCrc51we-C-GVng@mail.gmail.com>
Date: Tue, 18 Feb 2025 09:08:23 +0100
From: Daniel Vacek <neelx@...e.com>
To: Filipe Manana <fdmanana@...nel.org>
Cc: Hao-ran Zheng <zhenghaoran154@...il.com>, clm@...com, josef@...icpanda.com, 
	dsterba@...e.com, linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	baijiaju1990@...il.com, 21371365@...a.edu.cn
Subject: Re: [PATCH] btrfs: fix data race when accessing the block_group's used field

On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@...nel.org> wrote:
>
> On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@...il.com> wrote:
> >
> > A data race may occur when the function `btrfs_discard_queue_work()`
> > and the function `btrfs_update_block_group()` are executed concurrently.
> > Specifically, while `btrfs_update_block_group()` executes the line
> > `cache->used = old_val;`, a concurrent `if (block_group->used == 0)`
> > check in `btrfs_discard_queue_work()` races with that store, which may
> > cause the block_group to be placed unexpectedly in the discard_list or
> > the discard_unused_list. The call stacks involved are as follows:
> >
> > ============DATA_RACE============
> >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> >  unpin_extent_range+0x847/0x2120 [btrfs]
> >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> >  transaction_kthread+0x764/0xc20 [btrfs]
> >  kthread+0x292/0x330
> >  ret_from_fork+0x4d/0x80
> >  ret_from_fork_asm+0x1a/0x30
> > ============OTHER_INFO============
> >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> >  flush_space+0x388/0x1440 [btrfs]
> >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> >  process_scheduled_works+0x716/0xf10
> >  worker_thread+0xb6a/0x1190
> >  kthread+0x292/0x330
> >  ret_from_fork+0x4d/0x80
> >  ret_from_fork_asm+0x1a/0x30
> > =================================
> >
> > Although `block_group->used` is checked again in the
> > `peek_discard_list()` function, `block_group->used` is a 64-bit
> > variable whose plain loads and stores are not guaranteed to be atomic
> > on 32-bit architectures, so we still consider this data race to be
> > unexpected behavior. It is recommended to use `READ_ONCE()` and
> > `WRITE_ONCE()` for the read and the write.
> >
> > Signed-off-by: Hao-ran Zheng <zhenghaoran154@...il.com>
> > ---
> >  fs/btrfs/block-group.c | 4 ++--
> >  fs/btrfs/discard.c     | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index c0a8f7d92acc..c681b97f6835 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> >         old_val = cache->used;
> >         if (alloc) {
> >                 old_val += num_bytes;
> > -               cache->used = old_val;
> > +               WRITE_ONCE(cache->used, old_val);
> >                 cache->reserved -= num_bytes;
> >                 cache->reclaim_mark = 0;
> >                 space_info->bytes_reserved -= num_bytes;
> > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> >                 spin_unlock(&space_info->lock);
> >         } else {
> >                 old_val -= num_bytes;
> > -               cache->used = old_val;
> > +               WRITE_ONCE(cache->used, old_val);
> >                 cache->pinned += num_bytes;
> >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> >                 space_info->bytes_used -= num_bytes;
> > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > index e815d165cccc..71c57b571d50 100644
> > --- a/fs/btrfs/discard.c
> > +++ b/fs/btrfs/discard.c
> > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> >                 return;
> >
> > -       if (block_group->used == 0)
> > +       if (READ_ONCE(block_group->used) == 0)
>
> There are at least 3 more places in discard.c where we access ->used
> without being under the protection of the block group's spinlock.
> So let's fix this for all places and not just a single one...
>
> Also, it's quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> What we typically do in btrfs is to add helpers that hide them, see
> block-rsv.h for example.
>
> Also, I don't think we need READ_ONCE/WRITE_ONCE.
> We could use data_race(), though I think that could be subject to
> load/store tearing, or just take the lock.
> So adding a helper like this to block-group.h:
>
> static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> {
>    u64 ret;
>
>    spin_lock(&bg->lock);
>    ret = bg->used;
>    spin_unlock(&bg->lock);
>
>    return ret;
> }

Would memory barriers be sufficient here? Taking a lock just for
reading one member seems excessive...
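
For the reads, something like the below is what I had in mind -- the
same helper style as block-rsv.h, just without the lock. Completely
untested sketch, and the setter name btrfs_block_group_set_used() is
made up here for illustration:

static inline u64 btrfs_block_group_used(const struct btrfs_block_group *bg)
{
	/*
	 * READ_ONCE() stops the compiler from tearing or fusing the load,
	 * but a 64-bit access may still tear at the hardware level on
	 * 32-bit architectures.
	 */
	return READ_ONCE(bg->used);
}

static inline void btrfs_block_group_set_used(struct btrfs_block_group *bg,
					      u64 used)
{
	WRITE_ONCE(bg->used, used);
}

If we also need ordering against other fields, smp_load_acquire() and
smp_store_release() would be the barrier-flavored variant, though IIRC
those won't even build for a 64-bit field on 32-bit targets, so the
spinlock may be the only fully safe option there.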

> And then use btrfs_block_group_used() everywhere in discard.c where we
> aren't holding a block group's lock.
>
> Thanks.
>
>
> >                 add_to_discard_unused_list(discard_ctl, block_group);
> >         else
> >                 add_to_discard_list(discard_ctl, block_group);
> > --
> > 2.34.1
> >
> >
>
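
FWIW, with such a helper the check in btrfs_discard_queue_work() quoted
above would then read as follows (same untested sketch as before):

	if (btrfs_block_group_used(block_group) == 0)
		add_to_discard_unused_list(discard_ctl, block_group);
	else
		add_to_discard_list(discard_ctl, block_group);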
