Message-ID: <CAL3q7H7iFQ0pS+TB8NNj5CDA=c1cmurSiGsuXDn61O6A5=mBSQ@mail.gmail.com>
Date: Tue, 18 Feb 2025 10:18:29 +0000
From: Filipe Manana <fdmanana@...nel.org>
To: Daniel Vacek <neelx@...e.com>
Cc: Hao-ran Zheng <zhenghaoran154@...il.com>, clm@...com, josef@...icpanda.com,
dsterba@...e.com, linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
baijiaju1990@...il.com, 21371365@...a.edu.cn
Subject: Re: [PATCH] btrfs: fix data race when accessing the block_group's
used field
On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@...e.com> wrote:
>
> On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@...nel.org> wrote:
> >
> > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@...il.com> wrote:
> > >
> > > A data race may occur when `btrfs_discard_queue_work()` and
> > > `btrfs_update_block_group()` are executed concurrently. Specifically,
> > > while `btrfs_update_block_group()` executes the line
> > > `cache->used = old_val;`, `btrfs_discard_queue_work()` may concurrently
> > > read the same field at `if (block_group->used == 0)`. This is a data
> > > race which may cause the block group to be placed unexpectedly in the
> > > discard_list or discard_unused_list. The call stacks involved are as
> > > follows:
> > >
> > > ============DATA_RACE============
> > > btrfs_discard_queue_work+0x245/0x500 [btrfs]
> > > __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> > > btrfs_add_free_space+0x19a/0x200 [btrfs]
> > > unpin_extent_range+0x847/0x2120 [btrfs]
> > > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> > > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> > > transaction_kthread+0x764/0xc20 [btrfs]
> > > kthread+0x292/0x330
> > > ret_from_fork+0x4d/0x80
> > > ret_from_fork_asm+0x1a/0x30
> > > ============OTHER_INFO============
> > > btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> > > __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> > > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> > > btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> > > flush_space+0x388/0x1440 [btrfs]
> > > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> > > process_scheduled_works+0x716/0xf10
> > > worker_thread+0xb6a/0x1190
> > > kthread+0x292/0x330
> > > ret_from_fork+0x4d/0x80
> > > ret_from_fork_asm+0x1a/0x30
> > > =================================
> > >
> > > Although `block_group->used` is checked again in the
> > > `peek_discard_list()` function, `block_group->used` is a 64-bit
> > > variable, so the unannotated concurrent access can tear on 32-bit
> > > architectures, and we still consider the data race unexpected
> > > behavior. Fix it by using `READ_ONCE` and `WRITE_ONCE` for the
> > > reads and writes.
> > >
> > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@...il.com>
> > > ---
> > > fs/btrfs/block-group.c | 4 ++--
> > > fs/btrfs/discard.c | 2 +-
> > > 2 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > index c0a8f7d92acc..c681b97f6835 100644
> > > --- a/fs/btrfs/block-group.c
> > > +++ b/fs/btrfs/block-group.c
> > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > >          old_val = cache->used;
> > >          if (alloc) {
> > >                  old_val += num_bytes;
> > > -                cache->used = old_val;
> > > +                WRITE_ONCE(cache->used, old_val);
> > >                  cache->reserved -= num_bytes;
> > >                  cache->reclaim_mark = 0;
> > >                  space_info->bytes_reserved -= num_bytes;
> > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > >                  spin_unlock(&space_info->lock);
> > >          } else {
> > >                  old_val -= num_bytes;
> > > -                cache->used = old_val;
> > > +                WRITE_ONCE(cache->used, old_val);
> > >                  cache->pinned += num_bytes;
> > >                  btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> > >                  space_info->bytes_used -= num_bytes;
> > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > > index e815d165cccc..71c57b571d50 100644
> > > --- a/fs/btrfs/discard.c
> > > +++ b/fs/btrfs/discard.c
> > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> > >          if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> > >                  return;
> > >
> > > -        if (block_group->used == 0)
> > > +        if (READ_ONCE(block_group->used) == 0)
> >
> > There are at least 3 more places in discard.c where we access ->used
> > without holding the block group's spinlock.
> > So let's fix all of them and not just this single one...
> >
> > Also, spreading READ_ONCE/WRITE_ONCE all over the place is quite ugly.
> > What we typically do in btrfs is add helpers that hide them, see
> > block-rsv.h for example.
> >
> > Also, I don't think we need READ_ONCE/WRITE_ONCE.
> > We could use data_race(), though I think that could still be subject
> > to load/store tearing, or we could just take the lock.
> > So adding a helper like this to block-group.h:
> >
> > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > {
> >         u64 ret;
> >
> >         spin_lock(&bg->lock);
> >         ret = bg->used;
> >         spin_unlock(&bg->lock);
> >
> >         return ret;
> > }
>
> Would memory barriers be sufficient here? Taking a lock just for
> reading one member seems excessive...
Do you think there's heavy contention on this lock?
Usually I prefer to go the simplest way, and using locks is a lot more
straightforward and easier to understand than memory barriers.
Unless it's clear there's an observable performance penalty, keeping
it simple is better.
data_race() may be ok here; at least for one of the unprotected
accesses it's fine, but we'd have to analyse the other 3 cases.
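
Just to illustrate, if we went with data_race() instead of the lock,
the helper could look something like this (untested sketch, with the
tearing caveat mentioned above):

static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
{
        /*
         * Unlocked read, annotated so that KCSAN doesn't report it.
         * Note that data_race() only marks the access as intentionally
         * racy, it does not prevent the compiler from tearing the
         * 64-bit load the way READ_ONCE() does.
         */
        return data_race(bg->used);
}

Either way the callers in discard.c stay clean, the locked version is
just the simplest to reason about.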
>
> > And then use btrfs_block_group_used() everywhere in discard.c where we
> > aren't holding a block group's lock.
> >
> > Thanks.
> >
> >
> > >                  add_to_discard_unused_list(discard_ctl, block_group);
> > >          else
> > >                  add_to_discard_list(discard_ctl, block_group);
> > > --
> > > 2.34.1
> > >
> > >
> >