Message-ID: <CAPjX3Feo9=hkptSxOMREKVckbvhfbmjHAWYBL2sBryDcVzp0NA@mail.gmail.com>
Date: Tue, 18 Feb 2025 17:13:50 +0100
From: Daniel Vacek <neelx@...e.com>
To: Filipe Manana <fdmanana@...nel.org>
Cc: Hao-ran Zheng <zhenghaoran154@...il.com>, clm@...com, josef@...icpanda.com, 
	dsterba@...e.com, linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	baijiaju1990@...il.com, 21371365@...a.edu.cn
Subject: Re: [PATCH] btrfs: fix data race when accessing the block_group's used field

On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@...nel.org> wrote:
>
> On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@...e.com> wrote:
> >
> > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@...nel.org> wrote:
> > >
> > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@...il.com> wrote:
> > > >
> > > > A data race may occur when the functions `btrfs_discard_queue_work()`
> > > > and `btrfs_update_block_group()` are executed concurrently. Specifically,
> > > > when `btrfs_update_block_group()` executes the line `cache->used = old_val;`
> > > > while `btrfs_discard_queue_work()` is evaluating `if (block_group->used == 0)`,
> > > > a data race occurs, which may cause the block group to be placed
> > > > unexpectedly in discard_list or discard_unused_list. The call stacks
> > > > involved are as follows:
> > > >
> > > > ============DATA_RACE============
> > > >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> > > >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> > > >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> > > >  unpin_extent_range+0x847/0x2120 [btrfs]
> > > >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> > > >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> > > >  transaction_kthread+0x764/0xc20 [btrfs]
> > > >  kthread+0x292/0x330
> > > >  ret_from_fork+0x4d/0x80
> > > >  ret_from_fork_asm+0x1a/0x30
> > > > ============OTHER_INFO============
> > > >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> > > >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> > > >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> > > >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> > > >  flush_space+0x388/0x1440 [btrfs]
> > > >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> > > >  process_scheduled_works+0x716/0xf10
> > > >  worker_thread+0xb6a/0x1190
> > > >  kthread+0x292/0x330
> > > >  ret_from_fork+0x4d/0x80
> > > >  ret_from_fork_asm+0x1a/0x30
> > > > =================================
> > > >
> > > > Although `block_group->used` is checked again in the
> > > > `peek_discard_list()` function, considering that `block_group->used` is
> > > > a 64-bit variable whose unmarked accesses may tear, we still think the
> > > > data race here is unexpected behavior. It is recommended to use
> > > > `READ_ONCE()` and `WRITE_ONCE()` for the reads and writes.
> > > >
> > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@...il.com>
> > > > ---
> > > >  fs/btrfs/block-group.c | 4 ++--
> > > >  fs/btrfs/discard.c     | 2 +-
> > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > > index c0a8f7d92acc..c681b97f6835 100644
> > > > --- a/fs/btrfs/block-group.c
> > > > +++ b/fs/btrfs/block-group.c
> > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > >         old_val = cache->used;
> > > >         if (alloc) {
> > > >                 old_val += num_bytes;
> > > > -               cache->used = old_val;
> > > > +               WRITE_ONCE(cache->used, old_val);
> > > >                 cache->reserved -= num_bytes;
> > > >                 cache->reclaim_mark = 0;
> > > >                 space_info->bytes_reserved -= num_bytes;
> > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > >                 spin_unlock(&space_info->lock);
> > > >         } else {
> > > >                 old_val -= num_bytes;
> > > > -               cache->used = old_val;
> > > > +               WRITE_ONCE(cache->used, old_val);
> > > >                 cache->pinned += num_bytes;
> > > >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> > > >                 space_info->bytes_used -= num_bytes;
> > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > > > index e815d165cccc..71c57b571d50 100644
> > > > --- a/fs/btrfs/discard.c
> > > > +++ b/fs/btrfs/discard.c
> > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> > > >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> > > >                 return;
> > > >
> > > > -       if (block_group->used == 0)
> > > > +       if (READ_ONCE(block_group->used) == 0)
> > >
> > > There are at least 3 more places in discard.c where we access ->used
> > > without being under the protection of the block group's spinlock.
> > > So let's fix this for all places and not just a single one...
> > >
> > > Also, it is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> > > What we typically do in btrfs is to add helpers that hide them, see
> > > block-rsv.h for example.
> > >
> > > Also, I don't think we need READ_ONCE/WRITE_ONCE.
> > > We could use data_race(), though I think that could be subject to
> > > load/store tearing, or just take the lock.
> > > So adding a helper like this to block-group.h:
> > >
> > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > > {
> > >    u64 ret;
> > >
> > >    spin_lock(&bg->lock);
> > >    ret = bg->used;
> > >    spin_unlock(&bg->lock);
> > >
> > >    return ret;
> > > }
> >
> > Would memory barriers be sufficient here? Taking a lock just for
> > reading one member seems excessive...
>
> Do you think there's heavy contention on this lock?

This is not about contention. A spin lock should just never be used
this way. Or any lock, actually. The critical section only contains a
single fetch operation, which does not justify using a lock.
Hence the only guarantee such lock usage provides is the implicit
memory barriers (of which maybe only one is really needed, depending
on the context where this helper is going to be used).

Simply put, the lock is degraded into a memory barrier this way. So
why not just use the barriers in the first place, if only ordering
guarantees are required? That only makes sense.
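
For illustration, here is a minimal sketch of what barrier-free helpers
hiding the marked accesses could look like, in the style of the
block-rsv.h helpers mentioned above (hypothetical names; whether plain
READ_ONCE()/WRITE_ONCE() suffices, or acquire/release ordering is also
needed, depends on the per-call-site analysis):

/*
 * Hypothetical sketch only.  READ_ONCE()/WRITE_ONCE() stop the compiler
 * from tearing, fusing or refetching the access and document the
 * intentional lockless use; note that a u64 access may still tear at
 * the hardware level on 32-bit architectures.
 */
static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
{
        return READ_ONCE(bg->used);
}

static inline void btrfs_block_group_set_used(struct btrfs_block_group *bg,
                                              u64 used)
{
        WRITE_ONCE(bg->used, used);
}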

> Usually I prefer to go the simplest way, and using locks is a lot more
> straightforward and easier to understand than memory barriers.

How so? Locks provide the same memory barriers implicitly.

> Unless it's clear there's an observable performance penalty, keeping
> it simple is better.

Exactly. Here is where our opinions differ. For me, 'simple' means
without useless locking.
Locks especially should only be used when absolutely necessary, in the
sense of 'use the right tool for the job'. There are more lightweight
tools available in this context, and locks provide additional
guarantees which are not needed here.

On the other hand, I understand that using a lock is a 'better safe
than sorry' approach which should also work. Just keep in mind that a
spinlock may sleep on RT.

> data_race() may be ok here, at least for one of the unprotected
> accesses it's fine, but would have to analyse the other 3 cases.

Those are exactly my thoughts. Maybe not even the barriers are needed
after all. This needs to be checked on a case-by-case basis.
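
As a sketch of that option (assuming the analysis concludes a given
racy read is benign, e.g. because the value is checked again later, as
the patch description notes for peek_discard_list()):

        /*
         * Hypothetical: annotate the benign racy read for KCSAN instead
         * of adding ordering; only valid where acting on a stale value
         * is harmless because it is re-checked afterwards.
         */
        if (data_race(block_group->used) == 0)
                add_to_discard_unused_list(discard_ctl, block_group);
        else
                add_to_discard_list(discard_ctl, block_group);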

> >
> > > And then use btrfs_block_group_used() everywhere in discard.c where we
> > > aren't holding a block group's lock.
> > >
> > > Thanks.
> > >
> > >
> > > >                 add_to_discard_unused_list(discard_ctl, block_group);
> > > >         else
> > > >                 add_to_discard_list(discard_ctl, block_group);
> > > > --
> > > > 2.34.1
> > > >
> > > >
> > >
