linux-kernel - Re: [PATCH] fs: Fix data race in btrfs_drop

Message-ID: <CAL3q7H4P9O-6jay6cPYLrrX85y1t52QRQ=_feifY_A0D7p_gLQ@mail.gmail.com>
Date: Sun, 1 Dec 2024 17:38:34 +0000
From: Filipe Manana <fdmanana@...nel.org>
To: Hao-ran Zheng <zhenghaoran154@...il.com>
Cc: clm@...com, josef@...icpanda.com, dsterba@...e.com, 
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	baijiaju1990@...il.com, 21371365@...a.edu.cn
Subject: Re: [PATCH] fs: Fix data race in btrfs_drop_extents

On Sun, Dec 1, 2024 at 11:26 AM Hao-ran Zheng <zhenghaoran154@...il.com> wrote:
>
> A data race occurs when the function `insert_ordered_extent_file_extent()`
> and the function `btrfs_inode_safe_disk_i_size_write()` are executed
> concurrently. The function `insert_ordered_extent_file_extent()` is not
> locked when reading inode->disk_i_size, causing
> `btrfs_inode_safe_disk_i_size_write()` to cause data competition when
> writing inode->disk_i_size, thus affecting the value of `modify_tree`,
> leading to some unexpected results such as disk data being overwritten.

How can that cause "disk data being overwritten"?
And the results are not unexpected at all.

The value of modify_tree is irrelevant from a correctness point of view.
It's used for an optimization to avoid taking write locks on the btree
in case we're doing a write at or beyond eof.

If we end up taking a write lock when it's not needed, everything's
fine - we just may unnecessarily block concurrent readers that need to
access the same btree path (leaf and parent node).

If we don't take a write lock and we need it, we will later figure
that out and switch to a write lock.
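
To illustrate why a stale hint is harmless, here is a minimal user-space
sketch of the pattern described above. It is not the btrfs code: the names
lock_mode, drop_result and drop_range() are hypothetical and only model
"guess a lock mode first, upgrade later if the guess was wrong".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock modes standing in for read/write locks on a btree path. */
enum lock_mode { LOCK_READ, LOCK_WRITE };

struct drop_result {
	enum lock_mode final_mode;	/* lock mode held when the work finished */
	int searches;			/* number of tree searches performed */
};

/*
 * hint_modify:        the possibly stale guess (the role modify_tree plays).
 * really_needs_write: what the search actually finds out.
 */
static struct drop_result drop_range(bool hint_modify, bool really_needs_write)
{
	struct drop_result r = {
		.final_mode = hint_modify ? LOCK_WRITE : LOCK_READ,
		.searches = 1,
	};

	if (really_needs_write && r.final_mode == LOCK_READ) {
		/* Optimistic guess was wrong: search again with a write lock. */
		r.final_mode = LOCK_WRITE;
		r.searches++;
	}

	/* Whatever the hint was, a needed modification never runs read-locked. */
	assert(!really_needs_write || r.final_mode == LOCK_WRITE);
	return r;
}
```

In this model a wrong hint costs either one extra search (guessed read,
needed write) or some lost concurrency (guessed write, needed read). In
neither case is a modification done without a write lock.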

> The specific call stack that appears during testing is as follows:
>
> ============DATA_RACE============
>  btrfs_drop_extents+0x89a/0xa060 [btrfs]
>  insert_reserved_file_extent+0xb54/0x2960 [btrfs]
>  insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs]
>  btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs]
>  btrfs_finish_ordered_io+0x37/0x60 [btrfs]
>  finish_ordered_fn+0x3e/0x50 [btrfs]
>  btrfs_work_helper+0x9c9/0x27a0 [btrfs]
>  process_scheduled_works+0x716/0xf10
>  worker_thread+0xb6a/0x1190
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> ============OTHER_INFO============
>  btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs]
>  btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs]
>  btrfs_finish_ordered_io+0x37/0x60 [btrfs]
>  finish_ordered_fn+0x3e/0x50 [btrfs]
>  btrfs_work_helper+0x9c9/0x27a0 [btrfs]
>  process_scheduled_works+0x716/0xf10
>  worker_thread+0xb6a/0x1190
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> =================================
>
> To address this issue, it is recommended to add locks when reading
> inode->disk_i_size and setting the value of modify_tree to prevent
> data inconsistency.

You can also use data_race() here, as it's a harmless race.
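
As a rough sketch of what that annotation might look like, here is a
user-space approximation. The data_race() definition below is only a
stand-in so the example compiles outside the kernel (the real macro lives
in <linux/compiler.h> and marks the access as intentionally racy for
KCSAN). The sketch_* structs, initial_modify_tree() and the -1 default are
assumptions for illustration, not the actual btrfs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in only: evaluates expr and carries none of the KCSAN semantics. */
#ifndef data_race
#define data_race(expr) (expr)
#endif

/* Hypothetical, trimmed-down stand-ins for the inode and the drop args. */
struct sketch_inode {
	uint64_t disk_i_size;
};

struct sketch_drop_args {
	uint64_t start;
	bool replace_extent;
};

/* Returns the initial hint: 0 means "try read locks first". */
static int initial_modify_tree(const struct sketch_inode *inode,
			       const struct sketch_drop_args *args)
{
	int modify_tree = -1;	/* assumed default: take write locks */

	assert(inode && args);

	/*
	 * Lockless read of disk_i_size. A stale value only changes which
	 * lock mode is tried first, so no lock is taken around the check.
	 */
	if (args->start >= data_race(inode->disk_i_size) &&
	    !args->replace_extent)
		modify_tree = 0;

	return modify_tree;
}
```

Compared with taking inode->lock around the check, this keeps the path
lock-free and documents that the race is known and benign.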

Also, please use a proper subject, for example:

btrfs: fix data race when accessing the inode's disk_i_size at
btrfs_drop_extents()

Also please update the changelog with a proper analysis - saying it's
a harmless race and why.

Thanks.

>
> Signed-off-by: Hao-ran Zheng <zhenghaoran154@...il.com>
> ---
>  fs/btrfs/file.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4fb521d91b06..189708e6e91a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -242,8 +242,10 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
>         if (args->drop_cache)
>                 btrfs_drop_extent_map_range(inode, args->start, args->end - 1, false);
>
> +       spin_lock(&inode->lock);
>         if (args->start >= inode->disk_i_size && !args->replace_extent)
>                 modify_tree = 0;
> +       spin_unlock(&inode->lock);
>
>         update_refs = (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID);
>         while (1) {
> --
> 2.34.1
>
>