Message-ID: <CAL3q7H4P9O-6jay6cPYLrrX85y1t52QRQ=_feifY_A0D7p_gLQ@mail.gmail.com>
Date: Sun, 1 Dec 2024 17:38:34 +0000
From: Filipe Manana <fdmanana@...nel.org>
To: Hao-ran Zheng <zhenghaoran154@...il.com>
Cc: clm@...com, josef@...icpanda.com, dsterba@...e.com, 
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org, 
	baijiaju1990@...il.com, 21371365@...a.edu.cn
Subject: Re: [PATCH] fs: Fix data race in btrfs_drop_extents

On Sun, Dec 1, 2024 at 11:26 AM Hao-ran Zheng <zhenghaoran154@...il.com> wrote:
>
> A data race occurs when the function `insert_ordered_extent_file_extent()`
> and the function `btrfs_inode_safe_disk_i_size_write()` are executed
> concurrently. The function `insert_ordered_extent_file_extent()` is not
> locked when reading inode->disk_i_size, so the read races with the write
> to inode->disk_i_size in `btrfs_inode_safe_disk_i_size_write()`,
> thus affecting the value of `modify_tree` and
> leading to some unexpected results such as disk data being overwritten.

How can that cause "disk data being overwritten"?
And the results are not unexpected at all.

The value of modify_tree is irrelevant from a correctness point of view.
It's only used for an optimization that avoids taking write locks on the
btree when we're doing a write at or beyond eof.

If we end up taking a write lock when it's not needed, everything's
fine - we just may unnecessarily block concurrent readers that need to
access the same btree path (leaf and parent node).

If we don't take a write lock and we need it, we will later figure
that out and switch to a write lock.

> The specific call stack that appears during testing is as follows:
>
> ============DATA_RACE============
>  btrfs_drop_extents+0x89a/0xa060 [btrfs]
>  insert_reserved_file_extent+0xb54/0x2960 [btrfs]
>  insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs]
>  btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs]
>  btrfs_finish_ordered_io+0x37/0x60 [btrfs]
>  finish_ordered_fn+0x3e/0x50 [btrfs]
>  btrfs_work_helper+0x9c9/0x27a0 [btrfs]
>  process_scheduled_works+0x716/0xf10
>  worker_thread+0xb6a/0x1190
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> ============OTHER_INFO============
>  btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs]
>  btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs]
>  btrfs_finish_ordered_io+0x37/0x60 [btrfs]
>  finish_ordered_fn+0x3e/0x50 [btrfs]
>  btrfs_work_helper+0x9c9/0x27a0 [btrfs]
>  process_scheduled_works+0x716/0xf10
>  worker_thread+0xb6a/0x1190
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> =================================
>
> To address this issue, it is recommended to add locks when reading
> inode->disk_i_size and setting the value of modify_tree to prevent
> data inconsistency.

You can also use data_race() here, as it's a harmless race.

Also, please use a proper subject, for example:

btrfs: fix data race when accessing the inode's disk_i_size at
btrfs_drop_extents()

Also please update the changelog with a proper analysis - saying it's
a harmless race and why.

Thanks.

>
> Signed-off-by: Hao-ran Zheng <zhenghaoran154@...il.com>
> ---
>  fs/btrfs/file.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4fb521d91b06..189708e6e91a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -242,8 +242,10 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
>         if (args->drop_cache)
>                 btrfs_drop_extent_map_range(inode, args->start, args->end - 1, false);
>
> +       spin_lock(&inode->lock);
>         if (args->start >= inode->disk_i_size && !args->replace_extent)
>                 modify_tree = 0;
> +       spin_unlock(&inode->lock);
>
>         update_refs = (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID);
>         while (1) {
> --
> 2.34.1
>
>
