[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240814-dev_replace_rwsem-new-v1-1-c42120994ce6@kernel.org>
Date: Wed, 14 Aug 2024 14:45:40 +0200
From: Johannes Thumshirn <jth@...nel.org>
To: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>
Cc: linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
Johannes Thumshin <johannes.thumshirn@....com>,
Filipe Manana <fdmanana@...e.com>,
Johannes Thumshirn <johannes.thumshirn@....com>,
Johannes Thumshirn <jth@...nel.org>
Subject: [PATCH] btrfs: relax dev_replace rwsem usage on scrub with rst
From: Johannes Thumshin <johannes.thumshirn@....com>
Running fstests btrfs/011 with MKFS_OPTIONS="-O rst" to force the usage of
the RAID stripe-tree, we get the following splat from lockdep:
BTRFS info (device sdd): dev_replace from /dev/sdd (devid 1) to /dev/sdb started
============================================
WARNING: possible recursive locking detected
6.11.0-rc3+ #592 Not tainted
--------------------------------------------
btrfs/4203 is trying to acquire lock:
ffff888103f35c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
but task is already holding lock:
ffff888103f35c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&fs_info->dev_replace.rwsem);
lock(&fs_info->dev_replace.rwsem);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by btrfs/4203:
#0: ffff888103f35c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
This fixes a deadlock on RAID stripe-tree where device replace performs a
scrub operation, which in turn calls into btrfs_map_block() to find the
physical location of the block.
Cc: Filipe Manana <fdmanana@...e.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@....com>
---
Signed-off-by: Johannes Thumshirn <jth@...nel.org>
---
fs/btrfs/volumes.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4b9b647a7e29..e5bd2bee912d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6459,6 +6459,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
int dev_replace_is_ongoing = 0;
u16 num_alloc_stripes;
u64 max_len;
+ bool rst;
ASSERT(bioc_ret);
@@ -6475,6 +6476,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
if (io_geom.mirror_num > num_copies)
return -EINVAL;
+ rst = btrfs_need_stripe_tree_update(fs_info, map->type);
+
map_offset = logical - map->start;
io_geom.raid56_full_stripe_start = (u64)-1;
max_len = btrfs_max_io_len(map, map_offset, &io_geom);
@@ -6597,13 +6600,19 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
* For all other non-RAID56 profiles, just copy the target
* stripe into the bioc.
*/
+ if (rst && dev_replace_is_ongoing)
+ up_read(&dev_replace->rwsem);
for (int i = 0; i < io_geom.num_stripes; i++) {
ret = set_io_stripe(fs_info, logical, length,
&bioc->stripes[i], map, &io_geom);
+
if (ret < 0)
break;
io_geom.stripe_index++;
}
+ if (rst && dev_replace_is_ongoing)
+ down_read(&dev_replace->rwsem);
+
}
if (ret) {
---
base-commit: 4ce21d87ae81a86b488e0d326682883485317dcb
change-id: 20240814-dev_replace_rwsem-new-91c160579246
Best regards,
--
Johannes Thumshirn <jth@...nel.org>
Powered by blists - more mailing lists