linux-kernel - Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a5690ae1-28ba-a933-6473-e9c1e5480f0c@gmail.com>
Date:   Fri, 13 Aug 2021 17:57:26 +0800
From:   Desmond Cheong Zhi Xi <desmondcheongzx@...il.com>
To:     dsterba@...e.cz, clm@...com, josef@...icpanda.com,
        dsterba@...e.com, anand.jain@...cle.com,
        linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        skhan@...uxfoundation.org, gregkh@...uxfoundation.org,
        linux-kernel-mentees@...ts.linuxfoundation.org,
        syzbot+a70e2ad0879f160b9217@...kaller.appspotmail.com
Subject: Re: [PATCH v2] btrfs: fix rw device counting in
 __btrfs_free_extra_devids

On 13/8/21 4:51 pm, David Sterba wrote:
> On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
>> On 12/8/21 11:50 pm, David Sterba wrote:
>>> On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
>>>> On 12/8/21 6:38 pm, David Sterba wrote:
>>>>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>>>>>>     		if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>>>>>>     			list_del_init(&device->dev_alloc_list);
>>>>>>     			clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>>>>>> +			fs_devices->rw_devices--;
>>>>>>     		}
>>>>>>     		list_del_init(&device->dev_list);
>>>>>>     		fs_devices->num_devices--;
>>>>>
>>>>> I've hit a crash on master branch with stacktrace very similar to one
>>>>> this bug was supposed to fix. It's a failed assertion on device close.
>>>>> This patch was the last one to touch it and it matches some of the
>>>>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
>>>>> the original patch but was not reinstated in your fix.
>>>>>
>>>>> I'm not sure how reproducible it is, right now I have only one instance
>>>>> and am hunting another strange problem. They could be related.
>>>>>
>>>>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
>>>>>
>>>>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
>>>>> possibly relatedg
>>>>>
>>>>
>>>> Looking at the logs, it seems that a dev_replace was started, then
>>>> suspended. But it wasn't canceled or resumed before the fs devices were
>>>> closed.
>>>>
>>>> I'll investigate further, just throwing some observations out there.
>>>
>>> Thanks. I'm testing the patch revert, no crash after first loop, I'll
>>> run a few more to be sure as it's not entirely reliable.
>>>
>>> Sending the revert is option of last resort as we're approaching end of
>>> 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
>>> warning).
>>>
>>
>> I might be missing something, so any thoughts would be appreciated. But
>> I don't think the assertion in btrfs_close_one_device is correct.
>>
>>   From what I see, this crash happens when close_ctree is called while a
>> dev_replace hasn't completed. In close_ctree, we suspend the
>> dev_replace, but keep the replace target around so that we can resume
>> the dev_replace procedure when we mount the root again. This is the call
>> trace:
>>
>>     close_ctree():
>>       btrfs_dev_replace_suspend_for_unmount();
>>       btrfs_close_devices():
>>         btrfs_close_fs_devices():
>>           btrfs_close_one_device():
>>             ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
>> &device->dev_state));
>>
>> However, since the replace target sticks around, there is a device with
>> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in
>> btrfs_close_one_device.
>>
>> Two options I can think of:
>>
>> - We could remove the assertion.
>>
>> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in
>> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set
>> again in btrfs_init_dev_replace if the dev_replace->replace_state is
>> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as
>> a little odd because the device is still the replace target when
>> mounting in the future.
> 
> The option #2 does not sound safe because the TGT bit is checked in
> several places where device list is queried for various reasons, even
> without a mounted filesystem.
> 
> Removing the assertion makes more sense but I'm still not convinced that
> the this is expected/allowed state of a closed device.
> 

Would it be better if we cleared the REPLACE_TGT bit only when closing
the device where device->devid == BTRFS_DEV_REPLACE_DEVID?

The first conditional in btrfs_close_one_device assumes that we can come
across such a device. If we come across it, we should properly reset it.

If other devices has this bit set, the ASSERT will still catch it and
let us know something is wrong.

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 70f94b75f25a..a5afebb78ecf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
                 fs_devices->rw_devices--;
         }
  
+       if (device->devid == BTRFS_DEV_REPLACE_DEVID)
+               clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
+
         if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
                 fs_devices->missing_devices--;