linux-kernel - Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210819173403.GI5047@twin.jikos.cz>
Date:   Thu, 19 Aug 2021 19:34:03 +0200
From:   David Sterba <dsterba@...e.cz>
To:     Desmond Cheong Zhi Xi <desmondcheongzx@...il.com>
Cc:     dsterba@...e.cz, clm@...com, josef@...icpanda.com,
        dsterba@...e.com, anand.jain@...cle.com,
        linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        skhan@...uxfoundation.org, gregkh@...uxfoundation.org,
        linux-kernel-mentees@...ts.linuxfoundation.org,
        syzbot+a70e2ad0879f160b9217@...kaller.appspotmail.com
Subject: Re: [PATCH v2] btrfs: fix rw device counting in
 __btrfs_free_extra_devids

On Fri, Aug 20, 2021 at 01:11:58AM +0800, Desmond Cheong Zhi Xi wrote:
> >>> The option #2 does not sound safe because the TGT bit is checked in
> >>> several places where device list is queried for various reasons, even
> >>> without a mounted filesystem.
> >>>
> >>> Removing the assertion makes more sense but I'm still not convinced that
> >>> the this is expected/allowed state of a closed device.
> >>>
> >>
> >> Would it be better if we cleared the REPLACE_TGT bit only when closing
> >> the device where device->devid == BTRFS_DEV_REPLACE_DEVID?
> >>
> >> The first conditional in btrfs_close_one_device assumes that we can come
> >> across such a device. If we come across it, we should properly reset it.
> >>
> >> If other devices has this bit set, the ASSERT will still catch it and
> >> let us know something is wrong.
> > 
> > That sounds great.
> > 
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 70f94b75f25a..a5afebb78ecf 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device)
> >>                   fs_devices->rw_devices--;
> >>           }
> >>    
> >> +       if (device->devid == BTRFS_DEV_REPLACE_DEVID)
> >> +               clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
> >> +
> >>           if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
> >>                   fs_devices->missing_devices--;
> > 
> > I'll do a few test rounds, thanks.
> 
> Just following up. Did that resolve the issue or is further 
> investigation needed?

The fix seems to work, I haven't seen the assertion fail anymore,
incidentally the crash also stopped to show up on an unpatched branch.