[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ae87897-9fb8-20c5-66ea-3289e255fe5e@gmx.com>
Date:   Wed, 2 Aug 2023 14:53:14 +0800
From:   Qu Wenruo <quwenruo.btrfs@....com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     Christoph Hellwig <hch@....de>,
        syzbot <syzbot+ae97a827ae1c3336bbb4@...kaller.appspotmail.com>,
        clm@...com, dsterba@...e.com, johannes.thumshirn@....com,
        josef@...icpanda.com, linux-btrfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [btrfs?] kernel BUG in prepare_to_merge
On 2023/8/1 23:26, Christoph Hellwig wrote:
> In the meantime I've also reproduced it with just
> "btrfs: fix the btrfs_get_global_root return value", but it took
> a rather long time.
>
> After wading through the code my suspicion is that before this fix
> the ERR_PTR return made that for those cases btrfs_get_root_ref and
> btrfs_get_fs_root_commit_root don't actually do the
> btrfs_lookup_fs_root.  Although that seemed unintentional as far
> as I can tell it might have prevented some additional problems
> with whatever syzcaller is fuzzing here.  Not sure if anyone who
> knows this code has any good idea where to start looking?
>
I'm also looking into the case, the weird part seems to be we're getting
some race between qgroup tree creation and relocation.
More rounds of syzbot testing shows it's not on-disk data corruption,
but runtime corruption lead to the invalid reloc tree key.
Normally if we're relocating tree 8 (quota tree), we should get
fs_info->quota_root, and it should not has ROOT_SHAREABLE flag, thus we
just go COW the involved quota tree block.
But by somehow, if the quota tree is created by btrfs_init_fs_root() it
would has the ROOT_SHAREABLE flag and leads to the incorrect reloc tree
creation.
My current guess is, some race like this:
             Thread A             |           Thread B
---------------------------------+------------------------------
btrfs_quota_enable()             |
|                                | btrfs_get_root_ref()
|                                | |- btrfs_get_global_root()
|                                | |  Returned NULL
|                                | |- btrfs_lookup_fs_root()
|                                | |  Returned NULL
|- btrfs_create_tree()           | |
|  Now quota root item is        | |
|  inserted                      | |- btrfs_read_tree_root()
|                                | |  Got the newly inserted quota root
|                                | |- btrfs_init_fs_root()
|                                | |  Set ROOT_SHAREABLE flag
By this, with a relocation and quota enabling, we create a race that we
can get a quota root with ROOT_SHAREABLE set, and lead to the problem.
Personally speaking, I don't have a particularly good idea on how to fix it.
We may skip any non-subvolume related trees in btrfs_init_fs_root(), but
that doesn't seem correct to me.
Any good ideas on this?
Thanks,
Qu
Powered by blists - more mailing lists
 
