Message-ID: <1d0788af-bbac-44e6-8954-af7810fbb101@suse.com>
Date: Tue, 7 Jan 2025 08:30:05 +1030
From: Qu Wenruo <wqu@...e.com>
To: Daniel Vacek <neelx@...e.com>
Cc: Christian Brauner <brauner@...nel.org>, Qu Wenruo
<quwenruo.btrfs@....com>, linux-fsdevel@...r.kernel.org,
linux-btrfs <linux-btrfs@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: mnt_list corruption triggered during btrfs/326
On 2025/1/7 08:20, Daniel Vacek wrote:
> On Sat, 4 Jan 2025 at 23:26, Qu Wenruo <wqu@...e.com> wrote:
>>
>>
>>
>> On 2025/1/4 21:56, Christian Brauner wrote:
>>> On Wed, Jan 01, 2025 at 07:05:10AM +1030, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2024/12/30 19:59, Qu Wenruo wrote:
>>>>> Hi,
>>>>>
>>>>> Although I know it's triggered from btrfs, the mnt_list handling is
>>>>> out of btrfs' control, so I'm here asking for some help.
>>>
>>> Thanks for the report.
>>>
>>>>>
>>>>> [BUG]
>>>>> With CONFIG_DEBUG_LIST and CONFIG_BUG_ON_DATA_CORRUPTION, and an
>>>>> upstream 6.13-rc kernel, which has commit 951a3f59d268 ("btrfs: fix
>>>>> mount failure due to remount races"), I can hit the following crash,
>>>>> with varied frequency (from 1 in 4 runs to hundreds of runs with no crash):
>>>>
>>>> There is also another WARNING triggered, without btrfs callback involved
>>>> at all:
>>>>
>>>> [ 192.688671] ------------[ cut here ]------------
>>>> [ 192.690016] WARNING: CPU: 3 PID: 59747 at fs/mount.h:150
>>>
>>> This would indicate that move_from_ns() was called on a mount that isn't
>>> attached to a mount namespace (anymore or never has).
>>>
>>> Here it's particularly peculiar because it looks like the warning is
>>> caused by calling move_from_ns() when moving a mount from an anonymous
>>> mount namespace in attach_recursive_mnt().
>>>
>>> Can you please try and reproduce this with
>>> commit 211364bef4301838b2e1 ("fs: kill MNT_ONRB")
>>> from the vfs-6.14.mount branch in
>>> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git ?
>>>
>>
>> After the initial 1000 runs (with 951a3f59d268 ("btrfs: fix mount
>> failure due to remount races") cherry-picked, or it won't pass that test
>> case), there has been no crash nor warning so far.
>>
>> It's already the best run so far, but I'll keep it running for another
>> day or so just to be extra safe.
>>
>> So I guess the offending commit is 2eea9ce4310d ("mounts: keep list of
>> mounts in an rbtree")?
>
> This one was merged in v6.8 - why would it cause crashes only now?
Because in v6.8 btrfs also migrated to the new mount API, which caused
the ro/rw mount race that can fail the mount.
That's exactly why the test case was introduced.
Before the recent ro/rw mount fix, the test case wouldn't get that far
but would error out early, so we didn't have enough loops to trigger the bug.
>
>> Putting a list and an rb_tree into a union indeed seems a little
>> dangerous; sorry I didn't notice that earlier. But my vmcore indeed
>> shows a seemingly valid mnt_node (color = 1, both left/right are NULL).
>
> The union seems fine to me as long as the `MNT_ONRB` bit stays
> consistent. The crashes (and warnings) are simply caused by the flag
> missing where it should have been set.
That also means the mnt_flags need to be properly protected, at least at
the same level as mnt_list/mnt_node.
But a lot of the time such flags are accessed atomically using
test/set/clear_bit(), without the same level of lock protection.
So my current uneducated guess is that there is a race window where the
flag and the union member got out of sync.
Thus it's not as safe as a non-unioned member.
Thanks,
Qu
>
> --nX
>
>> Thanks a lot for the fix, and it's really a huge relief that it's not
>> something inside btrfs causing the bug.
>>
>> Thanks,
>> Qu
>>
>> [...]
>>>>>
>>>>> The only caller that doesn't hold @mount_lock is iterate_mounts(),
>>>>> but that's only called from audit, and I'm not sure audit is even
>>>>> involved in this case.
>>>
>>> This is fine as audit creates a private copy of the mount tree it is
>>> interested in. The mount tree is not visible to other callers anymore.
>>>
>>>>>
>>>>> So I ran out of ideas how this mnt_list corruption can even happen.
>>>>>
>>>>> Even if it's some abuse by btrfs, all mnt_list users are properly
>>>>> protected, so it should not lead to such list corruption.
>>>>>
>>>>> Any advice would be appreciated.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>
>>>
>>
>>