linux-kernel - Re: mnt_list corruption triggered during btrfs/326


Message-ID: <6f5f97bc-6333-4d07-9684-1f9bab9bd571@suse.com>
Date: Sun, 5 Jan 2025 08:56:02 +1030
From: Qu Wenruo <wqu@...e.com>
To: Christian Brauner <brauner@...nel.org>, Qu Wenruo <quwenruo.btrfs@....com>
Cc: linux-fsdevel@...r.kernel.org, linux-btrfs <linux-btrfs@...r.kernel.org>,
 LKML <linux-kernel@...r.kernel.org>
Subject: Re: mnt_list corruption triggered during btrfs/326



On 2025/1/4 21:56, Christian Brauner wrote:
> On Wed, Jan 01, 2025 at 07:05:10AM +1030, Qu Wenruo wrote:
>>
>>
>> On 2024/12/30 19:59, Qu Wenruo wrote:
>>> Hi,
>>>
>>> Although I know it's triggered from btrfs, the mnt_list handling is
>>> out of btrfs' control, so I'm here asking for some help.
> 
> Thanks for the report.
> 
>>>
>>> [BUG]
>>> With CONFIG_DEBUG_LIST and CONFIG_BUG_ON_DATA_CORRUPTION, and an
>>> upstream 6.13-rc kernel, which has commit 951a3f59d268 ("btrfs: fix
>>> mount failure due to remount races"), I can hit the following crash,
>>> with varied frequency (from 1 in 4 runs to hundreds of runs with no crash):
>>
>> There is also another WARNING triggered, without btrfs callback involved
>> at all:
>>
>> [  192.688671] ------------[ cut here ]------------
>> [  192.690016] WARNING: CPU: 3 PID: 59747 at fs/mount.h:150
> 
> This would indicate that move_from_ns() was called on a mount that isn't
> attached to a mount namespace (either not anymore, or it never was).
> 
> Here it's particularly peculiar because it looks like the warning is
> caused by calling move_from_ns() when moving a mount from an anonymous
> mount namespace in attach_recursive_mnt().
> 
> Can you please try and reproduce this with
> commit 211364bef4301838b2e1 ("fs: kill MNT_ONRB")
> from the vfs-6.14.mount branch in
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git ?
> 

After the initial 1000 runs (with 951a3f59d268 ("btrfs: fix mount
failure due to remount races") cherry-picked, otherwise it won't pass
that test case), there has been no crash or warning so far.

It's already the best run so far, but I'll keep it running for another 
day or so just to be extra safe.

So I guess the offending commit is 2eea9ce4310d ("mounts: keep list of
mounts in an rbtree")?
Putting a list and an rb_tree into a union indeed seems a little
dangerous. Sorry I didn't notice that earlier, but my vmcore indeed shows
a seemingly valid mnt_node (color = 1, both left/right are NULL).
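
To illustrate why such a union can be risky, here is a minimal userspace
sketch. It is not the kernel's code: the struct definitions are simplified
stand-ins, and mini_mount, init_as_rb_root() and list_entry_looks_valid()
are hypothetical names made up for this example. It assumes a typical
platform where uintptr_t and pointers have the same size. The point is only
that storage which looks like a sane rb_node (black, no children) does not
pass a basic list sanity check when read back as a list_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the kernel types, NOT the real definitions. */
struct list_head {
    struct list_head *next, *prev;
};

struct rb_node {
    uintptr_t parent_color;   /* parent pointer with the color in bit 0 */
    struct rb_node *rb_right;
    struct rb_node *rb_left;
};

#define RB_BLACK 1

_Static_assert(sizeof(uintptr_t) == sizeof(void *),
               "sketch assumes pointer-sized uintptr_t");

/* Hypothetical miniature of a mount: both views share the same storage. */
struct mini_mount {
    union {
        struct rb_node mnt_node;
        struct list_head mnt_list;
    };
};

/* Make the storage look like a black rbtree node without children. */
static void init_as_rb_root(struct mini_mount *m)
{
    m->mnt_node.parent_color = RB_BLACK;   /* NULL parent | black */
    m->mnt_node.rb_right = NULL;
    m->mnt_node.rb_left = NULL;
}

/*
 * Rough analogue of a debug-list sanity check: a linked entry must have
 * plausible neighbours that both point back at it.
 */
static bool list_entry_looks_valid(const struct list_head *e)
{
    uintptr_t next_bits, prev_bits;

    /* Copy the raw bits so implausible values are never dereferenced. */
    memcpy(&next_bits, &e->next, sizeof(next_bits));
    memcpy(&prev_bits, &e->prev, sizeof(prev_bits));

    if (next_bits == 0 || prev_bits == 0)
        return false;
    if (next_bits % sizeof(void *) || prev_bits % sizeof(void *))
        return false;
    return e->next->prev == e && e->prev->next == e;
}
```

With this layout, the parent/color word overlaps list_head.next and
rb_right overlaps list_head.prev, so a node in that rbtree state reads as
next = 1 and prev = NULL through the list view.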

Thanks a lot for the fix, and it's really a huge relief that it's not 
something inside btrfs causing the bug.

Thanks,
Qu

[...]
>>>
>>> The only caller that doesn't hold @mount_lock is iterate_mounts(), but
>>> that's only called from audit, and I'm not sure if audit is even
>>> involved in this case.
> 
> This is fine as audit creates a private copy of the mount tree it is
> interested in. The mount tree is not visible to other callers anymore.
> 
>>>
>>> So I ran out of ideas why this mnt_list corruption can even happen.
>>>
>>> Even if it's some abuse by btrfs, all mnt_list users are properly
>>> protected, thus it should not lead to such list corruption.
>>>
>>> Any advice would be appreciated.
>>>
>>> Thanks,
>>> Qu
>>>
>>
>