linux-ext4 - Re: [PATCH v4 1/6] fs: enhance and rename shutdown() callback to remove

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c6zp6k7ozn7idiyt4shxhwwe2hoprkgdzq66eau5w4jlgbuwta@od2atq4kexoj>
Date: Tue, 8 Jul 2025 21:13:26 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Qu Wenruo <quwenruo.btrfs@....com>
Cc: Dave Chinner <david@...morbit.com>, 
	Christian Brauner <brauner@...nel.org>, "Darrick J. Wong" <djwong@...nel.org>, Qu Wenruo <wqu@...e.com>, 
	linux-btrfs@...r.kernel.org, linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk, 
	jack@...e.cz, linux-ext4@...r.kernel.org, 
	linux-f2fs-devel@...ts.sourceforge.net, ntfs3@...ts.linux.dev, linux-xfs@...r.kernel.org
Subject: Re: [PATCH v4 1/6] fs: enhance and rename shutdown() callback to
 remove_bdev()

On Wed, Jul 09, 2025 at 10:25:08AM +0930, Qu Wenruo wrote:
> 在 2025/7/9 10:05, Kent Overstreet 写道:
> > Consider that the thing that has a block device open might not even be a
> > filesystem, or at least a VFS filesystem.
> > 
> > It could be a stacking block device driver - md or md - and those
> > absolutely should be implementing .mark_dead() (likely passing it
> > through on up the stack), or something else entirely.
> > 
> > In bcachefs's case, we might not even have created the VFS super_block
> > yet: we don't do that until after recovery finishes, and indeed we can't
> > because creating a super_block and leaving it in !SB_BORN will cause
> > such fun as sync calls to hang for the entire system...
> > 
> 
> Not related to the series, but IIRC if s_flags doesn't have SB_ACTIVE set,
> things like bdev_super_lock() won't choose that superblock, thus won't call
> ->sync() callback through the bdev callbacks.
> 
> And btrfs also follows the same scheme, only setting SB_ACTIVE after
> everything is done (including replaying the log etc), and so far we haven't
> yet hit such sync during mount.

Well, how long can that take? Have a look at iterate_supers(), it's
something to be wary of.

Fixing the fs/super.c locking is something I was looking at, it's doable
but it'd be a giant hassle - for bcachefs it wasn't worth it, bcachefs
has preexisting reasons for wanting to avoid the vfs superblock
dependency.

Anyways - the VFS trying to own .mark_dead() is a layering violation.

        VFS
------------------
        FS
------------------
        BLOCK

By default, the "FS" layer should be considered a black box by both the
block layer and VFS; reaching across that and assuming filesystem
behavior is a good way to paint yourself into a corner.

It's seductive because most filesystems are single device filesystems,
and for that case it makes total sense to provide helpers for
convenience, given that there's not much room for behaviour to deviate
in the single device case.

But in the multi device case: the behaviour is completely up to the
filesystem - in general we don't shut down the entire filesystem if a
single block device goes dead, if we're redundant we can just drop that
device and continue.

And if you're thinking you want to make use of locking provided by the
VFS - I would warn away from that line of thinking too, mixing up
locking from different layers creates all sorts of fun...