[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230810175205.gtlkydeis37xdxuk@quack3>
Date: Thu, 10 Aug 2023 19:52:05 +0200
From: Jan Kara <jack@...e.cz>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-bcachefs@...r.kernel.org, djwong@...nel.org,
dchinner@...hat.com, sandeen@...hat.com, willy@...radead.org,
josef@...icpanda.com, tytso@....edu, bfoster@...hat.com,
jack@...e.cz, andreas.gruenbacher@...il.com, brauner@...nel.org,
peterz@...radead.org, akpm@...ux-foundation.org,
dhowells@...hat.com, snitzer@...nel.org, axboe@...nel.dk
Subject: Re: [GIT PULL] bcachefs
On Thu 10-08-23 11:54:53, Kent Overstreet wrote:
> > And there clearly is something very strange going on with superblock
> > handling
>
> This deserves an explanation because sget() is a bit nutty.
>
> The way sget() is conventionally used for block device filesystems, the
> block device open _isn't actually exclusive_ - sure, FMODE_EXCL is used,
> but the holder is the fs type pointer, so it won't exclude with other
> opens of the same fs type.
>
> That means the only protection from multiple opens scribbling over each
> other is sget() itself - but if the bdev handle ever outlives the
> superblock we're completely screwed; that's a silent data corruption bug
> that we can't easily catch, and if the filesystem teardown path has any
> asynchronous stuff going on (and of course it does) that's not a hard
> mistake to make. I've observed at least one bug that looked suspiciously
> like that, but I don't think I quite pinned it down at the time.
This is just being changed - check Christian's VFS tree. There are patches
that make sget() use superblock pointer as a bdev holder so the reuse
you're speaking about isn't a problem anymore.
> It also forces the caller to separate opening of the block devices from
> the rest of filesystem initialization, which is a bit less than ideal.
>
> Anyways, bcachefs just wants to be able to do real exclusive opens of
> the block devices, and we do all filesystem bringup with a single
> bch2_fs_open() call. I think this could be made to work with the way
> sget() wants to work, but it'd require reworking the locking in
> sget() - it does everything, including the test() and set() calls, under
> a single spinlock.
Yeah. Maybe the current upstream changes aren't enough to make your life
easier for bcachefs, btrfs does its special thing as well after all because
mount also involves multiple devices for it. I just wanted to mention that
the exclusive bdev open thing is changing.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists