Message-ID: <20230706224314.u5zbeld23uqur2ct@moria.home.lan>
Date: Thu, 6 Jul 2023 18:43:14 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: "Darrick J. Wong" <djwong@...nel.org>
Cc: Josef Bacik <josef@...icpanda.com>, torvalds@...ux-foundation.org,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-bcachefs@...r.kernel.org, dchinner@...hat.com,
sandeen@...hat.com, willy@...radead.org, tytso@....edu,
bfoster@...hat.com, jack@...e.cz, andreas.gruenbacher@...il.com,
brauner@...nel.org, peterz@...radead.org,
akpm@...ux-foundation.org, dhowells@...hat.com
Subject: Re: [GIT PULL] bcachefs
On Thu, Jul 06, 2023 at 02:19:14PM -0700, Darrick J. Wong wrote:
> TBH, so long as bcachefs is the only user of sixlocks and mean/variance,
> I don't really care what's in them, though they probably ought to live
> in fs/bcachefs/ until a second user arises.
I've been waiting for Linus to weigh in on those (and the rest of the
merge) since he had opinions a few weeks ago, but I have no real
objection there. I'd need to add an export for osq_lock, that's all.
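(For reference, the export itself would just be a line per symbol in
kernel/locking/osq_lock.c - something like the below, modulo whether it
ends up EXPORT_SYMBOL or EXPORT_SYMBOL_GPL. Six locks only use osq for
their optimistic spin-on-owner path.)

	/* six locks spin on the lock owner via the osq machinery, so a
	 * modular bcachefs needs these exported:
	 */
	EXPORT_SYMBOL_GPL(osq_lock);
	EXPORT_SYMBOL_GPL(osq_unlock);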
> > - d_mark_tmpfile(): trivial new helper, from pulling out part of
> > d_tmpfile(). We need this because bcachefs handles the nlink count
> > for tmpfiles earlier, in the btree transaction.
>
> XFS might want this too, we also handle the nlink count for tmpfiles
> earlier, in a transaction, and end up playing stupid games with the
> refcount to fit the vfs function:
>
> 	if (tmpfile) {
> 		/*
> 		 * The VFS requires that any inode fed to d_tmpfile must
> 		 * have nlink == 1 so that it can decrement the nlink in
> 		 * d_tmpfile. However, we created the temp file with
> 		 * nlink == 0 because we're not allowed to put an inode
> 		 * with nlink > 0 on the unlinked list. Therefore we
> 		 * have to set nlink to 1 so that d_tmpfile can
> 		 * immediately set it back to zero.
> 		 */
> 		set_nlink(inode, 1);
> 		d_tmpfile(tmpfile, inode);
> 	}
Yeah, that same game would technically work for bcachefs - but I'm
hoping we can just do the right thing here :)
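For anyone following along, the split really is just pulling the
inode_dec_link_count() out so that d_tmpfile() becomes a thin wrapper -
roughly like this (sketch from memory; everything other than the new
helper's name is the existing d_tmpfile() body in fs/dcache.c, see the
actual patch for the final form):

void d_mark_tmpfile(struct file *file, struct inode *inode)
{
	struct dentry *dentry = file->f_path.dentry;

	/* give the anonymous dentry its "#<ino>" name and hook it up to
	 * the inode - everything d_tmpfile() does today except touching
	 * nlink:
	 */
	BUG_ON(dentry->d_name.name != dentry->d_iname ||
	       !hlist_unhashed(&dentry->d_u.d_alias) ||
	       !d_unlinked(dentry));
	spin_lock(&dentry->d_parent->d_lock);
	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
	dentry->d_name.len = sprintf(dentry->d_iname, "#%llu",
				(unsigned long long)inode->i_ino);
	spin_unlock(&dentry->d_lock);
	spin_unlock(&dentry->d_parent->d_lock);
	d_instantiate(dentry, inode);
}

void d_tmpfile(struct file *file, struct inode *inode)
{
	inode_dec_link_count(inode);
	d_mark_tmpfile(file, inode);
}

With that, XFS could call d_mark_tmpfile() directly and drop the
set_nlink(inode, 1) dance quoted above.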
> > - Don't block on s_umount from __invalidate_super: this is a bugfix
> > for a deadlock in generic/042 because of how we use sget(), the
> > commit message goes into more detail.
>
> If this is in reference to the earlier subthread about some io_uring
> thing causing unmount to hang -- my impressions of that were that yes,
> it's a bug, but no, it's not a bug in bcachefs itself. I also wondered
> why (a) that hadn't split out as its own thread; and (b) is this really
> a bcachefs blocker?
No, this is completely unrelated. The io_uring thing hits on
generic/388 (and others) and just causes umount to fail with -EBUSY.
This one is an actual deadlock: it's specific to the loopback device
emitting certain events, and it reproduces every time in generic/042,
so I really do need this fix included.
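The shape of the fix, very roughly (the commit message has the full
analysis - this is just the locking pattern, not the actual patch):

	/*
	 * Sketch: the loop device's event path ends up in
	 * __invalidate_super(), which currently sleeps on s_umount; the
	 * s_umount holder can itself be waiting on the loop device (see
	 * the commit message for the exact cycle), so sleeping here
	 * deadlocks.  Taking the lock opportunistically breaks it:
	 */
	if (!down_read_trylock(&sb->s_umount))
		return;		/* mount/umount in flight - skip */

	/* ... the existing invalidation work, unchanged ... */

	up_read(&sb->s_umount);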
> /me shrugs, been on vacation and in hospitals for the last month or so.
>
> > bcachefs doesn't use sget() for mutual exclusion because a) sget()
> > is insane, what we really want is the _block device_ to be opened
> > exclusively (which we do), and we have our own block device opening
> > path - which we need to, as we're a multi device filesystem.
>
> ...and isn't jan kara already messing with this anyway?
The blkdev_get_handle() patchset? I like that, but I don't think that's
related - if there's something more closely related to sget(), I haven't
seen it yet.
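(To spell out the "opened exclusively" point above: the mutual exclusion
comes from the ordinary holder-based claim we take when opening each
member device, not from sget()'s superblock matching. Sketch in the
pre-6.5 spelling of the API - the holder-ops rework this cycle changes
the flags but not the idea, and 'c' here just stands in for the
filesystem instance used as holder:)

	struct block_device *bdev;

	/* exclusive claim: any other claimant on this device, including
	 * a second mount, gets -EBUSY
	 */
	bdev = blkdev_get_by_path(path, FMODE_READ|FMODE_WRITE|FMODE_EXCL, c);
	if (IS_ERR(bdev))
		return PTR_ERR(bdev);

	/* ... use the device ... */

	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);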
> OTOH there's so many layers of tiny helper functions and macros that I
> have a really hard time making sense of all those pre-bcachefs changes.
> That's why I haven't weighed in. Given all the weird problems we've had
> recently with new code colliding badly with under-documented optimized
> core code, I'm fearful of touching anything.
??? I'm not sure what you're referring to here - are there specific
patches or recent issues you're thinking of?
I don't think any of the non-fs/bcachefs/ patches are remotely tricky
enough for any of this to be a concern.
> > You, the btrfs developers, got started when Linux filesystem teams were
> > quite a bit bigger than they are now: I was at Google when Google had a
> > bunch of people working on ext4, and that was when ZFS had recently come
> > out and there was recognition that Linux needed an answer to ZFS and you
> > were able to ride that excitement. It's been a bit harder for me to get
> > something equally ambitious going, to be honest.
> >
> > But years ago when I realized I was onto something, I decided this
> > project was only going to fail if I let it fail - so I'm in it for the
> > long haul.
> >
> > Right now what I'm hearing, in particular from Redhat, is that they want
> > it upstream in order to commit more resources. Which, I know, is not
> > what kernel people want to hear, but it's the chicken-and-the-egg
> > situation I'm in.
>
> /me suspects mainline merging is necessary but not sufficient -- few
> non-developers want to deal with merging an out of tree filesystem, but
> that still doesn't mean anyone will commit real engineering resources.
Yeah, no doubt it will continue to be an uphill battle. But it's a
necessary step in the right direction, for sure.
> > > I am really, really wanting you to succeed here Kent. If the general consensus
> > > is you need to have some idiot review fs/bcachefs I will happily carve out some
> > > time and dig in.
> >
> > That would be much appreciated - I'll owe you some beers next time I see
> > you. But before jumping in, let's see if we can get people who have
> > already worked with the code to say something.
> >
> > Something I've done in the past that might be helpful - instead (or in
> > addition to) having people read code in isolation, perhaps we could do a
> > group call/meeting where people can ask questions about the code, bring
> > up design issues they've seen in other filesystems, etc. - I've also
> > found that kind of setup great for identifying places in the code where
> > additional documentation would be useful.
>
> "At this point I think Kent's QA efforts are at least as good as XFS's,
> just merge it and let's move on to the next thing."
high praise :)