linux-kernel - Re: [GIT PULL] bcachefs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230706224314.u5zbeld23uqur2ct@moria.home.lan>
Date:   Thu, 6 Jul 2023 18:43:14 -0400
From:   Kent Overstreet <kent.overstreet@...ux.dev>
To:     "Darrick J. Wong" <djwong@...nel.org>
Cc:     Josef Bacik <josef@...icpanda.com>, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-bcachefs@...r.kernel.org, dchinner@...hat.com,
        sandeen@...hat.com, willy@...radead.org, tytso@....edu,
        bfoster@...hat.com, jack@...e.cz, andreas.gruenbacher@...il.com,
        brauner@...nel.org, peterz@...radead.org,
        akpm@...ux-foundation.org, dhowells@...hat.com
Subject: Re: [GIT PULL] bcachefs

On Thu, Jul 06, 2023 at 02:19:14PM -0700, Darrick J. Wong wrote:
> TBH, so long as bcachefs is the only user of sixlocks and mean/variance,
> I don't really care what's in them, though they probably ought to live
> in fs/bcachefs/ until a second user arises.

I've been waiting for Linus to weigh in on those (and the rest of the
merge) since he had opinions a few weeks ago, but I have no real
objection there. I'd need to add an export for osq_lock, that's all.

> >  - d_mark_tmpfile(): trivial new helper, from pulling out part of
> >    d_tmpfile(). We need this because bcachefs handles the nlink count
> >    for tmpfiles earlier, in the btree transaction.
> 
> XFS might want this too, we also handle the nlink count for tmpfiles
> earlier, in a transaction, and end up playing stupid games with the
> refcount to fit the vfs function:
> 
> 	if (tmpfile) {
> 		/*
> 		 * The VFS requires that any inode fed to d_tmpfile must
> 		 * have nlink == 1 so that it can decrement the nlink in
> 		 * d_tmpfile.  However, we created the temp file with
> 		 * nlink == 0 because we're not allowed to put an inode
> 		 * with nlink > 0 on the unlinked list.  Therefore we
> 		 * have to set nlink to 1 so that d_tmpfile can
> 		 * immediately set it back to zero.
> 		 */
> 		set_nlink(inode, 1);
> 		d_tmpfile(tmpfile, inode);
> 	}

Yeah, that same game would technically work for bcachefs - but I'm
hoping we can just do the right thing here :)

> >    - Don't block on s_umount from __invalidate_super: this is a bugfix
> >      for a deadlock in generic/042 because of how we use sget(), the
> >      commit message goes into more detail.
> 
> If this is in reference to the earlier subthread about some io_uring
> thing causing unmount to hang -- my impressions of that were that yes,
> it's a bug, but no, it's not a bug in bcachefs itself.  I also wondered
> why (a) that hadn't split out as its own thread; and (b) is this really
> a bcachefs blocker?

No, this is completely unrelated. The io_uring thing hits on
generic/388 (and others) and just causes umount to fail with -EBUSY.
This one is an actual deadlock and it hits every time in generic/042.
It's specific to the loopback device and when it emits certain events,
and it hits every time so I really do need this fix included.

> /me shrugs, been on vacation and in hospitals for the last month or so.
> 
> >      bcachefs doesn't use sget() for mutual exclusion because a) sget()
> >      is insane, what we really want is the _block device_ to be opened
> >      exclusively (which we do), and we have our own block device opening
> >      path - which we need to, as we're a multi device filesystem.
> 
> ...and isn't jan kara already messing with this anyway?

The blkdev_get_handle() patchset? I like that, but I don't think that's
related - if there's something more related to sget() I haven't seen it
yet

> OTOH there's so many layers of tiny helper functions and macros that I
> have a really hard time making sense of all those pre-bcachefs changes.
> That's why I haven't weighed in.  Given all the weird problems we've had
> recently with new code colliding badly with under-documented optimized
> core code, I'm fearful of touching anything.

??? not sure what you're referring to here, are there specific patches
or recent issues you're thinking of?

I don't think any of the non-fs/bcachefs/ patches are remotely tricky
enough for any of this to be a concern.

> > You, the btrfs developers, got started when Linux filesystem teams were
> > quite a bit bigger than they are now: I was at Google when Google had a
> > bunch of people working on ext4, and that was when ZFS had recently come
> > out and there was recognition that Linux needed an answer to ZFS and you
> > were able to ride that excitement. It's been a bit harder for me to get
> > something equally ambitions going, to be honest.
> > 
> > But years ago when I realized I was onto something, I decided this
> > project was only going to fail if I let it fail - so I'm in it for the
> > long haul.
> > 
> > Right now what I'm hearing, in particular from Redhat, is that they want
> > it upstream in order to commit more resources. Which, I know, is not
> > what kernel people want to hear, but it's the chicken-and-the-egg
> > situation I'm in.
> 
> /me suspects mainline merging is necessary but not sufficient -- few
> non-developers want to deal with merging an out of tree filesystem, but
> that still doesn't mean anyone will commit real engineering resources.

Yeah, no doubt it will continue to be an uphill battle. But it's a
necessary step in the right direction, for sure.

> > > I am really, really wanting you to succeed here Kent.  If the general consensus
> > > is you need to have some idiot review fs/bcachefs I will happily carve out some
> > > time and dig in.
> > 
> > That would be much appreciated - I'll owe you some beers next time I see
> > you. But before jumping in, let's see if we can get people who have
> > already worked with the code to say something.
> > 
> > Something I've done in the past that might be helpful - instead (or in
> > addition to) having people read code in isolation, perhaps we could do a
> > group call/meeting where people can ask questions about the code, bring
> > up design issues they've seen in other filesystems, etc. - I've also
> > found that kind of setup great for identifying places in the code where
> > additional documentation would be useful.
> 
> "At this point I think Kent's QA efforts are at least as good as XFS's,
> just merge it and let's move on to the next thing."

high praise :)