Message-ID: <20241023151358.GB28800@willie-the-truck>
Date: Wed, 23 Oct 2024 16:14:00 +0100
From: Will Deacon <will@...nel.org>
To: Dominique Martinet <asmadeus@...ewreck.org>
Cc: ericvh@...nel.org, Thorsten Leemhuis <regressions@...mhuis.info>,
lucho@...kov.net, Christian Brauner <brauner@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>, oss@...debyte.com,
v9fs@...ts.linux.dev, linux-kernel@...r.kernel.org, oleg@...hat.com,
keirf@...gle.com, regressions@...ts.linux.dev
Subject: Re: VFS regression with 9pfs ("Lookup would have caused loop")
On Wed, Oct 23, 2024 at 08:00:04AM +0900, Dominique Martinet wrote:
> Will Deacon wrote on Tue, Oct 22, 2024 at 04:01:49PM +0100:
> > > One note though: he did send a patch that seems related and wasn't sent
> > > for merge:
> > > https://lore.kernel.org/all/CAFkjPTn7JAbmKYASaeBNVpumOncPaReiPbc4Ph6ik3nNf8UTNg@mail.gmail.com/T/#u
> > >
> > > Will, perhaps you can try it? I'm pretty sure the setup to reproduce
> > > this is easy enough that I'll be able to reproduce in less than an hour
> > > (export two tmpfs [sequential inode number fs] within the same 9p mount
> > > in qemu without 'multidevs=remap'), but I don't even have that time
> > > right now.
> > >
> > > (I didn't even read the patch properly and it might not help at all,
> > > sorry in this case)
> >
> > I think this patch landed upstream as d05dcfdf5e16 (" fs/9p: mitigate
> > inode collisions") and so I can confirm that it doesn't help with the
> > issue.
>
> Ugh, yes, sorry I'm blind it's there alright (you even listed it in your
> first post in the list of commits to revert)
Heh, no worries. It has a funny leading space at the start which is
weirdly jarring when you read it!
> > > I'm not sure this really needs to get Linus involved - it's breaking a
> > > server that used to work even if qemu has been printing a warning about
> > > these duplicate qid.path for a while, and the server really is the
> > > better place to remap these inodes as we have no idea of the underlying
> > > device id as far as I know...
> >
> > FWIW, I'm not using QEMU at all. This is with kvmtool which, for better
> > or worse, prints no such diagnostic and used to be reliable enough with
> > whatever magic the kernel had prior to v6.9.
>
> Yes, that problem isn't specific to qemu, anything that re-exports
> multiple filesystems using underlying inode numbers directly has this
> risk (e.g. even diod/nfs-ganesha tcp mounts ought to reproduce if they
> don't mangle inodes); I agree we need to handle servers throwing junk at
> us.
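(Purely to illustrate the collision you're describing: a server that derives
qid.path straight from the host st_ino, ignoring st_dev, hands the client
identical qids for distinct files on different exports. The struct and
function below are hypothetical, not QEMU's or diod's actual code.)

----
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical 9p server-side qid construction (illustrative only).
 * Using st_ino alone means m1/dir and m2/dir from the repro below,
 * each a low-numbered inode on its own tmpfs, can share a qid.path. */
struct qid_sketch {
	uint8_t  type;		/* 0x80 (QTDIR) for directories, 0x00 otherwise */
	uint32_t version;
	uint64_t path;
};

static struct qid_sketch stat_to_qid_naive(const struct stat *st)
{
	struct qid_sketch qid = {
		.type    = S_ISDIR(st->st_mode) ? 0x80 : 0x00,
		.version = (uint32_t)st->st_mtime,
		.path    = (uint64_t)st->st_ino,	/* st_dev dropped: collisions across exports */
	};
	return qid;
}
----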
>
> It's easy enough to reproduce with qemu if remapping is disabled as
> expected (I don't hit the VFS warning in d_splice_alias() but the
> behaviour is definitely wrong):
> ----
> # host side
> cd /tmp/linux-test
> mkdir m1 m2
> mount -t tmpfs tmpfs m1
> mount -t tmpfs tmpfs m2
> mkdir m1/dir m2/dir
> echo foo > m1/dir/foo
> echo bar > m2/dir/bar
>
> # guest side
> # started with -virtfs local,path=/tmp/linux-test,mount_tag=tmp,security_model=mapped-file
> mount -t 9p -o trans=virtio,debug=1 tmp /mnt/t
>
> ls /mnt/t/m1/dir
> # foo
> ls /mnt/t/m2/dir
> # bar (works ok if directory isn't open)
>
> # cd to keep first dir's inode alive
> cd /mnt/t/m1/dir
> ls /mnt/t/m2/dir
> # foo (should be bar)
> ----
>
> This can also be observed with files if mounting with cache=fscache or
> similar (but interestingly even keeping the file open will properly
> dissociate both files in default cache=none mode); either way we won't be
> able to properly work with hard links if we assume the server isn't
> reliable; I guess that if someone wants that we'd really need to
> implement some capability negotiation at mount time so the server can
> tell us what is supported, and unless that is done, assume the server
> cannot reliably assign inodes...
>
> And if we go that way then we just shouldn't ever look at the inode
> number?... Which seems to pretty much be what the old code was doing in
> the "new" path of v9fs_qid_iget(); the compare function would just
> always say the inodes are different and get a new one...
> The mitigation Eric implemented is similar enough (it re-added a 'new'
> parameter and fails if I_NEW isn't set when new was requested), but it
> looks like v9fs_fid_iget_dotl() isn't called at all when accessing the
> other overlapping directory, so that wasn't effective here as some
> higher level of caching caught the inode first.
Nice, thanks for digging into this. I'm now pretty grateful I ran into
the vfs error rather than whacky filesystem behaviours.
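Just to make sure I'm following the two lookup behaviours you describe,
here's roughly how I picture it (a sketch with made-up names and error
handling, not the actual fs/9p code): the old "new inode" path used a test
callback that never matches, so iget5_locked() always allocated a fresh
inode, whereas the mitigation keeps a real compare but bails out when a
lookup that expected a new inode gets one back without I_NEW set.

----
#include <linux/fs.h>
#include <linux/err.h>

/* Old-style "new" path: the compare never matches, so the icache always
 * hands back a freshly allocated inode, even for a duplicate qid.path. */
static int v9fs_test_never_match(struct inode *inode, void *data)
{
	return 0;
}

static int v9fs_set_ino_sketch(struct inode *inode, void *data)
{
	inode->i_ino = *(unsigned long *)data;
	return 0;
}

static struct inode *old_new_inode_sketch(struct super_block *sb,
					  unsigned long i_ino)
{
	return iget5_locked(sb, i_ino, v9fs_test_never_match,
			    v9fs_set_ino_sketch, &i_ino);
}

/* Mitigation-style lookup: with a real @test compare, getting an inode
 * back without I_NEW when the caller asked for a new one means we
 * collided with something already cached for this qid.path, so bail. */
static struct inode *mitigated_iget_sketch(struct super_block *sb,
					   unsigned long hashval, bool new,
					   int (*test)(struct inode *, void *),
					   int (*set)(struct inode *, void *),
					   void *data)
{
	struct inode *inode = iget5_locked(sb, hashval, test, set, data);

	if (!inode)
		return ERR_PTR(-ENOMEM);
	if (new && !(inode->i_state & I_NEW)) {
		iput(inode);
		return ERR_PTR(-EBUSY);
	}
	return inode;
}
----

Either way, as you say, neither check gets a chance to run if the lookup
never reaches that code because a higher layer already handed back the
cached inode.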
> I've also confirmed that reverting the 4 commits listed by Will does fix
> both behaviors (along with an fscache warning when hitting the duplicate
> inode file, but that's expected):
> 724a08450f74 "fs/9p: simplify iget to remove unnecessary paths"
> 11763a8598f8 "fs/9p: fix uaf in in v9fs_stat2inode_dotl"
> 10211b4a23cf "fs/9p: remove redundant pointer v9ses"
> d05dcfdf5e16 " fs/9p: mitigate inode collisions"
>
>
> > I'm happy to test patches if there's anything available, but otherwise
> > the reverts at least get us back to the old behaviour if nobody has time
> > to come up with something better.
>
> I think that's the sane thing to do: let's first go back to something
> that works and we can try again if/when someone has time - I've
> certainly just spent more time here than I have to spare, and starting
> over is just a matter of reapplying the patches in a local branch, so
> it's not like there's anything to lose (afaiu this was intended as code
> cleanup to make the code more maintainable rather than fixing something
> specific)
Agreed, and I'm happy to test any subsequent patches now that I've got
my Android environment up and running again.
> Thorsten, is there a preferred way reverts should be done?
> In this case it'd probably make sense to squash the 4 reverts into a
> single megarevert? At the very least, ending up anywhere in the middle
> with the uaf present isn't something one would want... And they all made
> it into 6.9 together, so there's no benefit in splitting them for
> backport either.
>
> Once that's decided I'll prepare something to send Linus in a few days,
> I don't think there's much point in sitting on this either...
I'll leave it up to you, but I'd personally do four separate reverts
because it makes the whole process entirely mechanical. I can't recall
seeing a megarevert before.
Will