lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 6 Aug 2022 10:42:59 +0100
From:   Will Deacon <will@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     mst@...hat.com, stefanha@...hat.com, jasowang@...hat.com,
        ascull@...gle.com, maz@...nel.org, keirf@...gle.com,
        jiyong@...gle.com, kernel-team@...roid.com,
        linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
        Stefano Garzarella <sgarzare@...hat.com>
Subject: Re: IOTLB support for vhost/vsock breaks crosvm on Android

On Fri, Aug 05, 2022 at 03:57:08PM -0700, Linus Torvalds wrote:
> On Fri, Aug 5, 2022 at 11:11 AM Will Deacon <will@...nel.org> wrote:
> >
> > [tl;dr a change from ~18 months ago breaks Android userspace and I don't
> >  know what to do about it]
> 
> Augh.
> 
> I had hoped that android being "closer" to upstream would have meant
> that somebody actually tests android with upstream kernels. People
> occasionally talk about it, but apparently it's not actually done.
> 
> Or maybe it's done onl;y with a very limited android user space.

We do actually test every -rc with Android (and run a whole bunch of
regression tests), this is largely using x86 builds for convenience
but we've been bringing up arm64 recently and are getting increasingly
more coverage there. So this _will_ improve and relatively soon.

The kicker in this case is that we'd only catch it on systems using pKVM
(arm64 host only; upstreaming ongoing) with restricted DMA (requires
device-tree) and so it slipped through. This is made more challenging
for CI because arm64 devices don't tend to have support for nested
virtualisation and so we have to run bare-metal but, as I say, we're
getting there.

> > After some digging, we narrowed this change in behaviour down to
> > e13a6915a03f ("vhost/vsock: add IOTLB API support") and further digging
> > reveals that the infamous VIRTIO_F_ACCESS_PLATFORM feature flag is to
> > blame. Indeed, our tests once again pass if we revert that patch (there's
> > a trivial conflict with the later addition of VIRTIO_VSOCK_F_SEQPACKET
> > but otherwise it reverts cleanly).
> 
> I have to say, this smells for *so* many reasons.
> 
> Why is "IOMMU support" called "VIRTIO_F_ACCESS_PLATFORM"?

It was already renamed once (!) It used to be VIRTIO_F_IOMMU_PLATFORM...

> That seems insane, but seems fundamental in that commit e13a6915a03f
> ("vhost/vsock: add IOTLB API support")
> 
> This code
> 
>         if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
>                 if (vhost_init_device_iotlb(&vsock->dev, true))
>                         goto err;
>         }
> 
> just makes me go "What?"  It makes no sense. Why isn't that feature
> called something-something-IOTLB?
> 
> Can we please just split that flag into two, and have that odd
> "platform access" be one bit, and the "enable iommu" be an entirely
> different bit?

Something along those lines makes sense to me, but it's fiddly because
the bits being used here are part of the virtio spec and we can't freely
allocate them in Linux. I reckon it would probably be better to have a
separate mechanism to enable IOTLB and not repurpose this flag for it.
Hindsight is a wonderful thing.

> And hey, it's possible that the bit encoding is *so* incestuous that
> it's really hard to split it into two. But it really sounds to me like
> somebody mindlessly re-used a feature bit for a *completely* different
> thing. Why?
> 
> Why have feature bits at all, when you then re-use the same bit for
> two different features? It kind of seems to defeat the whole purpose.

No argument here, and it's a big part of the reason I made the effort to
write this up. Yes, we hit this in Android. Yes, we should've hit it
sooner.  But is it specific to Android? No. Anybody wanting a guest to
use the DMA API for its virtio devices is going to be setting this flag
and if they implement the same algorithm as crosvm then they're going to
hit exactly the same problem that we did.

Will

Powered by blists - more mailing lists