[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACGkMEtd_8gu7nMjLFmw7dcXJ0rvsQYiVcUdi3CaY-DBQ4=JZg@mail.gmail.com>
Date: Tue, 9 Aug 2022 11:12:39 +0800
From: Jason Wang <jasowang@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Will Deacon <will@...nel.org>,
Stefan Hajnoczi <stefanha@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
ascull@...gle.com, Marc Zyngier <maz@...nel.org>,
Keir Fraser <keirf@...gle.com>, jiyong@...gle.com,
kernel-team@...roid.com,
linux-kernel <linux-kernel@...r.kernel.org>,
virtualization <virtualization@...ts.linux-foundation.org>,
kvm <kvm@...r.kernel.org>
Subject: Re: IOTLB support for vhost/vsock breaks crosvm on Android
On Sun, Aug 7, 2022 at 9:14 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> Will, thanks very much for the analysis and the writeup!
>
> On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote:
> > So how should we fix this? One possibility is for us to hack crosvm to
> > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features,
> > but others here have reasonably pointed out that they didn't expect a
> > kernel change to break userspace. On the flip side, the offending commit
> > in the kernel isn't exactly new (it's from the end of 2020!) and so it's
> > likely that others (e.g. QEMU) are using this feature.
>
> Exactly, that's the problem.
>
> vhost is reusing the virtio bits and it's only natural that
> what you are doing would happen.
>
> To be precise, this is what we expected people to do (and what QEMU does):
>
>
> #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) |
> (1 << VIRTIO_NET_F_RX_MRG) | .... )
>
> VHOST_GET_FEATURES(... &host_features);
> host_features &= QEMU_VHOST_FEATURES
> VHOST_SET_FEATURES(host_features & guest_features)
>
>
> Here QEMU_VHOST_FEATURES are the bits userspace knows about.
>
> Our assumption was that whatever userspace enables, it
> knows what the effect on vhost is going to be.
>
> But yes, I understand absolutely how someone would instead just use the
> guest features. It is unfortunate that we did not catch this in time.
>
>
> In hindsight, we should have just created vhost level macros
> instead of reusing virtio ones. Would address the concern
> about naming: PLATFORM_ACCESS makes sense for the
> guest since there it means "whatever access rules platform has",
> but for vhost a better name would be VHOST_F_IOTLB.
Yes, in the original patch it is called VHOST_F_DEVICE_IOTLB actually
to make it differ from virtio like VHOST_F_LOG_ALL etc. And I remember
I tried to post patch to avoid the bit duplication but the conclusion
is that it's too late for the changes.
> We should have also taken greater pains to document what
> we expect userspace to do. I remember now how I thought about something
> like this but after coding this up in QEMU I forgot to document this :(
> Also, I suspect given the history the GET/SET features ioctl and burned
> wrt extending it and we have to use a new when we add new features.
> All this we can do going forward.
>
>
> But what can we do about the specific issue?
> I am not 100% sure since as Will points out, QEMU and other
> userspace already rely on the current behaviour.
>
> Looking at QEMU specifically, it always sends some translations at
> startup, this in order to handle device rings.
>
> So, *maybe* we can get away with assuming that if no IOTLB ioctl was
> ever invoked then this userspace does not know about IOTLB and
> translation should ignore IOTLB completely.
I think this breaks the security assumptions of some setups.
>
> I am a bit nervous about breaking some *other* userspace which actually
> wants device to be blocked from accessing memory until IOTLB
> has been setup. If we get it wrong we are making guest
> and possibly even host vulnerable.
Yes.
> And of course just revering is not an option either since there
> are now whole stacks depending on the feature.
>
> Will I'd like your input on whether you feel a hack in the kernel
> is justified here.
>
> Also yes, I think it's a good idea to change crosvm anyway. While the
> work around I describe might make sense upstream I don't think it's a
> reasonable thing to do in stable kernels.
+1
Thanks
> I think I'll prepare a patch documenting the legal vhost features
> as a 1st step even though crosvm is rust so it's not importing
> the header directly, right?
>
> --
> MST
>
Powered by blists - more mailing lists