[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231205194822.GL2692119@nvidia.com>
Date: Tue, 5 Dec 2023 15:48:22 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Marc Zyngier <maz@...nel.org>, ankita@...dia.com,
Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@...wei.com>, oliver.upton@...ux.dev,
suzuki.poulose@....com, yuzenghui@...wei.com, will@...nel.org,
ardb@...nel.org, akpm@...ux-foundation.org, gshan@...hat.com,
aniketa@...dia.com, cjia@...dia.com, kwankhede@...dia.com,
targupta@...dia.com, vsethi@...dia.com, acurrid@...dia.com,
apopple@...dia.com, jhubbard@...dia.com, danw@...dia.com,
mochs@...dia.com, kvmarm@...ts.linux.dev, kvm@...r.kernel.org,
lpieralisi@...nel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and
NORMAL_NC for IO memory
On Tue, Dec 05, 2023 at 07:24:37PM +0000, Catalin Marinas wrote:
> On Tue, Dec 05, 2023 at 12:43:18PM -0400, Jason Gunthorpe wrote:
> > On Tue, Dec 05, 2023 at 04:22:33PM +0000, Catalin Marinas wrote:
> > > Yeah, I made this argument in the past. But it's a fair question to ask
> > > since the Arm world is different from x86. Just reusing an existing
> > > driver in a different context may break its expectations. Does Normal NC
> > > access complete by the time a TLBI (for Stage 2) and DSB (DVMsync) is
> > > completed? It does reach some point of serialisation with subsequent
> > > accesses to the same address but not sure how it is ordered with an
> > > access to a different location like the config space used for reset.
> > > Maybe it's not a problem at all or it is safe only for PCIe but it would
> > > be good to get to the bottom of this.
> >
> > IMHO, the answer is you can't know architecturally. The specific
> > vfio-platform driver must do an analysis of it's specific SOC and
> > determine what exactly is required to order the reset. The primary
> > purpose of the vfio-platform drivers is to provide this reset!
> >
> > In most cases I would expect some reads from the device to be required
> > before the reset.
>
> I can see in the vfio_platform_common.c code that the reset is either
> handled by an ACPI _RST method or some custom function in case of DT.
> Let's consider the ACPI method for now, I assume the AML code pokes some
> device registers but we can't say much about the ordering it expects
> without knowing the details. The AML may assume that the ioaddr mapped
> as Device-nRnRE (ioremap()) in the kernel has the same attributes
> wherever else is mapped in user or guests. Note that currently the
> vfio_platform and vfio_pci drivers only allow pgprot_noncached() in
> user, so they wouldn't worry about other mismatched aliases.
AML is OS agnostic. It technically shouldn't assume the OS never used
NORMAL_NC to access the device.
By the time the AML is invoked the VMAs are all revoked and all TLBs
are flushed so I think the mismatched alias problem is gone??
> It can be argued that it's the responsibility of whoever grants device
> access to know the details. However, it would help if we give some
> guidance, any expectations broken if an alias is Normal-NC? It's easier
> to start with PCIe first until we get some concrete request for other
> types of devices.
The issue was about NORMAL_NC access prior to the TLBI continuing to
float in the interconnect and not be ordered strictly before the reset
write.
Which, IMHO, is basically arguing that DSB doesn't order these
operations on some specific SOC - which I view with some doubt.
> > The complexity is my concern, and the disruption to the ecosystem with
> > some of the ideas given.
> >
> > If there was a trivial way to convey in the VMA that it is safe then
> > sure, no objection from me.
>
> I suggested a new VM_* flag or some way to probe the iomem_resources for
> PCIe ranges (if they are described in there, not sure). We can invent
> other tree searching for ranges that get registers from the vfio driver,
> I don't think it's that difficult.
I do not like iomem_resource probing on principle :(
A VM_ flag would be fine, but seems wasteful.
> A more complex way is to change vfio to allow Normal mappings and KVM
> would mimic them. You were actually planning to do this for Cacheable
> anyway.
This needs qemu changes, so I very much don't like it.
Cachable is always cachable, it is different thing.
> > I would turn it around and ask we find a way to restrict platform
> > devices when someone comes with a platform device that wants to use
> > secure kvm and has a single well defined HW problem that is solved by
> > this work.
>
> We end up with similar search/validation mechanism, so not sure we gain
> much.
We gain by not doing it, ever. I'm expecting nobody will ever bring it
up.
The vfio-platform drivers all look to be rather old to me, and the
manifestation is, at worst, that a hostile VM that didn't crash the
machine before does crash it now.
I have serious doubts anyone will run into an issue.
> > What if we change vfio-pci to use pgprot_device() like it already
> > really should and say the pgprot_noncached() is enforced as
> > DEVICE_nGnRnE and pgprot_device() may be DEVICE_nGnRE or NORMAL_NC?
> > Would that be acceptable?
>
> pgprot_device() needs to stay as Device, otherwise you'd get speculative
> reads with potential side-effects.
I do not mean to change pgprot_device() I mean to detect the
difference via pgprot_device() vs pgprot_noncached(). They put a
different value in the PTE that we can sense. It is very hacky.
Jason
Powered by blists - more mailing lists