Message-ID: <20231205164318.GG2692119@nvidia.com>
Date: Tue, 5 Dec 2023 12:43:18 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Marc Zyngier <maz@...nel.org>, ankita@...dia.com,
Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@...wei.com>, oliver.upton@...ux.dev,
suzuki.poulose@....com, yuzenghui@...wei.com, will@...nel.org,
ardb@...nel.org, akpm@...ux-foundation.org, gshan@...hat.com,
aniketa@...dia.com, cjia@...dia.com, kwankhede@...dia.com,
targupta@...dia.com, vsethi@...dia.com, acurrid@...dia.com,
apopple@...dia.com, jhubbard@...dia.com, danw@...dia.com,
mochs@...dia.com, kvmarm@...ts.linux.dev, kvm@...r.kernel.org,
lpieralisi@...nel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and
NORMAL_NC for IO memory
On Tue, Dec 05, 2023 at 04:22:33PM +0000, Catalin Marinas wrote:
> Yeah, I made this argument in the past. But it's a fair question to ask
> since the Arm world is different from x86. Just reusing an existing
> driver in a different context may break its expectations. Does Normal NC
> access complete by the time a TLBI (for Stage 2) and DSB (DVMsync) is
> completed? It does reach some point of serialisation with subsequent
> accesses to the same address but not sure how it is ordered with an
> access to a different location like the config space used for reset.
> Maybe it's not a problem at all or it is safe only for PCIe but it would
> be good to get to the bottom of this.
IMHO, the answer is that you can't know architecturally. Each
vfio-platform driver must do an analysis of its specific SoC and
determine exactly what is required to order the reset. The primary
purpose of the vfio-platform drivers is to provide this reset!
In most cases I would expect some reads from the device to be required
before the reset.
> > Remember, the feedback we got from the CPU architects was that even
> > DEVICE_* will experience an uncontained failure if the device triggers
> > an error response in shipping ARM IP.
> >
> > The reason PCIe is safe is because the PCI bridge does not generate
> > errors in the first place!
>
> That's an argument to restrict this feature to PCIe. It's really about
> fewer arguments on the behaviour of other devices. Marc did raise
> another issue with the GIC VCPU interface (does this even have a vma in
> the host VMM?). That's a class of devices where the mapping is
> context-switched, so the TLBI+DSB rules don't help.
I don't know enough about the GIC VCPU interface to comment,
unfortunately. Since it seems there is something to fix here I would
appreciate some background..
When you say it is context switched, do you mean KVM does a register
write on every VM entry to set the proper HW context for the vCPU? And
the worry is that this register write will possibly not be ordered
after outstanding NORMAL_NC accesses?
> > Thus, the way a platform device can actually be safe is if it too
> > never generates errors in the first place! Obviously this approach
> > works just as well with NORMAL_NC.
> >
> > If a platform device does generate errors then we shouldn't expect
> > containment at all, and the memory type has no bearing on the
> > safety. The correct answer is to block these platform devices from
> > VFIO/KVM/etc because they can trigger uncontained failures.
>
> Assuming the error containment is sorted, there are two other issues
> with other types of devices:
>
> 1. Ordering guarantees on reclaim or context switch
Solved in VFIO
> 2. Unaligned accesses
>
> On (2), I think PCIe is fairly clear on how the TLPs are generated, so I
> wouldn't expect additional errors here. But I have no idea what AMBA/AXI
> does here in general. Perhaps it's fine, I don't think we looked into it
> as the focus was mostly on PCIe.
I would expect AXI devices to throw errors in all sorts of odd
cases. E.g., I would not be surprised at all to see carelessly built AXI
devices error if ST64B is pointed at them. At least when I was
building AXI logic years ago I was that lazy :P
This is mostly my point - if the devices under vfio-platform were not
designed to have contained errors then, IMHO, it is hard to believe
that DEVICE_X/NORMAL_NC is the only issue in that HW.
> So, I think it would be easier to get this patch upstream if we limit
> the change to PCIe devices for now. We may relax this further in the
> future. Do you actually have a need for non-PCIe devices to support WC
> in the guest or it's more about the complexity of the logic to detect
> whether it's actually a PCIe BAR we are mapping into the guest? (I can
> see some Arm GPU folk asking for this but those devices are not easily
> virtualisable).
The complexity is my concern, along with the disruption to the
ecosystem from some of the ideas given.
If there was a trivial way to convey in the VMA that it is safe then
sure, no objection from me.
My worry is that this has turned from fixing a real problem we have
today into a debate about theoretical issues that nobody may care
about and that would be very disruptive to solve.
I would turn it around and ask that we restrict platform devices only
when someone comes with a platform device that wants to use secure KVM
and has a single well-defined HW problem that is solved by this work.
What if we change vfio-pci to use pgprot_device() like it already
really should, and say that pgprot_noncached() is enforced as
DEVICE_nGnRnE while pgprot_device() may be DEVICE_nGnRE or NORMAL_NC?
Would that be acceptable?
Jason