linux-kernel - Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and NORMAL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZW8lbK88bgdmBgl9@lpieralisi>
Date:   Tue, 5 Dec 2023 14:28:12 +0100
From:   Lorenzo Pieralisi <lpieralisi@...nel.org>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Marc Zyngier <maz@...nel.org>, ankita@...dia.com,
        Shameerali Kolothum Thodi 
        <shameerali.kolothum.thodi@...wei.com>, jgg@...dia.com,
        oliver.upton@...ux.dev, suzuki.poulose@....com,
        yuzenghui@...wei.com, will@...nel.org, ardb@...nel.org,
        akpm@...ux-foundation.org, gshan@...hat.com, aniketa@...dia.com,
        cjia@...dia.com, kwankhede@...dia.com, targupta@...dia.com,
        vsethi@...dia.com, acurrid@...dia.com, apopple@...dia.com,
        jhubbard@...dia.com, danw@...dia.com, mochs@...dia.com,
        kvmarm@...ts.linux.dev, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        james.morse@....com
Subject: Re: [PATCH v2 1/1] KVM: arm64: allow the VM to select DEVICE_* and
 NORMAL_NC for IO memory

[+James]

On Tue, Dec 05, 2023 at 11:40:47AM +0000, Catalin Marinas wrote:

[...]

> > - Will had unanswered questions in another part of the thread:
> > 
> >   https://lore.kernel.org/all/20231013092954.GB13524@willie-the-truck/
> > 
> >   Can someone please help concluding it?
> 
> Is this about reclaiming the device? I think we concluded that we can't
> generalise this beyond PCIe, though not sure there was any formal
> statement to that thread. The other point Will had was around stating
> in the commit message why we only relax this to Normal NC. I haven't
> checked the commit message yet, it needs careful reading ;).

1) Reclaiming the device: I wrote a section that was reported in this
   commit log hunk:

   "Relaxing S2 KVM device MMIO mappings to Normal-NC is not expected to
   trigger any issue on guest device reclaim use cases either (i.e.
   device MMIO unmap followed by a device reset) at least for PCIe
   devices, in that in PCIe a device reset is architected and carried
   out through PCI config space transactions that are naturally ordered
   with respect to MMIO transactions according to the PCI ordering
   rules."

   It is not too verbose on purpose - I thought it is too
   complicated to express the details in a commit log but
   we can elaborate on that if it is beneficial, probably
   in /Documentation, not in the log itself.

2) On FWB: I added a full section to the log I posted here:

   https://lore.kernel.org/all/ZUz78gFPgMupew+m@lpieralisi

   but I think Ankit trimmed it and I can't certainly blame anyone
   for that, it is a commit log, it is already hard to digest as it is.

I added James because I think that most of the points I made in the logs
are really RAS related (and without him I would not have understood half
of them :)). It would be very beneficial to everyone to have those
added to kernel documentation - especially to express in architectural
terms why this it is a safe change to make (at least for PCIe devices).

I am happy to work on the documentation changes - let's just agree what
should be done next.

Thanks,
Lorenzo

> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index d14504821b79..1cb302457d3f 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1071,7 +1071,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> > >  	struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO };
> > >  	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
> > >  	struct kvm_pgtable *pgt = mmu->pgt;
> > > -	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
> > > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_NORMAL_NC |
> > >  				     KVM_PGTABLE_PROT_R |
> > >  				     (writable ? KVM_PGTABLE_PROT_W : 0);
> > 
> > Doesn't this affect the GICv2 VCPU interface, which is effectively a
> > shared peripheral, now allowing a guest to affect another guest's
> > interrupt distribution? If that is the case, this needs to be fixed.
> > 
> > In general, I don't think this should be a blanket statement, but be
> > limited to devices that we presume can deal with this (i.e. PCIe, and
> > not much else).
> 
> Based on other on-list and off-line discussions, I came to the same
> conclusion that we can't relax this beyond PCIe. How we do this, it's up
> for debate. Some ideas:
> 
> - something in the vfio-pci driver like a new VM_* flag that KVM can
>   consume (hard sell for an arch-specific thing)
> 
> - changing the vfio-pci driver to allow WC and NC mappings (x86
>   terminology) with additional knowledge about what it can safely map
>   as NC. KVM would mimic the vma attributes (it doesn't eliminate the
>   alias though, the guest can always go for Device while the VMM for
>   Normal)
> 
> - some per-SoC pfn ranges that are permitted as Normal NC. Can the arch
>   code figure out where these PCIe BARs are or is it only the vfio-pci
>   driver? I guess we can sort something out around the PCIe but I'm not
>   familiar with this subsystem. The alternative is some "safe" ranges in
>   firmware tables, assuming the firmware configures the PCIe BARs
>   location
> 
> -- 
> Catalin