lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tubcbvgk.wl-maz@kernel.org>
Date:   Fri, 01 Apr 2022 17:48:59 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     xieming <xieming@...inos.cn>
Cc:     linux@...linux.org.uk, catalin.marinas@....com, will@...nel.org,
        alex.williamson@...hat.com, sashal@...nel.org,
        kvmarm@...ts.cs.columbia.edu, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] kvm/arm64: fixed passthrough gpu into vm on arm64

Xieming,

This is the second time I fix email addresses for you. Next time, I
simply won't bother replying.

On Fri, 01 Apr 2022 10:08:28 +0100,
xieming <xieming@...inos.cn> wrote:
> 
> when passthrough some pcie device, such as gpus(including
> Nvidia and AMD),kvm will report:"Unsupported FSC: EC=0x24
> xFSC=0x21 ESR_EL2=0x92000061" err.the main reason is vfio

I have asked you to describe how you get there, and you still haven't
bothered replying.

> ioremap vga memory type by DEVICE_nGnRnE, and kvm setting
> memory type to PAGE_S2_DEVICE(DEVICE_nGnRE), but in guestos,
> all of device io memory type when ioremapping (including gpu
> driver TTM memory type) is setting to MT_NORMAL_NC.
> 
> according to ARM64 stage1&stage2 conbining rules.
> memory type attributes combining rules:
> Normal-WB<Normal-WT<NormalNC<Device-GRE<Device-nGRE<
> DevicenGnRE<Device-nGnRnE
> Normal-WB is weakest,Device-nGnRnE is strongest.
> 
> refferring to 'Arm Architecture Reference Manual Armv8,
> for Armv8-A architecture profile' pdf, chapter B2.8
> refferring to 'ARM System Memory Management Unit Architecture
> Specification SMMU architecture version 3.0 and version 3.1' pdf,
> chapter 13.1.5
> 
> therefore, the I/O memory attribute of the VM is setting to
> DevicenGnRE maybe is a mistake. it causes all device memory
> accessing in the virtual machine must be aligned.
> 
> To summarize: stage2 memory type cannot be stronger than stage1
> in arm64 archtechture.

You are plain wrong. It can, and most of the time, it *must*.

> 
> Signed-off-by: xieming <xieming@...inos.cn>
> ---
>  arch/arm/include/asm/kvm_mmu.h        |  3 ++-
>  arch/arm64/include/asm/kvm_mmu.h      |  3 ++-
>  arch/arm64/include/asm/memory.h       |  4 +++-
>  arch/arm64/include/asm/pgtable-prot.h |  2 +-
>  drivers/vfio/pci/vfio_pci.c           |  7 +++++++
>  virt/kvm/arm/mmu.c                    | 19 ++++++++++++++++---
>  virt/kvm/arm/vgic/vgic-v2.c           |  2 +-
>  7 files changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 523c499e42db..5c7869d25b62 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h

This file has been removed from the tree *over two years ago*.

> @@ -64,7 +64,8 @@ void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_alloc_stage2_pgd(struct kvm *kvm);
>  void kvm_free_stage2_pgd(struct kvm *kvm);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> -			  phys_addr_t pa, unsigned long size, bool writable);
> +			  phys_addr_t pa, unsigned long size,
> +			  bool writable, bool writecombine);
>  
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index b2558447c67d..3f98286c7498 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -158,7 +158,8 @@ void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_alloc_stage2_pgd(struct kvm *kvm);
>  void kvm_free_stage2_pgd(struct kvm *kvm);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> -			  phys_addr_t pa, unsigned long size, bool writable);
> +			  phys_addr_t pa, unsigned long size,
> +			  bool writable, bool writecombine);

NAK. For a start, there is no such thing as 'write-combine' in the ARM
architecture, and I'm not convinced you can equate WC to Normal-NC.
See the previous discussion at [1].

[1] https://lore.kernel.org/r/20210429162906.32742-1-sdonthineni@nvidia.com

[...]

> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 51b791c750f1..6f66efb71743 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1452,7 +1452,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>  	}
>  
>  	vma->vm_private_data = vdev;
> +#ifdef CONFIG_ARM64
> +	if (vfio_pci_is_vga(pdev))
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +	else
> +		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

No. That's completely unacceptable. Who says that some VGA (who the
hell implements VGA these days?) implies any sort of attribute other
than device memory? This may work for your particular device under
your own circumstances. Can it be generalised? No.

And as Jason pointed out, this is likely to break userspace.

> +#else
>  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +#endif
>  	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
>  
>  	/*
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 11103b75c596..a46a58696834 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -206,6 +206,17 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, pud_t *pudp)
>  	dsb(ishst);
>  }
>  
> +/**
> + * is_vma_write_combine - check if VMA is mapped with writecombine or not
> + * Return true if VMA mapped with MT_NORMAL_NC otherwise fasle
> + */
> +static inline bool is_vma_write_combine(struct vm_area_struct *vma)
> +{
> +	pteval_t pteval = pgprot_val(vma->vm_page_prot);
> +
> +	return ((pteval & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL_NC));
> +}

Again, you are making tons of assumptions here, none of which are
acceptable as is.

	M.

-- 
Without deviation from the norm, progress is not possible.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ