Message-ID: <ca6a8e5a-f14d-4017-90dc-be566d594eee@ventanamicro.com>
Date: Mon, 20 Oct 2025 16:43:14 -0300
From: Daniel Henrique Barboza <dbarboza@...tanamicro.com>
To: fangyu.yu@...ux.alibaba.com, anup@...infault.org, atish.patra@...ux.dev,
pjw@...nel.org, palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr,
pbonzini@...hat.com, jiangyifei@...wei.com
Cc: guoren@...nel.org, kvm@...r.kernel.org, kvm-riscv@...ts.infradead.org,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP
On 10/20/25 10:08 AM, fangyu.yu@...ux.alibaba.com wrote:
> From: Fangyu Yu <fangyu.yu@...ux.alibaba.com>
>
> As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"),
> vm_pgoff is no longer guaranteed to hold the PFN for VM_PFNMAP
> regions. Using vma->vm_pgoff to derive the HPA here may therefore
> produce incorrect mappings.
>
> Instead, I/O mappings for such regions can be established on demand
> during g-stage page faults, making the upfront ioremap in this path
> unnecessary.
>
> Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
> Signed-off-by: Fangyu Yu <fangyu.yu@...ux.alibaba.com>
> ---
Hi,
This patch fixes the issue observed by Drew in [1]. I was helping Drew
debug it using a QEMU guest inside an emulated RISC-V host with the
'virt' machine + IOMMU enabled.
Using the patches from [2], without the workaround patch (18), booting a
guest with a passed-through PCI device fails with a store/AMO fault and a
kernel oops:
[ 3.304776] Oops - store (or AMO) access fault [#1]
[ 3.305159] Modules linked in:
[ 3.305603] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4 #39
[ 3.305988] Hardware name: riscv-virtio,qemu (DT)
[ 3.306140] epc : __ew32+0x34/0xba
[ 3.307910] ra : e1000_irq_disable+0x1e/0x9a
[ 3.307984] epc : ffffffff806ebfbe ra : ffffffff806ee3f8 sp : ff2000000000baf0
[ 3.308022] gp : ffffffff81719938 tp : ff600000018b8000 t0 : ff60000002c3b480
[ 3.308055] t1 : 0000000000000065 t2 : 3030206530303031 s0 : ff2000000000bb30
[ 3.308086] s1 : ff60000002a50a00 a0 : ff60000002a50fb8 a1 : 00000000000000d8
[ 3.308118] a2 : ffffffffffffffff a3 : 0000000000000002 a4 : 0000000000003000
[ 3.308161] a5 : ff200000001e00d8 a6 : 0000000000000008 a7 : 0000000000000038
[ 3.308195] s2 : ff60000002a50fb8 s3 : ff60000001865000 s4 : 00000000000000d8
[ 3.308226] s5 : ffffffffffffffff s6 : ff60000002a50a00 s7 : ffffffff812d2760
[ 3.308258] s8 : 0000000000000a00 s9 : 0000000000001000 s10: ff60000002a51000
[ 3.308288] s11: ff60000002a54000 t3 : ffffffff8172ec4f t4 : ffffffff8172ec4f
[ 3.308475] t5 : ffffffff8172ec50 t6 : ff2000000000b848
[ 3.308763] status: 0000000200000120 badaddr: ff200000001e00d8 cause: 0000000000000007
[ 3.308975] [<ffffffff806ebfbe>] __ew32+0x34/0xba
[ 3.309196] [<ffffffff806ee3f8>] e1000_irq_disable+0x1e/0x9a
[ 3.309241] [<ffffffff806f1e12>] e1000_probe+0x3b6/0xb50
[ 3.309279] [<ffffffff80510554>] pci_device_probe+0x7e/0xf8
[ 3.310001] [<ffffffff80610344>] really_probe+0x82/0x202
[ 3.310409] [<ffffffff80610520>] __driver_probe_device+0x5c/0xd0
[ 3.310622] [<ffffffff806105c0>] driver_probe_device+0x2c/0xb0
(...)
Further debugging showed that, as far as QEMU is concerned, the store fault happens
in an "unassigned io region", i.e. a region where no I/O memory region is mapped
by any device. No IOMMU faults are being logged and, at least as far as I've
observed, there are no IOMMU translation bugs on the QEMU side either.
Thanks for the fix!
Tested-by: Daniel Henrique Barboza <dbarboza@...tanamicro.com>
[1] https://lore.kernel.org/all/20250920203851.2205115-38-ajones@ventanamicro.com/
[2] https://lore.kernel.org/all/20250920203851.2205115-20-ajones@ventanamicro.com/
> arch/riscv/kvm/mmu.c | 20 +-------------------
> 1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 525fb5a330c0..84c04c8f0892 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -197,8 +197,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>
> /*
> * A memory region could potentially cover multiple VMAs, and
> - * any holes between them, so iterate over all of them to find
> - * out if we can map any of them right now.
> + * any holes between them, so iterate over all of them.
> *
> * +--------------------------------------------+
> * +---------------+----------------+ +----------------+
> @@ -229,32 +228,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> vm_end = min(reg_end, vma->vm_end);
>
> if (vma->vm_flags & VM_PFNMAP) {
> - gpa_t gpa = base_gpa + (vm_start - hva);
> - phys_addr_t pa;
> -
> - pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
> - pa += vm_start - vma->vm_start;
> -
> /* IO region dirty page logging not allowed */
> if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> ret = -EINVAL;
> goto out;
> }
> -
> - ret = kvm_riscv_mmu_ioremap(kvm, gpa, pa, vm_end - vm_start,
> - writable, false);
> - if (ret)
> - break;
> }
> hva = vm_end;
> } while (hva < reg_end);
>
> - if (change == KVM_MR_FLAGS_ONLY)
> - goto out;
> -
> - if (ret)
> - kvm_riscv_mmu_iounmap(kvm, base_gpa, size);
> -
> out:
> mmap_read_unlock(current->mm);
> return ret;