linux-kernel - Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device

Message-ID: <CAHP4M8Ud_tm+SPmZtnSi1--zf=MTsbvSqDYdAfPdAXUj+Ormkg@mail.gmail.com>
Date: Sat, 20 Dec 2025 18:22:49 +0530
From: Ajay Garg <ajaygargnsit@...il.com>
To: Alex Williamson <alex@...zbot.org>
Cc: QEMU Developers <qemu-devel@...gnu.org>, iommu@...ts.linux-foundation.org, 
	linux-pci@...r.kernel.org, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device

Thanks Alex.

I was (and am) aware of GPA ranges being backed by mmap'ed HVA ranges.
On further thought, I think I now have all the missing pieces (except
one, which I ask about at the end of this email).

I'll list the steps below:

a)
There are three stages:

   * pre-configuration by host/qemu.
   * guest-vm bios.
   * guest-vm kernel.

b)
The host (QEMU) procures the following memory slots (amongst others) via mmap:

  * guest RAM
  * PCI config space   : with the help of VFIO ioctls.
  * PCI BAR MMIO space : with the help of VFIO ioctls.

For the above memory slots:

   * the guest-RAM physical address is known (0), so the EPT mappings
     for guest RAM are set up even before the guest VM begins to boot.

   * there is no concept of a guest-physical address for the PCI config
     space.

   * the guest-physical address of the PCI BAR MMIO space is not known
     yet, so the EPT mappings for it are not set up (yet).
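A small Python toy (not QEMU code) can make step b) more concrete. `SlotRequest` is a hypothetical record that carries the same information as `struct kvm_userspace_memory_region` from `<linux/kvm.h>`: slot, GPA, size, and the HVA of the VMM's mmap. The sketch only shows that guest RAM can be handed to KVM straight away, while a BAR mapping has to wait until its GPA is known. The config-space entry is left out, the BAR's HVA is made up, and no ioctl is issued.

```python
import ctypes
import mmap
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotRequest:
    """Hypothetical record carrying what struct kvm_userspace_memory_region
    carries: slot, GPA, size, and the HVA of the VMM's mmap."""
    slot: int
    hva: int
    size: int
    gpa: Optional[int] = None  # None until someone decides where it lives


def hva_of(buf):
    """Host virtual address of a writable mmap object."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def registrable(requests):
    """Only slots with a known GPA can be handed to KVM at this stage."""
    return [r for r in requests if r.gpa is not None]


guest_ram = mmap.mmap(-1, 2 * 1024 * 1024)  # stand-in for guest RAM
requests = [
    # Guest RAM: GPA chosen by the VMM before the guest runs (0 here).
    SlotRequest(slot=0, hva=hva_of(guest_ram), size=len(guest_ram), gpa=0),
    # BAR: the VMM can already hold an mmap of it (hypothetical HVA), but
    # its GPA is only chosen later by the guest firmware.
    SlotRequest(slot=1, hva=0x7F00_0000_0000, size=0x1000),
]
initial = registrable(requests)
```

In a real VMM, each registrable entry would be passed to the `KVM_SET_USER_MEMORY_REGION` ioctl on a VM file descriptor. This sketch does not open one.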

c)
QEMU starts the guest, and the guest-VM BIOS runs next.

This BIOS is "owned by QEMU" and is "definitely different" from the
host BIOS (QEMU presents an altogether different "hardware"). The QEMU
BIOS and the host BIOS handle PCI bus enumeration "completely
differently".

When PCI enumeration runs during this guest-VM BIOS stage, it accesses
the PCI device's config space (backed on the host by mmap'ed
mappings). Note that the guest kernel is still not in the picture.

"OBVIOUSLY", all accesses (reads/writes) to the PCI config space go to
the PCI-config-space memory slot (handled purely by the QEMU BIOS code).

Once the guest-VM BIOS carves out guest-physical addresses for the PCI
device's BARs, it programs the BARs by writing to the BAR offsets in
the PCI config space. QEMU detects this and does the following:

   * it does not relay the actual writes to the physical BARs on the host.
   * since the BARs' guest-physical addresses are now known, the missing
     EPT mappings for the PCI BAR MMIO space are now set up.
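Alex's reply (quoted below) adds one refinement to step c): the mapping is populated once the guest BAR is programmed and memory decoding is enabled. The toy Python model below sketches that under simplifying assumptions. It models a single 32-bit, non-prefetchable memory BAR, uses a dict as a stand-in for KVM memory slots, and the names `EmulatedMmioBar` and `bar_size_from_probe` are hypothetical. It also shows the standard PCI sizing probe (write all ones, read back), which is mostly what the emulated BAR is used for.

```python
PCI_COMMAND_MEMORY = 0x2  # Memory Space Enable: bit 1 of the PCI Command register
BAR_TYPE_MASK = 0xF       # low 4 bits of a memory BAR are read-only type bits


class EmulatedMmioBar:
    """Toy VMM-side emulation of one 32-bit memory BAR of a passthrough device."""

    def __init__(self, size, host_hva):
        assert size >= 16 and size & (size - 1) == 0, "BAR size must be a power of two"
        self.size = size
        self.host_hva = host_hva  # where the VMM's mmap of the real BAR lives
        self.bar = 0              # emulated BAR register, as seen by the guest
        self.command = 0          # emulated Command register
        self.slots = {}           # gpa -> (hva, size); stand-in for KVM memory slots

    def write_bar(self, value):
        # In this toy only the emulated register changes. Address bits below
        # the BAR size read back as 0, and so do the type bits
        # (type 0 = 32-bit, non-prefetchable).
        self.bar = value & ~(self.size - 1) & ~BAR_TYPE_MASK & 0xFFFFFFFF
        self._update_mapping()

    def read_bar(self):
        return self.bar

    def write_command(self, value):
        self.command = value & 0xFFFF
        self._update_mapping()

    def _update_mapping(self):
        # Rebuild the GPA -> HVA mapping from the emulated state. It exists only
        # while memory decoding is enabled (a BAR value of 0 is treated as
        # "not placed yet" in this toy).
        self.slots.clear()
        if self.command & PCI_COMMAND_MEMORY and self.bar != 0:
            self.slots[self.bar] = (self.host_hva, self.size)


def bar_size_from_probe(readback):
    """Decode a BAR's size from the value read back after writing all ones."""
    return (~(readback & ~BAR_TYPE_MASK) + 1) & 0xFFFFFFFF


# Firmware-style sequence, with memory decoding off while sizing.
bar = EmulatedMmioBar(size=0x1000, host_hva=0x7F00_0000_0000)  # hypothetical HVA
bar.write_bar(0xFFFFFFFF)                   # sizing probe
size = bar_size_from_probe(bar.read_bar())  # 0x1000
bar.write_bar(0xFE000000)                   # firmware places the BAR at this GPA
bar.write_command(PCI_COMMAND_MEMORY)       # now the mapping is populated
```

Tying the mapping to the Command register as well as the BAR value matches Alex's "programmed and memory enabled" wording. It also means that a sizing probe done with decoding off never creates a bogus mapping.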

d)
Finally, the guest kernel takes over, and

   * all accesses to RAM go through the vanilla two-stage translation.
   * all accesses to the PCI BAR MMIO go through the vanilla two-stage
     translation.
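The "two-stage translation" in step d) can be sketched as two table lookups. The guest's own page tables map a guest-virtual address (GVA) to a GPA, and the EPT maps that GPA to a host-physical address (HPA). The toy Python below assumes 4 KiB pages and uses flat dicts of page-frame numbers in place of the real multi-level tables. It also ignores the fact that the guest page-table walk itself goes through the EPT. All addresses in the example are made up.

```python
PAGE_SHIFT = 12                          # 4 KiB pages
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class GuestPageFault(Exception):
    """Stage-1 miss: delivered to and handled by the guest kernel."""


class EptViolation(Exception):
    """Stage-2 miss: exits to the hypervisor (KVM)."""


def two_stage_translate(gva, guest_page_table, ept):
    """Translate GVA -> GPA -> HPA using flat {page frame: page frame} tables."""
    offset = gva & PAGE_OFFSET_MASK
    gva_frame = gva >> PAGE_SHIFT
    if gva_frame not in guest_page_table:
        raise GuestPageFault(hex(gva))
    gpa_frame = guest_page_table[gva_frame]   # stage 1: guest page tables
    if gpa_frame not in ept:
        raise EptViolation(hex(gpa_frame << PAGE_SHIFT | offset))
    hpa_frame = ept[gpa_frame]                # stage 2: EPT
    return (hpa_frame << PAGE_SHIFT) | offset


# Made-up example: a guest kernel virtual address for a BAR placed at
# GPA 0xFE000000, whose EPT entry points at the device's BAR at HPA 0xD0000000.
guest_page_table = {0xFFFFC90000000: 0xFE000}  # GVA frame -> GPA frame
ept = {0xFE000: 0xD0000}                        # GPA frame -> HPA frame
hpa = two_stage_translate(0xFFFFC90000000123, guest_page_table, ept)  # 0xD0000123
```

RAM and BAR accesses take the same path in this model. The only difference is what the EPT entry points at: ordinary host memory, or the device's BAR.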

Requests:

i)
Alex / QEMU experts: kindly correct me if I have got anything wrong so
far :)

ii)
Once the guest kernel has booted, how are accesses to the PCI config
space handled? Is the QEMU BIOS again involved in PCI config-space
accesses after the guest kernel has booted?

Once again, many thanks to everyone for their time and help.

Thanks and Regards,
Ajay

On Sat, Dec 20, 2025 at 5:36 AM Alex Williamson <alex@...zbot.org> wrote:
>
> On Fri, 19 Dec 2025 11:53:56 +0530
> Ajay Garg <ajaygargnsit@...il.com> wrote:
>
> > Hi Alex.
> > Kindly help if the steps listed in the previous email are correct.
> >
> > (Have added qemu mailing-list too, as it might be a qemu thing too as
> > virtual-pci is in picture).
> >
> > On Mon, Dec 15, 2025 at 9:20 AM Ajay Garg <ajaygargnsit@...il.com> wrote:
> > >
> > > Thanks Alex.
> > >
> > > So does something like the following happen :
> > >
> > > i)
> > > During bootup, guest starts pci-enumeration as usual.
> > >
> > > ii)
> > > Upon discovering the "passthrough-device", guest carves the physical
> > > MMIO regions (as usual) in the guest's physical-address-space, and
> > > starts-to/attempts to program the BARs with the
> > > guest-physical-base-addresses carved out.
> > >
> > > iii)
> > > These attempts to program the BARs (lying in the
> > > "passthrough-device"'s config-space), are intercepted by the
> > > hypervisor instead (causing a VM-exit in the interim).
> > >
> > > iv)
> > > The hypervisor uses the above info to update the EPT, to ensure GPA =>
> > > HPA conversions go fine when the guest tries to access the PCI-MMIO
> > > regions later (once guest is fully booted up). Also, the hypervisor
> > > marks the operation as success (without "really" re-programming the
> > > BARs).
> > >
> > > v)
> > > The VM-entry is called, and the guest resumes with the "impression"
> > > that the BARs have been "programmed by guest".
> > >
> > > Is the above sequencing correct at a bird's view level?
>
> It's not far off.  The key is simply that we can create a host virtual
> mapping to the device BARs, ie. an mmap.  The guest enumerates emulated
> BARs, they're only used for sizing and locating the BARs in the guest
> physical address space.  When the guest BAR is programmed and memory
> enabled, the address space in QEMU is populated at the BAR indicated
> GPA using the mmap backing.  KVM memory slots are used to fill the
> mappings in the vCPU.  The same BAR mmap is also used to provide DMA
> mapping of the BAR through the IOMMU in the legacy type1 IOMMU backend
> case.  Barring a vIOMMU, the IOMMU IOVA space is the guest physical
> address space.  Thanks,
>
> Alex
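Alex's last point is that the same BAR mmap feeds both the vCPU mappings and the IOMMU. This can be illustrated by building both requests from one (HVA, GPA, size) triple. The Python `ctypes` classes below mirror `struct kvm_userspace_memory_region` (`<linux/kvm.h>`) and `struct vfio_iommu_type1_dma_map` (`<linux/vfio.h>`), which are the arguments of the `KVM_SET_USER_MEMORY_REGION` and `VFIO_IOMMU_MAP_DMA` ioctls. No ioctl is issued here. The helper `map_bar_for_vcpu_and_iommu()` and the addresses are hypothetical. The sketch assumes the legacy type1 backend and no vIOMMU, so the IOVA is simply the GPA.

```python
import ctypes

VFIO_DMA_MAP_FLAG_READ = 1 << 0   # values as in <linux/vfio.h>
VFIO_DMA_MAP_FLAG_WRITE = 1 << 1


class KvmUserspaceMemoryRegion(ctypes.Structure):
    """Same layout as struct kvm_userspace_memory_region (<linux/kvm.h>)."""
    _fields_ = [
        ("slot", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("guest_phys_addr", ctypes.c_uint64),
        ("memory_size", ctypes.c_uint64),
        ("userspace_addr", ctypes.c_uint64),
    ]


class VfioIommuType1DmaMap(ctypes.Structure):
    """Same layout as struct vfio_iommu_type1_dma_map (<linux/vfio.h>)."""
    _fields_ = [
        ("argsz", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("vaddr", ctypes.c_uint64),
        ("iova", ctypes.c_uint64),
        ("size", ctypes.c_uint64),
    ]


def map_bar_for_vcpu_and_iommu(slot, bar_hva, bar_gpa, bar_size):
    """Hypothetical helper: derive both requests from one BAR mmap."""
    # vCPU side: KVM memory slot, GPA -> HVA (KVM builds EPT entries from it).
    kvm_req = KvmUserspaceMemoryRegion(
        slot=slot, flags=0, guest_phys_addr=bar_gpa,
        memory_size=bar_size, userspace_addr=bar_hva)
    # Device side: type1 DMA mapping, IOVA -> HVA. Without a vIOMMU the
    # IOVA space is the guest-physical address space, so iova == GPA.
    dma_req = VfioIommuType1DmaMap(
        argsz=ctypes.sizeof(VfioIommuType1DmaMap),
        flags=VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        vaddr=bar_hva, iova=bar_gpa, size=bar_size)
    return kvm_req, dma_req


kvm_req, dma_req = map_bar_for_vcpu_and_iommu(
    slot=1, bar_hva=0x7F00_0000_0000, bar_gpa=0xFE000000, bar_size=0x1000)
```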