Message-ID: <8298de1d-648c-bbc6-b3c9-1cbc9b5d7e72@linux.microsoft.com>
Date: Fri, 30 Jan 2026 14:51:19 -0800
From: Mukesh R <mrathor@...ux.microsoft.com>
To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
Cc: linux-kernel@...r.kernel.org, linux-hyperv@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
linux-pci@...r.kernel.org, linux-arch@...r.kernel.org, kys@...rosoft.com,
haiyangz@...rosoft.com, wei.liu@...nel.org, decui@...rosoft.com,
longli@...rosoft.com, catalin.marinas@....com, will@...nel.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, joro@...tes.org,
lpieralisi@...nel.org, kwilczynski@...nel.org, mani@...nel.org,
robh@...nel.org, bhelgaas@...gle.com, arnd@...db.de,
nunodasneves@...ux.microsoft.com, mhklinux@...look.com,
romank@...ux.microsoft.com
Subject: Re: [PATCH v0 12/15] x86/hyperv: Implement hyperv virtual iommu
On 1/27/26 10:46, Stanislav Kinsburskii wrote:
> On Mon, Jan 26, 2026 at 07:02:29PM -0800, Mukesh R wrote:
>> On 1/26/26 07:57, Stanislav Kinsburskii wrote:
>>> On Fri, Jan 23, 2026 at 05:26:19PM -0800, Mukesh R wrote:
>>>> On 1/20/26 16:12, Stanislav Kinsburskii wrote:
>>>>> On Mon, Jan 19, 2026 at 10:42:27PM -0800, Mukesh R wrote:
>>>>>> From: Mukesh Rathor <mrathor@...ux.microsoft.com>
>>>>>>
>>>>>> Add a new file to implement management of device domains, mapping and
>>>>>> unmapping of iommu memory, and other iommu_ops to fit within the VFIO
>>>>>> framework for PCI passthru on Hyper-V running Linux as root or L1VH
>>>>>> parent. This also implements the direct attach mechanism for PCI
>>>>>> passthru, which likewise works within the VFIO framework.
>>>>>>
>>>>>> At a high level, during boot the hypervisor creates a default identity
>>>>>> domain and attaches all devices to it. This maps nicely to the Linux
>>>>>> iommu subsystem's IOMMU_DOMAIN_IDENTITY domain. As a result, Linux does
>>>>>> not need to explicitly ask Hyper-V to attach devices or do maps/unmaps
>>>>>> during boot. As mentioned previously, Hyper-V supports two ways to do
>>>>>> PCI passthru:
>>>>>>
>>>>>> 1. Device Domain: root must create a device domain in the hypervisor
>>>>>> and issue map/unmap hypercalls for mapping and unmapping guest RAM.
>>>>>> All hypervisor communications use a device id of type PCI for
>>>>>> identifying and referencing the device.
>>>>>>
>>>>>> 2. Direct Attach: the hypervisor simply uses the guest's HW page
>>>>>> table for mappings, so the host need not make map/unmap device
>>>>>> memory hypercalls. As such, direct attach passthru setup during
>>>>>> guest boot is extremely fast. A direct-attached device must be
>>>>>> referenced via its logical device id and not via the PCI device
>>>>>> id.
>>>>>>
>>>>>> At present, L1VH root/parent only supports direct attach. Direct
>>>>>> attach is also the default in non-L1VH cases because there are
>>>>>> currently significant performance issues with the device domain
>>>>>> implementation for guests with larger RAM (say, more than 8GB), and
>>>>>> those unfortunately cannot be addressed in the short term.
>>>>>>
>>>>>
>>>>> <snip>
>>>>>
>>>
>>> <snip>
>>>
>>>>>> +static void hv_iommu_detach_dev(struct iommu_domain *immdom, struct device *dev)
>>>>>> +{
>>>>>> + struct pci_dev *pdev;
>>>>>> + struct hv_domain *hvdom = to_hv_domain(immdom);
>>>>>> +
>>>>>> + /* See the attach function, only PCI devices for now */
>>>>>> + if (!dev_is_pci(dev))
>>>>>> + return;
>>>>>> +
>>>>>> + if (hvdom->num_attchd == 0)
>>>>>> + pr_warn("Hyper-V: num_attchd is zero (%s)\n", dev_name(dev));
>>>>>> +
>>>>>> + pdev = to_pci_dev(dev);
>>>>>> +
>>>>>> + if (hvdom->attached_dom) {
>>>>>> + hv_iommu_det_dev_from_guest(hvdom, pdev);
>>>>>> +
>>>>>> + /* Do not reset attached_dom, hv_iommu_unmap_pages happens
>>>>>> + * next.
>>>>>> + */
>>>>>> + } else {
>>>>>> + hv_iommu_det_dev_from_dom(hvdom, pdev);
>>>>>> + }
>>>>>> +
>>>>>> + hvdom->num_attchd--;
>>>>>
>>>>> Shouldn't this be modified iff the detach succeeded?
>>>>
>>>> We want to still free the domain and not let it get stuck. The purpose
>>>> is more to make sure detach was called before domain free.
>>>>
>>>
>>> How can one debug subsequent errors if num_attchd is decremented
>>> unconditionally? In reality the device is left attached, but the related
>>> kernel metadata is gone.
>>
>> An error is printed in case of a failed detach. If there is a panic, at
>> least you can get some info about the device. The metadata in the
>> hypervisor is still around if the detach failed.
>>
>
> With this approach the only thing left is a kernel message.
> But if the state is kept intact, one could collect a kernel core and
> analyze it.
Again, most of the Linux-side state is cleaned up; the only remaining
state is in the hypervisor, and the hypervisor can fully protect itself
and the devices. So there is not much in a kernel core, as it has
already been cleaned up. Think of this as an additional check; we can
remove it in the future once it stands the test of time. Until then,
every debugging bit helps.
> And note that there won't be a hypervisor core by default: our main
> context with the upstreamed version of the driver is L1VH, and a kernel
> core is the only thing a third-party customer can provide for our
> analysis.
Wei can correct me, but we are not focused only on L1VH here. There is
work going on on all fronts.
Thanks,
-Mukesh
> Thanks,
> Stanislav