lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8298de1d-648c-bbc6-b3c9-1cbc9b5d7e72@linux.microsoft.com>
Date: Fri, 30 Jan 2026 14:51:19 -0800
From: Mukesh R <mrathor@...ux.microsoft.com>
To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
Cc: linux-kernel@...r.kernel.org, linux-hyperv@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
 linux-pci@...r.kernel.org, linux-arch@...r.kernel.org, kys@...rosoft.com,
 haiyangz@...rosoft.com, wei.liu@...nel.org, decui@...rosoft.com,
 longli@...rosoft.com, catalin.marinas@....com, will@...nel.org,
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, hpa@...or.com, joro@...tes.org,
 lpieralisi@...nel.org, kwilczynski@...nel.org, mani@...nel.org,
 robh@...nel.org, bhelgaas@...gle.com, arnd@...db.de,
 nunodasneves@...ux.microsoft.com, mhklinux@...look.com,
 romank@...ux.microsoft.com
Subject: Re: [PATCH v0 12/15] x86/hyperv: Implement hyperv virtual iommu

On 1/27/26 10:46, Stanislav Kinsburskii wrote:
> On Mon, Jan 26, 2026 at 07:02:29PM -0800, Mukesh R wrote:
>> On 1/26/26 07:57, Stanislav Kinsburskii wrote:
>>> On Fri, Jan 23, 2026 at 05:26:19PM -0800, Mukesh R wrote:
>>>> On 1/20/26 16:12, Stanislav Kinsburskii wrote:
>>>>> On Mon, Jan 19, 2026 at 10:42:27PM -0800, Mukesh R wrote:
>>>>>> From: Mukesh Rathor <mrathor@...ux.microsoft.com>
>>>>>>
>>>>>> Add a new file to implement management of device domains, mapping and
>>>>>> unmapping of iommu memory, and other iommu_ops to fit within the VFIO
>>>>>> framework for PCI passthru on Hyper-V running Linux as root or L1VH
>>>>>> parent. This also implements direct attach mechanism for PCI passthru,
>>>>>> and it is also made to work within the VFIO framework.
>>>>>>
>>>>>> At a high level, during boot the hypervisor creates a default identity
>>>>>> domain and attaches all devices to it. This nicely maps to Linux iommu
>>>>>> subsystem IOMMU_DOMAIN_IDENTITY domain. As a result, Linux does not
>>>>>> need to explicitly ask Hyper-V to attach devices and do maps/unmaps
>>>>>> during boot. As mentioned previously, Hyper-V supports two ways to do
>>>>>> PCI passthru:
>>>>>>
>>>>>>      1. Device Domain: root must create a device domain in the hypervisor,
>>>>>>         and do map/unmap hypercalls for mapping and unmapping guest RAM.
>>>>>>         All hypervisor communications use device id of type PCI for
>>>>>>         identifying and referencing the device.
>>>>>>
>>>>>>      2. Direct Attach: the hypervisor will simply use the guest's HW
>>>>>>         page table for mappings, thus the host need not do map/unmap
>>>>>>         device memory hypercalls. As such, direct attach passthru setup
>>>>>>         during guest boot is extremely fast. A direct attached device
>>>>>>         must be referenced via logical device id and not via the PCI
>>>>>>         device id.
>>>>>>
>>>>>> At present, L1VH root/parent only supports direct attaches. Also direct
>>>>>> attach is default in non-L1VH cases because there are some significant
>>>>>> performance issues with device domain implementation currently for guests
>>>>>> with higher RAM (say more than 8GB), and that unfortunately cannot be
>>>>>> addressed in the short term.
>>>>>>
>>>>>
>>>>> <snip>
>>>>>
>>>
>>> <snip>
>>>
>>>>>> +static void hv_iommu_detach_dev(struct iommu_domain *immdom, struct device *dev)
>>>>>> +{
>>>>>> +	struct pci_dev *pdev;
>>>>>> +	struct hv_domain *hvdom = to_hv_domain(immdom);
>>>>>> +
>>>>>> +	/* See the attach function, only PCI devices for now */
>>>>>> +	if (!dev_is_pci(dev))
>>>>>> +		return;
>>>>>> +
>>>>>> +	if (hvdom->num_attchd == 0)
>>>>>> +		pr_warn("Hyper-V: num_attchd is zero (%s)\n", dev_name(dev));
>>>>>> +
>>>>>> +	pdev = to_pci_dev(dev);
>>>>>> +
>>>>>> +	if (hvdom->attached_dom) {
>>>>>> +		hv_iommu_det_dev_from_guest(hvdom, pdev);
>>>>>> +
>>>>>> +		/* Do not reset attached_dom, hv_iommu_unmap_pages happens
>>>>>> +		 * next.
>>>>>> +		 */
>>>>>> +	} else {
>>>>>> +		hv_iommu_det_dev_from_dom(hvdom, pdev);
>>>>>> +	}
>>>>>> +
>>>>>> +	hvdom->num_attchd--;
>>>>>
>>>>> Shouldn't this be modified iff the detach succeeded?
>>>>
>>>> We want to still free the domain and not let it get stuck. The purpose
>>>> is more to make sure detach was called before domain free.
>>>>
>>>
>>> How can one debug subseqent errors if num_attchd is decremented
>>> unconditionally? In reality the device is left attached, but the related
>>> kernel metadata is gone.
>>
>> Error is printed in case of failed detach. If there is panic, at least
>> you can get some info about the device. Metadata in hypervisor is
>> around if failed.
>>
> 
> With this approach the only thing left is a kernel message.
> But if the state is kept intact, one could collect a kernel core and
> analyze it.

Again, most of linux stuff is cleaned up, the only state is in
hypervisor, and hypervisor can totally protect itself and devices.
So there is not much in kernel core as it got cleaned up already.
Think of this as additional check, we can remove in future after
it stands the test of time, until then, every debugging bit helps.

> And note, that there won't be a hypervisor core by default: our main
> context with the usptreamed version of the driver is L1VH and a kernel
> core is the only thing a third party customer can provide for our
> analysis.

Wei can correct me, but we are not only l1vh focused here. There is
work going on on all fronts.

Thanks,
-Mukesh

> Thanks,
> Stanislav

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ