[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BL1PR12MB584944749CEE372A1F475C62E785A@BL1PR12MB5849.namprd12.prod.outlook.com>
Date: Tue, 5 Dec 2023 06:46:56 +0000
From: "Chen, Jiqian" <Jiqian.Chen@....com>
To: Roger Pau Monné <roger.pau@...rix.com>,
Stefano Stabellini <sstabellini@...nel.org>
CC: Juergen Gross <jgross@...e.com>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Thomas Gleixner <tglx@...utronix.de>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Len Brown <lenb@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"Stabellini, Stefano" <stefano.stabellini@....com>,
"Deucher, Alexander" <Alexander.Deucher@....com>,
"Koenig, Christian" <Christian.Koenig@....com>,
"Hildebrand, Stewart" <Stewart.Hildebrand@....com>,
"Ragiadakou, Xenia" <Xenia.Ragiadakou@....com>,
"Huang, Honglei1" <Honglei1.Huang@....com>,
"Zhang, Julia" <Julia.Zhang@....com>,
"Huang, Ray" <Ray.Huang@....com>,
"Chen, Jiqian" <Jiqian.Chen@....com>
Subject: Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough
device in PVH dom0
On 2023/12/4 18:28, Roger Pau Monné wrote:
> On Fri, Dec 01, 2023 at 07:37:55PM -0800, Stefano Stabellini wrote:
>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>>> On Thu, Nov 30, 2023 at 07:15:17PM -0800, Stefano Stabellini wrote:
>>>> On Thu, 30 Nov 2023, Roger Pau Monné wrote:
>>>>> On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote:
>>>>>> On Fri, 24 Nov 2023, Jiqian Chen wrote:
>>>>>>> This patch is to solve two problems we encountered when we try to
>>>>>>> passthrough a device to hvm domU base on Xen PVH dom0.
>>>>>>>
>>>>>>> First, hvm guest will alloc a pirq and irq for a passthrough device
>>>>>>> by using gsi, before that, the gsi must first has a mapping in dom0,
>>>>>>> see Xen code pci_add_dm_done->xc_domain_irq_permission, it will call
>>>>>>> into Xen and check whether dom0 has the mapping. See
>>>>>>> XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH
>>>>>>> dom0 and it return irq is 0, and then return -EPERM.
>>>>>>> This is because the passthrough device doesn't do PHYSDEVOP_map_pirq
>>>>>>> when thay are enabled.
>>>>>>>
>>>>>>> Second, in PVH dom0, the gsi of a passthrough device doesn't get
>>>>>>> registered, but gsi must be configured for it to be able to be
>>>>>>> mapped into a domU.
>>>>>>>
>>>>>>> After searching codes, we can find map_pirq and register_gsi will be
>>>>>>> done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when
>>>>>>> the gsi(aka ioapic's pin) is unmasked in PVH dom0. So the problems
>>>>>>> can be conclude to that the gsi of a passthrough device doesn't be
>>>>>>> unmasked.
>>>>>>>
>>>>>>> To solve the unmaske problem, this patch call the unmask_irq when we
>>>>>>> assign a device to be passthrough. So that the gsi can get registered
>>>>>>> and mapped in PVH dom0.
>>>>>>
>>>>>>
>>>>>> Roger, this seems to be more of a Xen issue than a Linux issue. Why do
>>>>>> we need the unmask check in Xen? Couldn't we just do:
>>>>>>
>>>>>>
>>>>>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
>>>>>> index 4e40d3609a..df262a4a18 100644
>>>>>> --- a/xen/arch/x86/hvm/vioapic.c
>>>>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>>>>> @@ -287,7 +287,7 @@ static void vioapic_write_redirent(
>>>>>> hvm_dpci_eoi(d, gsi);
>>>>>> }
>>>>>>
>>>>>> - if ( is_hardware_domain(d) && unmasked )
>>>>>> + if ( is_hardware_domain(d) )
>>>>>> {
>>>>>> /*
>>>>>> * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock
>>>>>
>>>>> There are some issues with this approach.
>>>>>
>>>>> mp_register_gsi() will only setup the trigger and polarity of the
>>>>> IO-APIC pin once, so we do so once the guest unmask the pin in order
>>>>> to assert that the configuration is the intended one. A guest is
>>>>> allowed to write all kind of nonsense stuff to the IO-APIC RTE, but
>>>>> that doesn't take effect unless the pin is unmasked.
>>>>>
>>>>> Overall the question would be whether we have any guarantees that
>>>>> the hardware domain has properly configured the pin, even if it's not
>>>>> using it itself (as it hasn't been unmasked).
>>>>>
>>>>> IIRC PCI legacy interrupts are level triggered and low polarity, so we
>>>>> could configure any pins that are not setup at bind time?
>>>>
>>>> That could work.
>>>>
>>>> Another idea is to move only the call to allocate_and_map_gsi_pirq at
>>>> bind time? That might be enough to pass a pirq_access_permitted check.
>>>
>>> Maybe, albeit that would change the behavior of XEN_DOMCTL_bind_pt_irq
>>> just for PT_IRQ_TYPE_PCI and only when called from a PVH dom0 (as the
>>> parameter would be a GSI instead of a previously mapped IRQ). Such
>>> difference just for PT_IRQ_TYPE_PCI is slightly weird - if we go that
>>> route I would recommend that we instead introduce a new dmop that has
>>> this syntax regardless of the domain type it's called from.
>>
>> Looking at the code it is certainly a bit confusing. My point was that
>> we don't need to wait until polarity and trigger are set appropriately
>> to allow Dom0 to pass successfully a pirq_access_permitted() check. Xen
>> should be able to figure out that Dom0 is permitted pirq access.
>
> The logic is certainly not straightforward, and it could benefit from
> some comments.
>
> The irq permissions are a bit special, in that they get setup when the
> IRQ is mapped.
>
> The problem however is not so much with IRQ permissions, that we can
> indeed sort out internally in Xen. Such check in dom0 has the side
> effect of preventing the IRQ from being assigned to a domU without the
> hardware source being properly configured AFAICT.
>
>>
>> So the idea was to move the call to allocate_and_map_gsi_pirq() earlier
>> somewhere because allocate_and_map_gsi_pirq doesn't require trigger or
>> polarity to be configured to work. But the suggestion of doing it a
>> "bind time" (meaning: XEN_DOMCTL_bind_pt_irq) was a bad idea.
>>
>> But maybe we can find another location, maybe within
>> xen/arch/x86/hvm/vioapic.c, to call allocate_and_map_gsi_pirq() before
>> trigger and polarity are set and before the interrupt is unmasked.
>>
>> Then we change the implementation of vioapic_hwdom_map_gsi to skip the
>> call to allocate_and_map_gsi_pirq, because by the time
>> vioapic_hwdom_map_gsi we assume that allocate_and_map_gsi_pirq had
>> already been done.
>
> But then we would end up in a situation where the
> pirq_access_permitted() check will pass, but the IO-APIC pin won't be
> configured, which I think it's not what we want.
>
> One option would be to allow mp_register_gsi() to be called multiple
> times, and update the IO-APIC pin configuration as long as the pin is
> not unmasked. That would propagate each dom0 RTE update to the
> underlying IO-APIC. However such approach relies on dom0 configuring
> all possible IO-APIC pins, even if no device on dom0 is using them, I
> think it's not a very reliable option.
>
> Another option would be to modify the toolstack to setup the GSI
> itself using the PHYSDEVOP_setup_gsi hypercall. As said in a previous
> email, since we only care about PCI device passthrough the legacy INTx
> should always be level triggered and low polarity.
I am thinking if we can do PHYSDEVOP_map_pirq and PHYSDEVOP_setup_gsi only for passthrough devices(in function pcistub_init_device).
Then it can pass permission check and setup gsi without affecting other devices that do not use IO-APIC pins.
What do you think?
>
>> I am not familiar with vioapic.c but to give you an idea of what I was
>> thinking:
>>
>>
>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
>> index 4e40d3609a..16d56fe851 100644
>> --- a/xen/arch/x86/hvm/vioapic.c
>> +++ b/xen/arch/x86/hvm/vioapic.c
>> @@ -189,14 +189,6 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
>> return ret;
>> }
>>
>> - ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq);
>> - if ( ret )
>> - {
>> - gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n",
>> - gsi, ret);
>> - return ret;
>> - }
>> -
>> pcidevs_lock();
>> ret = pt_irq_create_bind(currd, &pt_irq_bind);
>> if ( ret )
>> @@ -287,6 +279,17 @@ static void vioapic_write_redirent(
>> hvm_dpci_eoi(d, gsi);
>> }
>>
>> + if ( is_hardware_domain(d) )
>> + {
>> + int pirq = gsi, ret;
>> + ret = allocate_and_map_gsi_pirq(currd, pirq, &pirq);
>> + if ( ret )
>> + {
>> + gprintk(XENLOG_WARNING, "vioapic: error mapping GSI %u: %d\n",
>> + gsi, ret);
>> + return ret;
>> + }
>> + }
>> if ( is_hardware_domain(d) && unmasked )
>> {
>> /*
>
> As said above, such approach relies on dom0 writing to the IO-APIC RTE
> of likely each IO-APIC pin, which is IMO not quite reliable. In there
> are two different issues here that need to be fixed for PVH dom0:
>
> - Fix the XEN_DOMCTL_irq_permission pirq_access_permitted() call to
> succeed for a PVH dom0, even if dom0 is not using the GSI itself.
>
> - Configure IO-APIC pins for PCI interrupts even if dom0 is not using
> the IO-APIC pin itself.
>
> First one needs to be fixed internally in Xen, second one will require
> the toolstack to issue an extra hypercall in order to ensure the
> IO-APIC pin is properly configured.
>
> Thanks, Roger.
--
Best regards,
Jiqian Chen.
Powered by blists - more mailing lists