Message-ID: <0d3f78ba-edff-5e64-2a3a-b2d7ec9b609a@daynix.com>
Date: Fri, 14 Apr 2023 11:51:27 +0900
From: Akihiko Odaki <akihiko.odaki@...nix.com>
To: eric.auger@...hat.com,
Jean-Philippe Brucker <jean-philippe@...aro.org>
Cc: virtio-dev@...ts.oasis-open.org,
virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, qemu-devel@...gnu.org
Subject: Re: virtio-iommu hotplug issue
On 2023/04/13 22:39, Eric Auger wrote:
> Hi,
>
> On 4/13/23 13:01, Akihiko Odaki wrote:
>> On 2023/04/13 19:40, Jean-Philippe Brucker wrote:
>>> Hello,
>>>
>>> On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
>>>> Hi,
>>>>
>>>> Recently I encountered a problem with the combination of Linux's
>>>> virtio-iommu driver and QEMU when an SR-IOV virtual function gets
>>>> disabled. I'd like to ask what kind of solution is appropriate here,
>>>> and to implement that solution if possible.
>>>>
>>>> A PCIe device implementing the SR-IOV specification exports a virtual
>>>> function, and the guest can enable or disable it at runtime by writing
>>>> to a configuration register. To the guest, this effectively looks like
>>>> a PCI device being hotplugged.
>>>
>>> Just so I understand this better: the guest gets a whole PCIe device PF
>>> that implements SR-IOV, and so the guest can dynamically create VFs? Out
>>> of curiosity, is that a hardware device assigned to the guest with VFIO,
>>> or a device emulated by QEMU?
>>
>> Yes, that's right. The guest can dynamically create and delete VFs.
>> The device is emulated by QEMU: igb, an Intel NIC recently added to
>> QEMU and projected to be released as part of QEMU 8.0.
> From the description below, I understand you then bind this emulated
> device to VFIO in the guest, correct?
Yes, that's correct.
>>
>>>
>>>> In such a case, the kernel assumes the endpoint is
>>>> detached from the virtio-iommu domain, but QEMU actually does not
>>>> detach it.
> The QEMU virtio-iommu device executes commands from the virtio-iommu
> driver and my understanding is the VFIO infra is not in trouble here. As
> suggested by Jean, a detach command probably is missed.

VFIO only exposes the problem; its origin is indeed virtio-iommu.

Regards,
Akihiko Odaki
>>>>
>>>> This inconsistent view of the removed device sometimes prevents the VM
>>>> from correctly performing the following procedure, for example:
>>>> 1. Enable a VF.
>>>> 2. Disable the VF.
>>>> 3. Open a vfio container.
>>>> 4. Open the group which the PF belongs to.
>>>> 5. Add the group to the vfio container.
>>>> 6. Map some memory region.
>>>> 7. Close the group.
>>>> 8. Close the vfio container.
>>>> 9. Repeat steps 3-8.
>>>>
>>>> When the VF gets disabled, the kernel assumes the endpoint is detached
>>>> from the IOMMU domain, but QEMU actually doesn't detach it. Later, the
>>>> domain will be reused in steps 3-8.
>>>>
>>>> In step 7, the PF is detached. The kernel believes that no endpoint
>>>> remains attached and that the mappings the domain holds are cleared,
>>>> but the VF endpoint is still attached, so the mappings are kept intact.
>>>>
>>>> In step 9, the same domain is reused again and the kernel requests a
>>>> new mapping, but it conflicts with the existing mapping and the request
>>>> fails with -EINVAL.
>>>>
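The sequence above can be sketched with a toy model. This is only an illustration, not code from QEMU or Linux: the class ToyViommuDevice, the function reproduce(), the domain ID 1 and the IOVA 0x1000 are all hypothetical, and the model assumes that the device drops a domain's mappings only when its last endpoint is detached, as described above.

```python
import errno


class ToyViommuDevice:
    """Toy stand-in for the device side: per-domain endpoints and mappings."""

    def __init__(self):
        self.endpoints = {}  # domain id -> set of attached endpoint names
        self.mappings = {}   # domain id -> set of mapped IOVAs

    def attach(self, domain, endpoint):
        self.endpoints.setdefault(domain, set()).add(endpoint)
        self.mappings.setdefault(domain, set())

    def detach(self, domain, endpoint):
        attached = self.endpoints.get(domain, set())
        attached.discard(endpoint)
        if not attached:
            # Last endpoint gone: the domain and its mappings are dropped.
            self.endpoints.pop(domain, None)
            self.mappings.pop(domain, None)

    def map(self, domain, iova):
        if iova in self.mappings[domain]:
            return -errno.EINVAL  # conflicts with an existing mapping
        self.mappings[domain].add(iova)
        return 0


def reproduce(rounds=2, domain=1, iova=0x1000):
    dev = ToyViommuDevice()
    dev.attach(domain, "VF")      # step 1: enable the VF
    # Step 2: the VF is disabled. The guest forgets it but sends no DETACH,
    # so the device still has "VF" attached to the domain.
    results = []
    for _ in range(rounds):       # steps 3-8, repeated as in step 9
        dev.attach(domain, "PF")  # steps 3-5
        results.append(dev.map(domain, iova))  # step 6
        dev.detach(domain, "PF")  # steps 7-8: guest assumes the domain is empty
    return results


map_results = reproduce()
```

Under these assumptions the first map request succeeds and the second one returns -EINVAL, because the stale VF endpoint keeps the domain and its mapping alive on the device side.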
>>>> This problem can be fixed by either of:
>>>> - requesting the detachment of the endpoint from the guest when the PCI
>>>> device is unplugged (the VF is disabled)
>>>
>>> Yes, I think this is an issue in the virtio-iommu driver, which should
>>> be sending a DETACH request when the VF is disabled, likely from
>>> viommu_release_device(). I'll work on a fix unless you would like to do
>>> it
>>
>> It would be nice if you could prepare a fix. I will test your patch with
>> my workload if you share it with me.
>
> I can help with testing too
>
> Thanks
>
> Eric
>>
>> Regards,
>> Akihiko Odaki
>>
>>>
>>>> - detecting that the PCI device is gone and automatically detaching it
>>>> on the QEMU side.
>>>>
>>>> It is not completely clear to me which solution is more appropriate, as
>>>> the virtio-iommu specification is written independently of the endpoint
>>>> mechanism and does not say what should be done when a PCI device is
>>>> unplugged.
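The two options can be compared in a toy model. This is a sketch under stated assumptions, not the real driver or QEMU code: ToyIommu, scenario() and the fix names "driver" and "device" are hypothetical, and the model assumes that a domain's mappings are dropped only when its last endpoint is detached.

```python
import errno


class ToyIommu:
    """Toy device-side state: domain id -> attached endpoints and mappings."""

    def __init__(self):
        self.domains = {}

    def attach(self, domain, endpoint):
        state = self.domains.setdefault(
            domain, {"endpoints": set(), "mappings": set()})
        state["endpoints"].add(endpoint)

    def detach(self, domain, endpoint):
        state = self.domains.get(domain)
        if state is None:
            return
        state["endpoints"].discard(endpoint)
        if not state["endpoints"]:
            del self.domains[domain]  # mappings go away with the domain

    def map(self, domain, iova):
        mappings = self.domains[domain]["mappings"]
        if iova in mappings:
            return -errno.EINVAL
        mappings.add(iova)
        return 0

    def unplug(self, endpoint):
        # Device-side option: drop a vanished endpoint from every domain.
        for domain in list(self.domains):
            self.detach(domain, endpoint)


def scenario(fix, domain=1, iova=0x1000):
    """fix is "none", "driver" (guest sends DETACH) or "device" (auto-detach)."""
    iommu = ToyIommu()
    iommu.attach(domain, "VF")        # VF enabled
    if fix == "driver":
        iommu.detach(domain, "VF")    # guest requests DETACH on VF disable
    elif fix == "device":
        iommu.unplug("VF")            # device notices the VF is gone
    results = []
    for _ in range(2):                # open, map, close; then repeat once
        iommu.attach(domain, "PF")
        results.append(iommu.map(domain, iova))
        iommu.detach(domain, "PF")
    return results


outcomes = {fix: scenario(fix) for fix in ("none", "driver", "device")}
```

In this model both options let the repeated map request succeed, while leaving out both reproduces the -EINVAL on the second attempt. The model says nothing about which side ought to be responsible.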
>>>
>>> Yes, I'm not sure it's in scope for the specification; it's more about
>>> software guidance
>>>
>>> Thanks,
>>> Jean
>>
>