[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <99c7585c-47c5-9995-3fe6-c75f412b3479@linux.ibm.com>
Date: Tue, 15 Mar 2022 13:01:27 -0400
From: Matthew Rosato <mjrosato@...ux.ibm.com>
To: "Tian, Kevin" <kevin.tian@...el.com>,
Jason Gunthorpe <jgg@...dia.com>,
Alex Williamson <alex.williamson@...hat.com>
Cc: "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
"cohuck@...hat.com" <cohuck@...hat.com>,
"schnelle@...ux.ibm.com" <schnelle@...ux.ibm.com>,
"farman@...ux.ibm.com" <farman@...ux.ibm.com>,
"pmorel@...ux.ibm.com" <pmorel@...ux.ibm.com>,
"borntraeger@...ux.ibm.com" <borntraeger@...ux.ibm.com>,
"hca@...ux.ibm.com" <hca@...ux.ibm.com>,
"gor@...ux.ibm.com" <gor@...ux.ibm.com>,
"gerald.schaefer@...ux.ibm.com" <gerald.schaefer@...ux.ibm.com>,
"agordeev@...ux.ibm.com" <agordeev@...ux.ibm.com>,
"svens@...ux.ibm.com" <svens@...ux.ibm.com>,
"frankja@...ux.ibm.com" <frankja@...ux.ibm.com>,
"david@...hat.com" <david@...hat.com>,
"imbrenda@...ux.ibm.com" <imbrenda@...ux.ibm.com>,
"vneethv@...ux.ibm.com" <vneethv@...ux.ibm.com>,
"oberpar@...ux.ibm.com" <oberpar@...ux.ibm.com>,
"freude@...ux.ibm.com" <freude@...ux.ibm.com>,
"thuth@...hat.com" <thuth@...hat.com>,
"pasic@...ux.ibm.com" <pasic@...ux.ibm.com>,
"joro@...tes.org" <joro@...tes.org>,
"will@...nel.org" <will@...nel.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"corbet@....net" <corbet@....net>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type
On 3/15/22 10:17 AM, Matthew Rosato wrote:
> On 3/15/22 3:57 AM, Tian, Kevin wrote:
>>> From: Jason Gunthorpe <jgg@...dia.com>
>>> Sent: Tuesday, March 15, 2022 7:18 AM
>>>
>>> On Mon, Mar 14, 2022 at 04:50:33PM -0600, Alex Williamson wrote:
>>>
>>>>> +/*
>>>>> + * The KVM_IOMMU type implies that the hypervisor will control the
>>> mappings
>>>>> + * rather than userspace
>>>>> + */
>>>>> +#define VFIO_KVM_IOMMU 11
>>>>
>>>> Then why is this hosted in the type1 code that exposes a wide variety
>>>> of userspace interfaces? Thanks,
>>>
>>> It is really badly named, this is the root level of a 2 stage nested
>>> IO page table, and this approach needed a special flag to distinguish
>>> the setup from the normal iommu_domain.
>>>
>>> If we do try to stick this into VFIO it should probably use the
>>> VFIO_TYPE1_NESTING_IOMMU instead - however, we would like to delete
>>> that flag entirely as it was never fully implemented, was never used,
>>> and isn't part of what we are proposing for IOMMU nesting on ARM
>>> anyhow. (So far I've found nobody to explain what the plan here was..)
>>>
>>> This is why I said the second level should be an explicit iommu_domain
>>> all on its own that is explicitly coupled to the KVM to read the page
>>> tables, if necessary.
>>>
>>> But I'm not sure that reading the userspace io page tables with KVM is
>>> even the best thing to do - the iommu driver already has the pinned
>>> memory, it would be faster and more modular to traverse the io page
>>> tables through the pfns in the root iommu_domain than by having KVM do
>>> the translations. Lets see what Matthew says..
>>>
>>
>> Reading this thread it's sort of like an optimization to software
>> nesting.
>
> Yes, we want to avoid breaking to userspace for a very frequent
> operation (RPCIT / updating shadow mappings)
>
>> If that is the case does it make more sense to complete the basic form
>> of software nesting first and then adds this optimization?
>>
>> The basic form would allow the userspace to create a special domain
>> type which points to a user/guest page table (like hardware nesting)
>> but doesn't install the user page table to the IOMMU hardware (unlike
>> hardware nesting). When receiving invalidate cmd from userspace > the
>> iommu driver walks the user page table (1st-level) and the parent
>> page table (2nd-level) to generate a shadow mapping for the
>> invalidated range in the non-nested hardware page table of this
>> special domain type.
>>
>> Once that works what this series does just changes the matter of
>> how the invalidate cmd is triggered. Previously iommu driver receives
>> invalidate cmd from Qemu (via iommufd uAPI) while now receiving
>> the cmd from kvm (via iommufd kAPI) upon interception of RPCIT.
>> From this angle once the connection between iommufd and kvm fd
>> is established there is even no direct talk between iommu driver and
>> kvm.
>
> But something somewhere still needs to be responsible for
> pinning/unpinning of the guest table entries upon each RPCIT
> interception. e.g. the RPCIT intercept can happen because the guest
> wants to invalidate some old mappings or has generated some new mappings
> over a range, so we must shadow the new mappings (by pinning the guest
> entries and placing them in the host hardware table / unpinning
> invalidated ones and clearing their entry in the host hardware table).
>
OK, this got clarified by Jason in another thread: What I was missing
here was an assumption that the 1st-level has already mapped and pinned
all of guest physical address space; in that case there's no need to
invoke pin/unpin operations against a kvm from within the iommu domain
(this series as-is does not pin all of the guest physical address space;
it does pins/unpins on-demand at RPCIT time)
Powered by blists - more mailing lists