Message-ID: <577B77B6.6010600@linux.intel.com>
Date: Tue, 5 Jul 2016 17:02:46 +0800
From: Xiao Guangrong <guangrong.xiao@...ux.intel.com>
To: Neo Jia <cjia@...dia.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, Kirti Wankhede <kwankhede@...dia.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>
Subject: Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed
On 07/05/2016 03:30 PM, Neo Jia wrote:
>>
>>> (Just for completeness, if you really want to use a device in the above example as
>>> VFIO passthru, the second step is not completely handled in userspace; it is actually the guest
>>> driver that will allocate and set up the proper hw resource, which will later be ready
>>> for you to access via some mmio pages.)
>>
>> Hmm... I always treat the VM as userspace.
>
> It is OK to treat the VM as userspace, but I think it is better to spell out the details
> so we are always on the same page.
>
Okay. I should pay more attention to this when I discuss with the driver people. :)
>>
>>>
>>>>
>>>> This is how QEMU/VFIO currently works; could you please tell me the special points
>>>> of your solution compared with current QEMU/VFIO, and why the current model cannot fit
>>>> your requirement? So that we can better understand your scenario?
>>>
>>> The scenario I am describing here is the mediated passthru case, but what you are
>>> describing here (more or less) is the VFIO direct assigned case. It is different in several
>>> areas, but the major difference related to this topic is:
>>>
>>> 1) In the VFIO direct assigned case, the device (and its resources) is completely owned by the VM;
>>> therefore its mmio region can be mapped directly into the VM during the VFIO mmap() call, as
>>> there is no resource sharing among VMs and there is no mediated device driver on
>>> the host to manage such resources, so it is completely owned by the guest.
>>
>> I understand this difference. However, as you told me, the MMIO region allocated for the
>> VM is contiguous, so I assume that portion of the physical MMIO region is completely owned by the guest.
>> The only difference I can see is that the mediated device driver needs to allocate that region.
>
> It is physically contiguous, but it is done at runtime; physically contiguous doesn't mean
> a static partition at boot time. Only at runtime is the proper HW resource requested, and therefore
> the right portion of the MMIO region is granted by the mediated device driver on the host.
Okay. This is your implementation design rather than a hardware limitation, right?
For example, if the instance requires 512M of memory (the size can be specified on the QEMU
command line), it can tell its requirement to the mediated device driver via the create()
interface, then the driver can allocate the memory for this instance before it starts running.
Theoretically, the hardware is able to do memory management in this style, but for some
reason you chose to allocate memory at runtime, right? If my understanding is right,
could you please tell us what benefit you expect to get from this runtime-allocation style?
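To make the question concrete, below is a rough sketch of the create-time allocation
style I have in mind. The callback name, the structure and the way the size is passed
are only my assumptions for illustration, not your actual interface:

#include <linux/errno.h>
#include <linux/types.h>

/* Hypothetical create-time allocation, only to illustrate the question above.
 * The vendor driver learns the instance size when the instance is created
 * (e.g. passed down from the QEMU command line), so the whole MMIO backing
 * could in principle be reserved here instead of in the fault path.
 */
struct my_mdev_instance {
	size_t		mmio_size;	/* requested size, e.g. 512M          */
	unsigned long	mmio_base_pfn;	/* base pfn of the reserved HW region */
};

/* my_hw_alloc_mmio() is a made-up placeholder for whatever vendor-specific
 * call carves the requested portion out of the physical MMIO space.
 */
extern unsigned long my_hw_alloc_mmio(size_t size);

static int my_mdev_create(struct my_mdev_instance *inst, size_t requested_size)
{
	unsigned long pfn = my_hw_alloc_mmio(requested_size);

	if (!pfn)
		return -ENOMEM;

	inst->mmio_size     = requested_size;
	inst->mmio_base_pfn = pfn;
	return 0;
}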
>
> Also, physically contiguous doesn't mean the guest and host mmio are always 1:1.
> You can have 8GB of host physical mmio while the guest will only have
> 256MB.
Thanks for your patience, it is clearer to me now and at least I am able to guess at the
whole picture. :)
>
>>
>>>
>>> 2) In the mediated passthru case, multiple VMs are sharing the same physical device, so how
>>> the HW resource gets allocated is completely decided by the guest and host device drivers of
>>> the virtualized DMA device, here the GPU; the same goes for the MMIO pages used to access those HW resources.
>>
>> I cannot see what the guest's involvement is here; look at your code, you wrote the fault handler like
>> this:
>
> You shouldn't, as that depends on how different devices are getting
> para-virtualized by their own implementations.
>
PV method. It is interesting. More comments below.
>>
>> + ret = parent->ops->validate_map_request(mdev, virtaddr,
>> + &pgoff, &req_size,
>> + &pg_prot);
>>
>> Please tell me what information is obtained from the guest? All this info can be found at the time of
>> mmap().
>
> The virtaddr is the guest mmio address that triggers this fault, which will be
> used by the mediated device driver to locate the resource that it has previously allocated.
The virtaddr is not the guest mmio address; it is a virtual address of QEMU. vfio is not
able to figure out the guest mmio address, as that mapping is handled in userspace, as we
discussed above.
And we can get the virtaddr range from [vma->vm_start, vma->vm_end) when we do mmap().
>
> Then the req_size and pgoff will both come from the mediated device driver based on his internal book
> keeping of the hw resource allocation, which is only available during runtime. And such book keeping
> can be built part of para-virtualization scheme between guest and host device driver.
>
I am talking about the parameters you passed to validate_map_request(). req_size is calculated like this:
+ offset = virtaddr - vma->vm_start;
+ phyaddr = (vma->vm_pgoff << PAGE_SHIFT) + offset;
+ pgoff = phyaddr >> PAGE_SHIFT;
All this info is from the vma, which is available in mmap().
pg_prot is obtained from:
+ pg_prot = vma->vm_page_prot;
that is also available in mmap().
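In other words, everything passed to validate_map_request() can already be computed purely
from the vma at mmap() time. A minimal sketch of what I mean (the function name is made up,
and the req_size default of "up to the vma end" is my assumption, not necessarily what your
driver does):

#include <linux/mm.h>

/* Illustration only: every parameter in the quoted hunk is derived from the
 * vma, which the driver already holds when userspace calls mmap().
 */
static void derive_fault_params(struct vm_area_struct *vma, unsigned long virtaddr,
				unsigned long *pgoff, unsigned long *req_size,
				pgprot_t *pg_prot)
{
	unsigned long offset, phyaddr;

	offset    = virtaddr - vma->vm_start;			/* offset into the BAR   */
	phyaddr   = (vma->vm_pgoff << PAGE_SHIFT) + offset;	/* host physical address */
	*pgoff    = phyaddr >> PAGE_SHIFT;
	*pg_prot  = vma->vm_page_prot;
	*req_size = vma->vm_end - virtaddr;	/* assumed default: map up to the vma end */
}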
> None of that information is available at VFIO mmap() time. For example, several VMs
> are sharing the same physical device to provide mediated access. All VMs will
> call VFIO mmap() on their virtual BARs as part of the QEMU vfio/pci initialization
> process; at that moment, we definitely can't mmap the entire physical MMIO
> into every VM blindly, for obvious reasons.
>
mmap() carries the @length information, so you only need to allocate the specified size
(corresponding to @length) of memory for them.
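As a rough sketch of what I mean (again an illustration with made-up names, not your code):
the size is already known inside the mmap handler itself, so the allocation, and even the
mapping, could be done right there:

#include <linux/errno.h>
#include <linux/mm.h>

/* my_hw_alloc_mmio() is the same made-up placeholder as above for the
 * vendor-specific allocation of a portion of the physical MMIO region.
 */
extern unsigned long my_hw_alloc_mmio(size_t size);

static int my_mdev_mmap(struct vm_area_struct *vma)
{
	size_t length = vma->vm_end - vma->vm_start;	/* the @length from userspace */
	unsigned long base_pfn = my_hw_alloc_mmio(length);

	if (!base_pfn)
		return -ENOMEM;

	/* Map the instance's portion up front instead of waiting for the first fault. */
	return remap_pfn_range(vma, vma->vm_start, base_pfn, length,
			       vma->vm_page_prot);
}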
Now I guess there are some operations, e.g. PV operations, between mmap() and the memory fault;
these operations tell the mediated device driver how to allocate memory for this instance.
Right?