Message-ID: <b15ae698-cd5e-dfb9-0478-b865cc0c2262@redhat.com>
Date: Thu, 5 Dec 2019 14:40:35 +0800
From: Jason Wang <jasowang@...hat.com>
To: Zhenyu Wang <zhenyuw@...ux.intel.com>,
Parav Pandit <parav@...lanox.com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"kevin.tian@...el.com" <kevin.tian@...el.com>,
"cohuck@...hat.com" <cohuck@...hat.com>,
Jiri Pirko <jiri@...lanox.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [PATCH 0/6] VFIO mdev aggregated resources handling
On 2019/12/5 2:06 PM, Zhenyu Wang wrote:
> On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
>> + Jiri + Netdev since you mentioned netdev queue.
>>
>> + Jason Wang and Michael as we had similar discussion in vdpa discussion thread.
>>
>>> From: Zhenyu Wang <zhenyuw@...ux.intel.com>
>>> Sent: Friday, November 8, 2019 2:19 AM
>>> To: Parav Pandit <parav@...lanox.com>
>>>
>> My apologies for the late reply.
>> Something went wrong with my email client, and I only found this patch in my spam folder today.
>> More comments below.
>>
>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
>>>> Hi,
>>>>
>>>>> -----Original Message-----
>>>>> From: kvm-owner@...r.kernel.org <kvm-owner@...r.kernel.org> On
>>>>> Behalf Of Zhenyu Wang
>>>>> Sent: Thursday, October 24, 2019 12:08 AM
>>>>> To: kvm@...r.kernel.org
>>>>> Cc: alex.williamson@...hat.com; kwankhede@...dia.com;
>>>>> kevin.tian@...el.com; cohuck@...hat.com
>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
>>>>>
>>>>> Hi,
>>>>>
>>>>> This is a refresh of a previously sent series. I had the impression
>>>>> that some SIOV drivers would still deploy their own create and
>>>>> config methods, so I stopped working on this. But it seems this
>>>>> would still be useful for other SIOV drivers that simply want the
>>>>> capability to aggregate resources. So here is the refreshed series.
>>>>>
>>>>> The current mdev device create interface depends on a fixed mdev
>>>>> type, which takes a UUID from the user to create an instance of the
>>>>> mdev device. If the user wants a customized amount of resources for
>>>>> an mdev device, the only option is to create a new
>>>> Can you please give an example of a 'resource'?
>>>> When I grep [1], [2] and [3], I couldn't find anything related to 'aggregate'.
>>> The resource is vendor device specific. The SIOV spec defines the ADI
>>> (Assignable Device Interface), which could be e.g. a queue for a net
>>> device, a context for a GPU, etc. I just named this interface
>>> 'aggregate' for aggregation purposes; the term is not used in the spec doc.
>>>
>> An 'unknown/undefined' vendor-specific resource just doesn't work.
>> An orchestration tool doesn't know which resource to configure, or what/how to configure it, for which vendor.
>> It has to be well defined.
>>
>> You can also find such a discussion in the recent lgpu DRM cgroup patch series v4.
>>
>> Exposing networking resource configuration at the PCI device level via mdev sysfs, which is not net namespace aware, is a no-go.
>> Adding per-file NET_ADMIN or other checks is not the approach we follow in the kernel.
>>
>> devlink is a subsystem that, although it lives under net, has a very rich interface for the syscaller, device health, resource management and much more.
>> Even though it is used by net drivers today, it is written for generic device management at the bus/device level.
>>
>> Yuval has posted patches to manage PCI sub-devices [1], and an updated version addressing the comments will be posted soon.
>>
>> For any device-slice resource management (mdev, sub-function, etc.), we should be using a single kernel interface, namely devlink [2], [3].
>>
>> [1] https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuvalav@mellanox.com/
>> [2] http://man7.org/linux/man-pages/man8/devlink-dev.8.html
>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
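>>
>> For reference, a rough sketch of the devlink resource interface from [2] and [3] on the command line (the device name and the /kvd/linear resource path are the examples from the man pages, not anything from this thread):
>>
>>     devlink dev show                          # list devlink device handles
>>     devlink resource show pci/0000:03:00.0    # dump the device's resource tree
>>     devlink resource set pci/0000:03:00.0 path /kvd/linear size 98304
>>     devlink dev reload pci/0000:03:00.0       # apply the new resource size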
>>
>> Most modern device configuration that I am aware of is done via well-defined ioctl()s of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or via netlink commands (net, devlink, rdma and more), not via sysfs.
>>
> The current vfio/mdev configuration is done via a documented sysfs ABI
> rather than other ways. So this adheres to that convention to introduce
> a more configurable method on the mdev device as a standard; it's
> optional and not actually vendor specific, e.g. vfio-ap.
>
> I'm not sure how many devices support devlink now, whether it really
> makes sense to utilize devlink for devices other than net, or whether
> it really makes sense to drive mdev resource configuration from there...
It may make sense to allow APIs other than sysfs to manage mdev. But I'm
not sure whether or not that would be a challenge for orchestration.
Thanks
>>>>> mdev type for it, which may not be flexible. This requirement comes
>>>>> not only from the need to allocate flexible resources for KVMGT,
>>>>> but also from Intel Scalable IO Virtualization, which would use
>>>>> vfio/mdev to allocate arbitrary resources on an mdev instance.
>>> More info in [1], [2] and [3].
>>>>> To allow user-defined resources to be created for mdev, this series
>>>>> extends the mdev create interface by adding a new "aggregate=xxx"
>>>>> parameter following the UUID. For a target mdev type that supports
>>>>> aggregation, this creates a new mdev device whose resources combine
>>>>> the given number of instances, e.g.
>>>>>
>>>>> echo "<uuid>,aggregate=10" > create
>>>>>
>>>>> A VM manager, e.g. libvirt, can check whether an mdev type has the
>>>>> "aggregation" attribute, which indicates support for this setting.
>>>>> If no "aggregation" attribute is found for the mdev type, the
>>>>> previous behavior of single-instance allocation is kept. A new sysfs
>>>>> attribute "aggregated_instances" is created for each mdev device to
>>>>> show the allocated number.
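>>>>>
>>>>> Putting this together, a minimal sketch of the intended flow (the
>>>>> parent device path and the type name below are hypothetical
>>>>> examples; the attribute names come from the patches in this series):
>>>>>
>>>>>   PARENT=/sys/bus/pci/devices/0000:00:02.0
>>>>>   TYPE=$PARENT/mdev_supported_types/i915-GVTg_V5_4
>>>>>   UUID=$(uuidgen)
>>>>>
>>>>>   if [ -e "$TYPE/aggregation" ]; then
>>>>>       # type supports aggregation: one mdev backed by 10 instances
>>>>>       echo "$UUID,aggregate=10" > "$TYPE/create"
>>>>>   else
>>>>>       # fall back to the previous single-instance behavior
>>>>>       echo "$UUID" > "$TYPE/create"
>>>>>   fi
>>>>>
>>>>>   # the new per-device attribute shows the allocated count
>>>>>   cat /sys/bus/mdev/devices/$UUID/aggregated_instances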
>>>>> References:
>>>>> [1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
>>>>> [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
>>>>> [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
>>>>>
>>>>> Zhenyu Wang (6):
>>>>> vfio/mdev: Add new "aggregate" parameter for mdev create
>>>>> vfio/mdev: Add "aggregation" attribute for supported mdev type
>>>>> vfio/mdev: Add "aggregated_instances" attribute for supported mdev
>>>>> device
>>>>> Documentation/driver-api/vfio-mediated-device.rst: Update for
>>>>> vfio/mdev aggregation support
>>>>> Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
>>>>> aggregation support
>>>>> drm/i915/gvt: Add new type with aggregation support
>>>>>
>>>>> Documentation/ABI/testing/sysfs-bus-vfio-mdev | 24 ++++++
>>>>> .../driver-api/vfio-mediated-device.rst | 23 ++++++
>>>>> drivers/gpu/drm/i915/gvt/gvt.c | 4 +-
>>>>> drivers/gpu/drm/i915/gvt/gvt.h | 11 ++-
>>>>> drivers/gpu/drm/i915/gvt/kvmgt.c | 53 ++++++++++++-
>>>>> drivers/gpu/drm/i915/gvt/vgpu.c | 56 ++++++++++++-
>>>>> drivers/vfio/mdev/mdev_core.c | 36 ++++++++-
>>>>> drivers/vfio/mdev/mdev_private.h | 6 +-
>>>>> drivers/vfio/mdev/mdev_sysfs.c | 79 ++++++++++++++++++-
>>>>> include/linux/mdev.h | 19 +++++
>>>>> 10 files changed, 294 insertions(+), 17 deletions(-)
>>>>>
>>>>> --
>>>>> 2.24.0.rc0
>>> --
>>> Open Source Technology Center, Intel ltd.
>>>
>>> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827