[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <3E673A9E-90B1-4A2F-AAAE-E76174AF1B6D@redhat.com>
Date: Wed, 24 Apr 2019 11:10:43 +0200
From: Christophe de Dinechin <dinechin@...hat.com>
To: "Daniel P. Berrangé" <berrange@...hat.com>
Cc: Yan Zhao <yan.y.zhao@...el.com>,
intel-gvt-dev@...ts.freedesktop.org, cjia@...dia.com,
KVM list <kvm@...r.kernel.org>,
Alexey Kardashevskiy <aik@...abs.ru>,
Zhengxiao.zx@...baba-inc.com, shuangtai.tst@...baba-inc.com,
qemu-devel@...gnu.org, Kirti Wankhede <kwankhede@...dia.com>,
eauger@...hat.com, yi.l.liu@...el.com,
Erik Skultety <eskultet@...hat.com>, ziye.yang@...el.com,
mlevitsk@...hat.com, Halil Pasic <pasic@...ux.ibm.com>,
libvir-list@...hat.com, "Gonglei (Arei)" <arei.gonglei@...wei.com>,
felipe@...anix.com, Ken.Xue@....com,
"Tian, Kevin" <kevin.tian@...el.com>,
"Dr. David Alan Gilbert" <dgilbert@...hat.com>,
Zhenyu Wang <zhenyuw@...ux.intel.com>,
Alex Williamson <alex.williamson@...hat.com>,
changpeng.liu@...el.com, Cornelia Huck <cohuck@...hat.com>,
open list <linux-kernel@...r.kernel.org>,
Zhi Wang <zhi.a.wang@...el.com>, jonathan.davies@...anix.com,
shaopeng.he@...el.com
Subject: Re: [Qemu-devel] [PATCH 1/2] vfio/mdev: add version field as
mandatory attribute for mdev device
> On 23 Apr 2019, at 12:39, Daniel P. Berrangé <berrange@...hat.com> wrote:
>
> On Fri, Apr 19, 2019 at 04:35:04AM -0400, Yan Zhao wrote:
>> device version attribute in mdev sysfs is used by user space software
>> (e.g. libvirt) to query device compatibility for live migration of VFIO
>> mdev devices. This attribute is mandatory if a mdev device supports live
>> migration.
>>
>> It consists of two parts: common part and vendor proprietary part.
>> common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
>> identifies device type. e.g., for pci device, it is
>> "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
>> vendor proprietary part: this part is varied in length. vendor driver can
>> specify any string to identify a device.
>>
>> When reading this attribute, it should show device version string of the
>> device of type <type-id>. If a device does not support live migration, it
>> should return errno.
>> When writing a string to this attribute, it returns errno for
>> incompatibility or returns written string length in compatibility case.
>> If a device does not support live migration, it always returns errno.
>>
>> For user space software to use:
>> 1.
>> Before starting live migration, user space software first reads source side
>> mdev device's version. e.g.
>> "#cat \
>> /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type/version"
>> 00028086-193b-i915-GVTg_V5_4
>>
>> 2.
>> Then, user space software writes the source side returned version string
>> to device version attribute in target side, and checks the return value.
>> If a negative errno is returned in the target side, then mdev devices in
>> source and target sides are not compatible;
>> If a positive number is returned and it equals to the length of written
>> string, then the two mdev devices in source and target side are compatible.
>> e.g.
>> (a) compatibility case
>> "# echo 00028086-193b-i915-GVTg_V5_4 >
>> /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
>>
>> (b) incompatibility case
>> "#echo 00028086-193b-i915-GVTg_V5_1 >
>> /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
>> -bash: echo: write error: Invalid argument
>
> What you have written here seems to imply that each mdev type is able to
> support many different versions at the same time. Writing a version into
> this sysfs file then chooses which of the many versions to actually use.
>
> This is good as it allows for live migration across driver software upgrades.
>
> A mgmt application may well want to know what versions are supported for an
> mdev type *before* starting a migration. A mgmt app can query all the 100's
> of hosts it knows and thus figure out which are valid to use as the target
> of a migration.
>
> IOW, we want to avoid the ever hitting the incompatibility case in the
> first place, by only choosing to migrate to a host that we know is going
> to be compatible.
>
> This would need some kind of way to report the full list of supported
> versions against the mdev supported types on the host.
>
>
>> 3. if two mdev devices are compatible, user space software can start
>> live migration, and vice versa.
>>
>> Note: if a mdev device does not support live migration, it either does
>> not provide a version attribute, or always returns errno when its version
>> attribute is read/written.
>>
>> Cc: Alex Williamson <alex.williamson@...hat.com>
>> Cc: Erik Skultety <eskultet@...hat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilbert@...hat.com>
>> Cc: Cornelia Huck <cohuck@...hat.com>
>> Cc: "Tian, Kevin" <kevin.tian@...el.com>
>> Cc: Zhenyu Wang <zhenyuw@...ux.intel.com>
>> Cc: "Wang, Zhi A" <zhi.a.wang@...el.com>
>> Cc: Neo Jia <cjia@...dia.com>
>> Cc: Kirti Wankhede <kwankhede@...dia.com>
>>
>> Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
>> ---
>> Documentation/vfio-mediated-device.txt | 36 ++++++++++++++++++++++++++
>> samples/vfio-mdev/mbochs.c | 17 ++++++++++++
>> samples/vfio-mdev/mdpy.c | 16 ++++++++++++
>> samples/vfio-mdev/mtty.c | 16 ++++++++++++
>> 4 files changed, 85 insertions(+)
>>
>> diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
>> index c3f69bcaf96e..bc28471c0667 100644
>> --- a/Documentation/vfio-mediated-device.txt
>> +++ b/Documentation/vfio-mediated-device.txt
>> @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
>> | | |--- available_instances
>> | | |--- device_api
>> | | |--- description
>> + | | |--- version
>> | | |--- [devices]
>> | |--- [<type-id>]
>> | | |--- create
>> @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
>> | | |--- available_instances
>> | | |--- device_api
>> | | |--- description
>> + | | |--- version
>> | | |--- [devices]
>> | |--- [<type-id>]
>> | |--- create
>> @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
>> | |--- available_instances
>> | |--- device_api
>> | |--- description
>> + | |--- version
>> | |--- [devices]
>>
>> * [mdev_supported_types]
>> @@ -225,6 +228,8 @@ Directories and files under the sysfs for Each Physical Device
>> [<type-id>], device_api, and available_instances are mandatory attributes
>> that should be provided by vendor driver.
>>
>> + version is a mandatory attribute if a mdev device supports live migration.
>> +
>> * [<type-id>]
>>
>> The [<type-id>] name is created by adding the device driver string as a prefix
>> @@ -246,6 +251,35 @@ Directories and files under the sysfs for Each Physical Device
>> This attribute should show the number of devices of type <type-id> that can be
>> created.
>>
>> +* version
>> +
>> + This attribute is rw. It is used to check whether two devices are compatible
>> + for live migration. If this attribute is missing, then the corresponding mdev
>> + device is regarded as not supporting live migration.
>> +
>> + It consists of two parts: common part and vendor proprietary part.
>> + common part: 32 bit. lower 16 bits is vendor id and higher 16 bits identifies
>> + device type. e.g., for pci device, it is
>> + "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
>> + vendor proprietary part: this part is varied in length. vendor driver can
>> + specify any string to identify a device.
>> +
>> + When reading this attribute, it should show device version string of the device
>> + of type <type-id>. If a device does not support live migration, it should
>> + return errno.
>> + When writing a string to this attribute, it returns errno for incompatibility
>> + or returns written string length in compatibility case. If a device does not
>> + support live migration, it always returns errno.
>> +
>> + for example.
>> + # cat \
>> + /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_2/version
>> + 00028086-193b-i915-GVTg_V5_2
>> +
>> + #echo 00028086-193b-i915-GVTg_V5_2 > \
>> + /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version
>> + -bash: echo: write error: Invalid argument
>> +
>
> IIUC this path is against the physical device. IOW, the mgmt app would have
> to first write to the "version" file to choose a version, and then write to
> the "create" file to actually create an virtual device. This has the obvious
> concurrency problem if multiple devices are being created at the same time
> and distinct versions for each device are required. There would need to be
> a locking scheme defined to ensure safety.
>
> Wouldn't it be better if we can pass the desired version when we write to
> the "create" file, so that we avoid any concurrent usage problems. "version"
> could be just a read-only file with a *list* of supported versions.
>
> eg
>
> $ cat /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version
> 5.0
> 5.1
> 5.2
>
> $ echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001;version=5.2" >
> /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
>
I read Alex’s comment that creating an mdev with a specific version is not the intent here. However…
- If the goal is just to check compatibility with migration data, then I think the name should be more explicit, e.g. migration_version instead of version. Otherwise, I read the intent the way Daniel did.
- If we want to provide a list of versions and make it possible to create instances based on a version, I would rather use a directory structure for that, i.e. (replacing cat with ls):
$ ls /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version
5.0
5.1
5.2
where each version is a directory that has its own available_instances, device_api, description, create, …
$ echo 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 > /sys/bus/pci/devices/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_4/version/5.1/create
In addition to not having the problem of two consecutive writes to distinct files, I can imagine contrived cases where available_instances might depend on the version…
Thanks
Christophe
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Powered by blists - more mailing lists