Message-ID: <20190513011626.GI24397@joy-OptiPlex-7040>
Date: Sun, 12 May 2019 21:16:26 -0400
From: Yan Zhao <yan.y.zhao@...el.com>
To: Cornelia Huck <cohuck@...hat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@...hat.com>,
Alex Williamson <alex.williamson@...hat.com>,
"intel-gvt-dev@...ts.freedesktop.org"
<intel-gvt-dev@...ts.freedesktop.org>,
"arei.gonglei@...wei.com" <arei.gonglei@...wei.com>,
"aik@...abs.ru" <aik@...abs.ru>,
"Zhengxiao.zx@...baba-inc.com" <Zhengxiao.zx@...baba-inc.com>,
"shuangtai.tst@...baba-inc.com" <shuangtai.tst@...baba-inc.com>,
"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
"eauger@...hat.com" <eauger@...hat.com>,
"Liu, Yi L" <yi.l.liu@...el.com>,
"Yang, Ziye" <ziye.yang@...el.com>,
"mlevitsk@...hat.com" <mlevitsk@...hat.com>,
"pasic@...ux.ibm.com" <pasic@...ux.ibm.com>,
"felipe@...anix.com" <felipe@...anix.com>,
"Liu, Changpeng" <changpeng.liu@...el.com>,
"Ken.Xue@....com" <Ken.Xue@....com>,
"jonathan.davies@...anix.com" <jonathan.davies@...anix.com>,
"He, Shaopeng" <shaopeng.he@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"libvir-list@...hat.com" <libvir-list@...hat.com>,
"eskultet@...hat.com" <eskultet@...hat.com>,
"Tian, Kevin" <kevin.tian@...el.com>,
"zhenyuw@...ux.intel.com" <zhenyuw@...ux.intel.com>,
"Wang, Zhi A" <zhi.a.wang@...el.com>,
"cjia@...dia.com" <cjia@...dia.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"berrange@...hat.com" <berrange@...hat.com>,
"dinechin@...hat.com" <dinechin@...hat.com>
Subject: Re: [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

On Fri, May 10, 2019 at 05:48:38PM +0800, Cornelia Huck wrote:
> On Fri, 10 May 2019 10:36:09 +0100
> "Dr. David Alan Gilbert" <dgilbert@...hat.com> wrote:
>
> > * Cornelia Huck (cohuck@...hat.com) wrote:
> > > On Thu, 9 May 2019 17:48:26 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@...hat.com> wrote:
> > >
> > > > * Cornelia Huck (cohuck@...hat.com) wrote:
> > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > "Dr. David Alan Gilbert" <dgilbert@...hat.com> wrote:
> > > > >
> > > > > > * Cornelia Huck (cohuck@...hat.com) wrote:
> > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > Alex Williamson <alex.williamson@...hat.com> wrote:
> > > > > > >
> > > > > > > > On Sun, 5 May 2019 21:49:04 -0400
> > > > > > > > Yan Zhao <yan.y.zhao@...el.com> wrote:
> > > > > > >
> > > > > > > > > + Errno:
> > > > > > > > > + If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > + devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > + a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > + a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > + -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > + If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > + incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > + return -EINVAL;
> > > > > > > >
> > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > migration versions.
> > > > > > >
> > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > things (e.g. trying with different device pairs).
> > > > > >
> > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > get much information that way.
> > > > >
> > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > the version attributes on both devices (to find out whether migration
> > > > > is supported at all), and only then figure out via writing whether they
> > > > > are compatible?
> > > > >
> > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > >
> > > > Well, I'm OK with something like writing to test whether it's
> > > > compatible, it's just we need a better way of saying 'no'.
> > > > I'm not sure if that involves reading back from somewhere after
> > > > the write or what.
> > >
> > > Hm, so I basically see two ways of doing that:
> > > - standardize on some error codes... problem: error codes can be hard
> > > to fit to reasons
> > > - make the error available in some attribute that can be read
> > >
> > > I'm not sure how we can serialize the readback with the last write,
> > > though (this looks inherently racy).
> > >
> > > How important is detailed error reporting here?
> >
> > I think we need something, otherwise we're just going to get vague
> > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > good enough to point most users to something they can understand
> > (e.g. wrong card family/too old a driver etc).
>
> Ok, that sounds like a reasonable point. Not that I have a better idea
> how to achieve that, though... we could also log a more verbose error
> message to the kernel log, but that's not necessarily where a user will
> look first.
>
> Ideally, we'd want to have the user space program setting up things
> querying the general compatibility for migration (so that it becomes
> their problem on how to alert the user to problems :), but I'm not sure
> how to eliminate the race between asking the vendor driver for
> compatibility and getting the result of that operation.
>
> Unless we introduce an interface that can retrieve _all_ results
> together with the written value? Or is that not going to be much of a
> problem in practice?

What about defining a migration_errors attribute that stores the 10 most
recent error records, in a format like:
    input string: error
Since identical input strings always produce the same error string, 10 records
may be able to serve more than 10 queries about the failure reason. And in
practice, I think there wouldn't be 10 simultaneous migration requests?
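
To illustrate the idea, here is a rough sketch in plain userspace C (not kernel
or sysfs code) of a log that keeps the 10 most recent "input string: error"
records and shares one record between identical input strings. The names
mig_err_log, mig_err_add, mig_err_lookup and mig_err_show, and the 64-byte
string limit, are made up for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MIG_ERR_RECORDS 10  /* the "10 most recent records" from the proposal */
#define MIG_ERR_STRLEN  64  /* hypothetical size limit, only for this sketch */

struct mig_err_record {
    char input[MIG_ERR_STRLEN];  /* version string that was written */
    char error[MIG_ERR_STRLEN];  /* reason the write was rejected */
};

struct mig_err_log {
    struct mig_err_record rec[MIG_ERR_RECORDS];
    unsigned int next;   /* slot that the next new record overwrites */
    unsigned int count;  /* number of valid records, at most MIG_ERR_RECORDS */
};

/* Return the stored error for an input string, or NULL if none is recorded. */
static const char *mig_err_lookup(const struct mig_err_log *log,
                                  const char *input)
{
    assert(log && input);
    for (unsigned int i = 0; i < log->count; i++)
        if (strcmp(log->rec[i].input, input) == 0)
            return log->rec[i].error;
    return NULL;
}

/*
 * Record a rejected write. An input that is already in the log is not
 * stored again, so one record can answer any number of queries for it.
 * Once all slots are used, the oldest record is overwritten.
 */
static void mig_err_add(struct mig_err_log *log, const char *input,
                        const char *error)
{
    assert(log && input && error);
    if (mig_err_lookup(log, input))
        return;

    struct mig_err_record *r = &log->rec[log->next];
    snprintf(r->input, sizeof(r->input), "%s", input);
    snprintf(r->error, sizeof(r->error), "%s", error);
    log->next = (log->next + 1) % MIG_ERR_RECORDS;
    if (log->count < MIG_ERR_RECORDS)
        log->count++;
}

/* Format all records as "input string: error" lines, in slot order. */
static size_t mig_err_show(const struct mig_err_log *log, char *buf,
                           size_t len)
{
    size_t off = 0;

    assert(log && buf && len > 0);
    buf[0] = '\0';
    for (unsigned int i = 0; i < log->count; i++) {
        int n = snprintf(buf + off, len - off, "%s: %s\n",
                         log->rec[i].input, log->rec[i].error);
        if (n < 0 || (size_t)n >= len - off) {
            buf[off] = '\0';  /* drop the partial line */
            break;
        }
        off += (size_t)n;
    }
    return off;
}
```

The sketch does not address locking, and a record can still be evicted before
its writer reads it back if more than 10 distinct failing inputs arrive in
between, so the race is made less likely rather than removed.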
Or could we just define some common errnos? For example:
#define ENOMIGRATION 140 /* device does not support migration */
#define EUNATCH 49 /* software version does not match */
#define EHWNM 142 /* hardware does not match */
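
As a sketch of how userspace could consume such codes, assuming the values
above were adopted (they are only a proposal and are not defined in any kernel
header today), a management tool might map them to messages as follows. The
helper names mig_errno_reason, mig_retry_other_pair and mig_report are
hypothetical. Note that EUNATCH already exists in Linux with the value 49 and
the meaning "Protocol driver not attached", so this proposal would overload it.

```c
#include <assert.h>
#include <stdio.h>

/* Values proposed in this message; guarded in case a header defines them. */
#ifndef ENOMIGRATION
#define ENOMIGRATION 140  /* device does not support migration */
#endif
#ifndef EUNATCH
#define EUNATCH 49        /* software version does not match */
#endif
#ifndef EHWNM
#define EHWNM 142         /* hardware does not match */
#endif

/* Translate a proposed errno into a reason a user can act on. */
static const char *mig_errno_reason(int err)
{
    switch (err) {
    case ENOMIGRATION:
        return "device does not support migration";
    case EUNATCH:
        return "software version does not match";
    case EHWNM:
        return "hardware does not match";
    default:
        return "incompatible for an unspecified reason";
    }
}

/*
 * "Cannot migrate at all" is final for this device, while a mismatch only
 * rules out this particular pair, so another target may still work.
 */
static int mig_retry_other_pair(int err)
{
    return err != ENOMIGRATION;
}

/* Build a message for the user from the errno returned by the write. */
static void mig_report(int err, char *buf, size_t len)
{
    assert(buf && len > 0);
    snprintf(buf, len, "migration check failed: %s (errno %d)",
             mig_errno_reason(err), err);
}
```

This keeps the distinction Cornelia asked for ("cannot migrate at all" vs.
"cannot migrate between those two particular devices") without any readback
attribute, at the cost of less detail than a free-form error string.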