linux-kernel - Re: [RFC PATCH] vdpa: mandate 1.0 device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACGkMEuvqUL04_wHqSKPUufu5O6ydhytHAx0NryLODNHkEO+Fw@mail.gmail.com>
Date:   Wed, 12 May 2021 17:24:21 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     virtualization <virtualization@...ts.linux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>, parav@...dia.com,
        Eli Cohen <elic@...dia.com>,
        Lingshan Zhu <lingshan.zhu@...el.com>, mapfelba@...hat.com
Subject: Re: [RFC PATCH] vdpa: mandate 1.0 device

On Wed, May 12, 2021 at 3:54 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>
> On Tue, May 11, 2021 at 04:43:13AM -0400, Jason Wang wrote:
> >
> >
> > ----- 原始邮件 -----
> > >
> > > 在 2021/4/21 下午4:03, Michael S. Tsirkin 写道:
> > > > On Wed, Apr 21, 2021 at 03:41:36PM +0800, Jason Wang wrote:
> > > >> 在 2021/4/12 下午5:23, Jason Wang 写道:
> > > >>> 在 2021/4/12 下午5:09, Michael S. Tsirkin 写道:
> > > >>>> On Mon, Apr 12, 2021 at 02:35:07PM +0800, Jason Wang wrote:
> > > >>>>> 在 2021/4/10 上午12:04, Michael S. Tsirkin 写道:
> > > >>>>>> On Fri, Apr 09, 2021 at 12:47:55PM +0800, Jason Wang wrote:
> > > >>>>>>> 在 2021/4/8 下午11:59, Michael S. Tsirkin 写道:
> > > >>>>>>>> On Thu, Apr 08, 2021 at 04:26:48PM +0800, Jason Wang wrote:
> > > >>>>>>>>> This patch mandates 1.0 for vDPA devices. The goal is to have the
> > > >>>>>>>>> semantic of normative statement in the virtio
> > > >>>>>>>>> spec and eliminate the
> > > >>>>>>>>> burden of transitional device for both vDPA bus and vDPA parent.
> > > >>>>>>>>>
> > > >>>>>>>>> uAPI seems fine since all the vDPA parent mandates
> > > >>>>>>>>> VIRTIO_F_ACCESS_PLATFORM which implies 1.0 devices.
> > > >>>>>>>>>
> > > >>>>>>>>> For legacy guests, it can still work since Qemu will mediate when
> > > >>>>>>>>> necessary (e.g doing the endian conversion).
> > > >>>>>>>>>
> > > >>>>>>>>> Signed-off-by: Jason Wang <jasowang@...hat.com>
> > > >>>>>>>> Hmm. If we do this, don't we still have a problem with
> > > >>>>>>>> legacy drivers which don't ack 1.0?
> > > >>>>>>> Yes, but it's not something that is introduced in this
> > > >>>>>>> commit. The legacy
> > > >>>>>>> driver never work ...
> > > >>>>>> My point is this neither fixes or prevents this.
> > > >>>>>>
> > > >>>>>> So my suggestion is to finally add ioctls along the lines
> > > >>>>>> of PROTOCOL_FEATURES of vhost-user.
> > > >>>>>>
> > > >>>>>> Then that one can have bits for legacy le, legacy be and modern.
> > > >>>>>>
> > > >>>>>> BTW I looked at vhost-user and it does not look like that
> > > >>>>>> has a solution for this problem either, right?
> > > >>>>> Right.
> > > >>>>>
> > > >>>>>
> > > >>>>>>>> Note 1.0 affects ring endianness which is not mediated in QEMU
> > > >>>>>>>> so QEMU can't pretend to device guest is 1.0.
> > > >>>>>>> Right, I plan to send patches to do mediation in the
> > > >>>>>>> Qemu to unbreak legacy
> > > >>>>>>> drivers.
> > > >>>>>>>
> > > >>>>>>> Thanks
> > > >>>>>> I frankly think we'll need PROTOCOL_FEATURES anyway, it's
> > > >>>>>> too useful ...
> > > >>>>>> so why not teach drivers about it and be done with it? You
> > > >>>>>> can't emulate
> > > >>>>>> legacy on modern in a cross endian situation because of vring
> > > >>>>>> endian-ness ...
> > > >>>>> So the problem still. This can only work when the hardware can support
> > > >>>>> legacy vring endian-ness.
> > > >>>>>
> > > >>>>> Consider:
> > > >>>>>
> > > >>>>> 1) the leagcy driver support is non-normative in the spec
> > > >>>>> 2) support a transitional device in the kenrel may requires the
> > > >>>>> hardware
> > > >>>>> support and a burden of kernel codes
> > > >>>>>
> > > >>>>> I'd rather simply drop the legacy driver support
> > > >>>> My point is this patch does not drop legacy support. It merely mandates
> > > >>>> modern support.
> > > >>>
> > > >>> I am not sure I get here. This patch fails the set_feature if VERSION_1
> > > >>> is not negotiated. This means:
> > > >>>
> > > >>> 1) vDPA presents a modern device instead of transitonal device
> > > >>> 2) legacy driver can't be probed
> > > >>>
> > > >>> What I'm missing?
> > > >>
> > > >> Hi Michael:
> > > >>
> > > >> Do you agree to find the way to present modern device? We need a
> > > >> conclusion
> > > >> to make the netlink API work to move forward.
> > > >>
> > > >> Thanks
> > > > I think we need a way to support legacy with no data path overhead. qemu
> > > > setting VERSION_1 for a legacy guest affects the ring format so it does
> > > > not really work. This seems to rule out emulating config space entirely
> > > > in userspace.
> > >
> > >
> > > So I'd rather drop the legacy support in this case. It never work for
> > > vDPA in the past and virtio-vDPA doesn't even need that. Note that
> > > ACCESS_PLATFORM is mandated for all the vDPA parents right now which
> > > implies modern device and LE. I wonder what's the value for supporting
> > > legacy in this case or do we really encourage vendors to ship card with
> > > legacy support (e.g endian support in the hardware)?
> >
> > Hi Michael:
> >
> > Any thoughts on this approach?
> >
> > My understanding is that dropping legacy support will simplify a lot of stuffs.
> >
> > Thanks
>
> So basically the main condition is that strong memory barriers aren't
> needed for virtio, smp barriers are enough.
> Are there architectures besides x86 (where it's kind of true - as long as
> one does not use non-temporals) where that is true?
> If all these architectures are LE then we don't need to worry
> about endian support in the hardware.

So I agree it's better not to add those stuffs in either qemu or
kernel. See below.

>
> In other words I guess yes we could have qemu limit things to x86 and
> then just pretend to the card that it's virtio 1.
> So endian-ness we can address.
>
> Problem is virtio 1 has effects beyond this. things like header size
> with mergeable buffers off for virtio net.
>
> So I am inclined to say let us not do the "pretend it's virtio 1" game
> in qemu.

I fully agree.

  Let us be honest to the card about what happens.
> But if you want to limit things to x86 either in kernel or in qemu,
> that's ok by me.

So what I want to do is:

1) mandate 1.0 device on the kernel
2) don't try to pretend transitional or legacy device on top of modern
device in Qemu, so qemu will fail to start if vhost-vDPA is started
with a legacy or transitional device

And this simply the management API which can assume LE for
pre-configuration via config space.

So if I'm not misunderstanding, we can merge this patch and I can do
the Qemu work on top?

Thanks

>
> >
> > >
> > >
> > > >
> > > > So I think we should add an ioctl along the lines of
> > > > protocol features. Then I think we can reserve feature bits
> > > > for config space format: legacy LE, legacy BE, modern.
> > >
> > >
> > > We had VHOST_SET_VRING_ENDIAN but this will complicates both the vDPA
> > > parent and management. What's more important, legacy behaviour is not
> > > restrictied by the spec.
> > >
> > >
> > > >
> > > > Querying the feature bits will provide us with info about
> > > > what does the device support. Acking them will tell device
> > > > what does guest need.
> > >
> > >
> > > I think this can work, but I wonder how much we can gain from such
> > > complexitiy.
> > >
> > > Thanks
> > >
> > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >>>
> > > >>>>> to have a simple and easy
> > > >>>>> abstarction in the kenrel. For legacy driver in the guest,
> > > >>>>> hypervisor is in
> > > >>>>> charge of the mediation:
> > > >>>>>
> > > >>>>> 1) config space access endian conversion
> > > >>>>> 2) using shadow virtqueue to change the endian in the vring
> > > >>>>>
> > > >>>>> Thanks
> > > >>>> I'd like to avoid shadow virtqueue hacks if at all possible.
> > > >>>> Last I checked performance wasn't much better than just emulating
> > > >>>> virtio in software.
> > > >>>
> > > >>> I think the legacy driver support is just a nice to have. Or do you see
> > > >>> any value to that? I guess for mellanox and intel, only modern device is
> > > >>> supported in the hardware.
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> ---
> > > >>>>>>>>>      include/linux/vdpa.h | 6 ++++++
> > > >>>>>>>>>      1 file changed, 6 insertions(+)
> > > >>>>>>>>>
> > > >>>>>>>>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> > > >>>>>>>>> index 0fefeb976877..cfde4ec999b4 100644
> > > >>>>>>>>> --- a/include/linux/vdpa.h
> > > >>>>>>>>> +++ b/include/linux/vdpa.h
> > > >>>>>>>>> @@ -6,6 +6,7 @@
> > > >>>>>>>>>      #include <linux/device.h>
> > > >>>>>>>>>      #include <linux/interrupt.h>
> > > >>>>>>>>>      #include <linux/vhost_iotlb.h>
> > > >>>>>>>>> +#include <uapi/linux/virtio_config.h>
> > > >>>>>>>>>      /**
> > > >>>>>>>>>       * vDPA callback definition.
> > > >>>>>>>>> @@ -317,6 +318,11 @@ static inline int
> > > >>>>>>>>> vdpa_set_features(struct vdpa_device *vdev, u64
> > > >>>>>>>>> features)
> > > >>>>>>>>>      {
> > > >>>>>>>>>              const struct vdpa_config_ops *ops = vdev->config;
> > > >>>>>>>>> +        /* Mandating 1.0 to have semantics of
> > > >>>>>>>>> normative statements in
> > > >>>>>>>>> +         * the spec. */
> > > >>>>>>>>> +        if (!(features & BIT_ULL(VIRTIO_F_VERSION_1)))
> > > >>>>>>>>> +        return -EINVAL;
> > > >>>>>>>>> +
> > > >>>>>>>>>          vdev->features_valid = true;
> > > >>>>>>>>>              return ops->set_features(vdev, features);
> > > >>>>>>>>>      }
> > > >>>>>>>>> --
> > > >>>>>>>>> 2.25.1
> > >
> > >
>