linux-kernel - Re: [RFC PATCH] vdpa: mandate 1.0 device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c885b2e2-0f90-d215-27c7-02f0d1527991@redhat.com>
Date:   Thu, 3 Jun 2021 15:14:49 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Eli Cohen <elic@...dia.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        virtualization <virtualization@...ts.linux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>, parav@...dia.com,
        Lingshan Zhu <lingshan.zhu@...el.com>, mapfelba@...hat.com
Subject: Re: [RFC PATCH] vdpa: mandate 1.0 device


在 2021/6/2 下午6:30, Eli Cohen 写道:
> On Wed, May 12, 2021 at 05:24:21PM +0800, Jason Wang wrote:
>
> Michael,
> Did you and Jason came into agreement regarding this?


Probably, let me send a formal patch and see what happens.

Thanks


> Do you think we
> can have the bits in 5.13 and still have time for me to push the vdpa
> too stuff?
>
>
>> On Wed, May 12, 2021 at 3:54 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>>> On Tue, May 11, 2021 at 04:43:13AM -0400, Jason Wang wrote:
>>>>
>>>> ----- 原始邮件 -----
>>>>> 在 2021/4/21 下午4:03, Michael S. Tsirkin 写道:
>>>>>> On Wed, Apr 21, 2021 at 03:41:36PM +0800, Jason Wang wrote:
>>>>>>> 在 2021/4/12 下午5:23, Jason Wang 写道:
>>>>>>>> 在 2021/4/12 下午5:09, Michael S. Tsirkin 写道:
>>>>>>>>> On Mon, Apr 12, 2021 at 02:35:07PM +0800, Jason Wang wrote:
>>>>>>>>>> 在 2021/4/10 上午12:04, Michael S. Tsirkin 写道:
>>>>>>>>>>> On Fri, Apr 09, 2021 at 12:47:55PM +0800, Jason Wang wrote:
>>>>>>>>>>>> 在 2021/4/8 下午11:59, Michael S. Tsirkin 写道:
>>>>>>>>>>>>> On Thu, Apr 08, 2021 at 04:26:48PM +0800, Jason Wang wrote:
>>>>>>>>>>>>>> This patch mandates 1.0 for vDPA devices. The goal is to have the
>>>>>>>>>>>>>> semantic of normative statement in the virtio
>>>>>>>>>>>>>> spec and eliminate the
>>>>>>>>>>>>>> burden of transitional device for both vDPA bus and vDPA parent.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> uAPI seems fine since all the vDPA parent mandates
>>>>>>>>>>>>>> VIRTIO_F_ACCESS_PLATFORM which implies 1.0 devices.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For legacy guests, it can still work since Qemu will mediate when
>>>>>>>>>>>>>> necessary (e.g doing the endian conversion).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Jason Wang <jasowang@...hat.com>
>>>>>>>>>>>>> Hmm. If we do this, don't we still have a problem with
>>>>>>>>>>>>> legacy drivers which don't ack 1.0?
>>>>>>>>>>>> Yes, but it's not something that is introduced in this
>>>>>>>>>>>> commit. The legacy
>>>>>>>>>>>> driver never work ...
>>>>>>>>>>> My point is this neither fixes or prevents this.
>>>>>>>>>>>
>>>>>>>>>>> So my suggestion is to finally add ioctls along the lines
>>>>>>>>>>> of PROTOCOL_FEATURES of vhost-user.
>>>>>>>>>>>
>>>>>>>>>>> Then that one can have bits for legacy le, legacy be and modern.
>>>>>>>>>>>
>>>>>>>>>>> BTW I looked at vhost-user and it does not look like that
>>>>>>>>>>> has a solution for this problem either, right?
>>>>>>>>>> Right.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> Note 1.0 affects ring endianness which is not mediated in QEMU
>>>>>>>>>>>>> so QEMU can't pretend to device guest is 1.0.
>>>>>>>>>>>> Right, I plan to send patches to do mediation in the
>>>>>>>>>>>> Qemu to unbreak legacy
>>>>>>>>>>>> drivers.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>> I frankly think we'll need PROTOCOL_FEATURES anyway, it's
>>>>>>>>>>> too useful ...
>>>>>>>>>>> so why not teach drivers about it and be done with it? You
>>>>>>>>>>> can't emulate
>>>>>>>>>>> legacy on modern in a cross endian situation because of vring
>>>>>>>>>>> endian-ness ...
>>>>>>>>>> So the problem still. This can only work when the hardware can support
>>>>>>>>>> legacy vring endian-ness.
>>>>>>>>>>
>>>>>>>>>> Consider:
>>>>>>>>>>
>>>>>>>>>> 1) the leagcy driver support is non-normative in the spec
>>>>>>>>>> 2) support a transitional device in the kenrel may requires the
>>>>>>>>>> hardware
>>>>>>>>>> support and a burden of kernel codes
>>>>>>>>>>
>>>>>>>>>> I'd rather simply drop the legacy driver support
>>>>>>>>> My point is this patch does not drop legacy support. It merely mandates
>>>>>>>>> modern support.
>>>>>>>> I am not sure I get here. This patch fails the set_feature if VERSION_1
>>>>>>>> is not negotiated. This means:
>>>>>>>>
>>>>>>>> 1) vDPA presents a modern device instead of transitonal device
>>>>>>>> 2) legacy driver can't be probed
>>>>>>>>
>>>>>>>> What I'm missing?
>>>>>>> Hi Michael:
>>>>>>>
>>>>>>> Do you agree to find the way to present modern device? We need a
>>>>>>> conclusion
>>>>>>> to make the netlink API work to move forward.
>>>>>>>
>>>>>>> Thanks
>>>>>> I think we need a way to support legacy with no data path overhead. qemu
>>>>>> setting VERSION_1 for a legacy guest affects the ring format so it does
>>>>>> not really work. This seems to rule out emulating config space entirely
>>>>>> in userspace.
>>>>>
>>>>> So I'd rather drop the legacy support in this case. It never work for
>>>>> vDPA in the past and virtio-vDPA doesn't even need that. Note that
>>>>> ACCESS_PLATFORM is mandated for all the vDPA parents right now which
>>>>> implies modern device and LE. I wonder what's the value for supporting
>>>>> legacy in this case or do we really encourage vendors to ship card with
>>>>> legacy support (e.g endian support in the hardware)?
>>>> Hi Michael:
>>>>
>>>> Any thoughts on this approach?
>>>>
>>>> My understanding is that dropping legacy support will simplify a lot of stuffs.
>>>>
>>>> Thanks
>>> So basically the main condition is that strong memory barriers aren't
>>> needed for virtio, smp barriers are enough.
>>> Are there architectures besides x86 (where it's kind of true - as long as
>>> one does not use non-temporals) where that is true?
>>> If all these architectures are LE then we don't need to worry
>>> about endian support in the hardware.
>> So I agree it's better not to add those stuffs in either qemu or
>> kernel. See below.
>>
>>> In other words I guess yes we could have qemu limit things to x86 and
>>> then just pretend to the card that it's virtio 1.
>>> So endian-ness we can address.
>>>
>>> Problem is virtio 1 has effects beyond this. things like header size
>>> with mergeable buffers off for virtio net.
>>>
>>> So I am inclined to say let us not do the "pretend it's virtio 1" game
>>> in qemu.
>> I fully agree.
>>
>>    Let us be honest to the card about what happens.
>>> But if you want to limit things to x86 either in kernel or in qemu,
>>> that's ok by me.
>> So what I want to do is:
>>
>> 1) mandate 1.0 device on the kernel
>> 2) don't try to pretend transitional or legacy device on top of modern
>> device in Qemu, so qemu will fail to start if vhost-vDPA is started
>> with a legacy or transitional device
>>
>> And this simply the management API which can assume LE for
>> pre-configuration via config space.
>>
>> So if I'm not misunderstanding, we can merge this patch and I can do
>> the Qemu work on top?
>>
>> Thanks
>>
>>>>>
>>>>>> So I think we should add an ioctl along the lines of
>>>>>> protocol features. Then I think we can reserve feature bits
>>>>>> for config space format: legacy LE, legacy BE, modern.
>>>>>
>>>>> We had VHOST_SET_VRING_ENDIAN but this will complicates both the vDPA
>>>>> parent and management. What's more important, legacy behaviour is not
>>>>> restrictied by the spec.
>>>>>
>>>>>
>>>>>> Querying the feature bits will provide us with info about
>>>>>> what does the device support. Acking them will tell device
>>>>>> what does guest need.
>>>>>
>>>>> I think this can work, but I wonder how much we can gain from such
>>>>> complexitiy.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>>> to have a simple and easy
>>>>>>>>>> abstarction in the kenrel. For legacy driver in the guest,
>>>>>>>>>> hypervisor is in
>>>>>>>>>> charge of the mediation:
>>>>>>>>>>
>>>>>>>>>> 1) config space access endian conversion
>>>>>>>>>> 2) using shadow virtqueue to change the endian in the vring
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>> I'd like to avoid shadow virtqueue hacks if at all possible.
>>>>>>>>> Last I checked performance wasn't much better than just emulating
>>>>>>>>> virtio in software.
>>>>>>>> I think the legacy driver support is just a nice to have. Or do you see
>>>>>>>> any value to that? I guess for mellanox and intel, only modern device is
>>>>>>>> supported in the hardware.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>       include/linux/vdpa.h | 6 ++++++
>>>>>>>>>>>>>>       1 file changed, 6 insertions(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>>>>>>>>>>>>>> index 0fefeb976877..cfde4ec999b4 100644
>>>>>>>>>>>>>> --- a/include/linux/vdpa.h
>>>>>>>>>>>>>> +++ b/include/linux/vdpa.h
>>>>>>>>>>>>>> @@ -6,6 +6,7 @@
>>>>>>>>>>>>>>       #include <linux/device.h>
>>>>>>>>>>>>>>       #include <linux/interrupt.h>
>>>>>>>>>>>>>>       #include <linux/vhost_iotlb.h>
>>>>>>>>>>>>>> +#include <uapi/linux/virtio_config.h>
>>>>>>>>>>>>>>       /**
>>>>>>>>>>>>>>        * vDPA callback definition.
>>>>>>>>>>>>>> @@ -317,6 +318,11 @@ static inline int
>>>>>>>>>>>>>> vdpa_set_features(struct vdpa_device *vdev, u64
>>>>>>>>>>>>>> features)
>>>>>>>>>>>>>>       {
>>>>>>>>>>>>>>               const struct vdpa_config_ops *ops = vdev->config;
>>>>>>>>>>>>>> +        /* Mandating 1.0 to have semantics of
>>>>>>>>>>>>>> normative statements in
>>>>>>>>>>>>>> +         * the spec. */
>>>>>>>>>>>>>> +        if (!(features & BIT_ULL(VIRTIO_F_VERSION_1)))
>>>>>>>>>>>>>> +        return -EINVAL;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>           vdev->features_valid = true;
>>>>>>>>>>>>>>               return ops->set_features(vdev, features);
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> 2.25.1
>>>>>