[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d7530c13-f1a1-311e-7d5e-8e65f3bc2e50@redhat.com>
Date: Thu, 20 Apr 2023 16:16:08 +0200
From: Maxime Coquelin <maxime.coquelin@...hat.com>
To: Jason Wang <jasowang@...hat.com>
Cc: xieyongji@...edance.com, mst@...hat.com, david.marchand@...hat.com,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
Peter Xu <peterx@...hat.com>
Subject: Re: [RFC 0/2] vduse: add support for networking devices
On 4/20/23 06:34, Jason Wang wrote:
> On Wed, Apr 19, 2023 at 9:43 PM Maxime Coquelin
> <maxime.coquelin@...hat.com> wrote:
>>
>> This small series enables virtio-net device type in VDUSE.
>> With it, basic operation have been tested, both with
>> virtio-vdpa and vhost-vdpa using DPDK Vhost library series
>> adding VDUSE support [0] using split rings layout.
>>
>> Control queue support (and so multiqueue) has also been
>> tested, but require a Kernel series from Jason Wang
>> relaxing control queue polling [1] to function reliably.
>>
>> Other than that, we have identified a few gaps:
>>
>> 1. Reconnection:
>> a. VDUSE_VQ_GET_INFO ioctl() returns always 0 for avail
>> index, even after the virtqueue has already been
>> processed. Is that expected? I have tried instead to
>> get the driver's avail index directly from the avail
>> ring, but it does not seem reliable as I sometimes get
>> "id %u is not a head!\n" warnings. Also such solution
>> would not be possible with packed ring, as we need to
>> know the wrap counters values.
>
> Looking at the codes, it only returns the value that is set via
> set_vq_state(). I think it is expected to be called before the
> datapath runs.
>
> So when bound to virtio-vdpa, it is expected to return 0. But we need
> to fix the packed virtqueue case, I wonder if we need to call
> set_vq_state() explicitly in virtio-vdpa before starting the device.
>
> When bound to vhost-vdpa, Qemu will call VHOST_SET_VRING_BASE which
> will end up a call to set_vq_state(). Unfortunately, it doesn't
> support packed ring which needs some extension.
>
>>
>> b. Missing IOCTLs: it would be handy to have new IOCTLs to
>> query Virtio device status,
>
> What's the use case of this ioctl? It looks to me userspace is
> notified on each status change now:
>
> static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> {
> struct vduse_dev_msg msg = { 0 };
>
> msg.req.type = VDUSE_SET_STATUS;
> msg.req.s.status = status;
>
> return vduse_dev_msg_sync(dev, &msg);
> }
The idea was to be able to query the status at reconnect time, and
neither having to assume its value nor having to store its value in a
file (the status could change while the VDUSE application is stopped,
but maybe it would receive the notification at reconnect).
I will prototype using a tmpfs file to save needed information, and see
if it works.
>> and retrieve the config
>> space set at VDUSE_CREATE_DEV time.
>
> In order to be safe, VDUSE avoids writable config space. Otherwise
> drivers could block on config writing forever. That's why we don't do
> it now.
The idea was not to make the config space writable, but just to be able
to fetch what was filled at VDUSE_CREATE_DEV time.
With the tmpfs file, we can avoid doing that and just save the config
space there.
> We need to harden the config write before we can proceed to this I think.
>
>>
>> 2. VDUSE application as non-root:
>> We need to run the VDUSE application as non-root. There
>> is some race between the time the UDEV rule is applied
>> and the time the device starts being used. Discussing
>> with Jason, he suggested we may have a VDUSE daemon run
>> as root that would create the VDUSE device, manages its
>> rights and then pass its file descriptor to the VDUSE
>> app. However, with current IOCTLs, it means the VDUSE
>> daemon would need to know several information that
>> belongs to the VDUSE app implementing the device such
>> as supported Virtio features, config space, etc...
>> If we go that route, maybe we should have a control
>> IOCTL to create the device which would just pass the
>> device type. Then another device IOCTL to perform the
>> initialization. Would that make sense?
>
> I think so. We can hear from others.
>
>>
>> 3. Coredump:
>> In order to be able to perform post-mortem analysis, DPDK
>> Vhost library marks pages used for vrings and descriptors
>> buffers as MADV_DODUMP using madvise(). However with
>> VDUSE it fails with -EINVAL. My understanding is that we
>> set VM_DONTEXPAND flag to the VMAs and madvise's
>> MADV_DODUMP fails if it is present. I'm not sure to
>> understand why madvise would prevent MADV_DODUMP if
>> VM_DONTEXPAND is set. Any thoughts?
>
> Adding Peter who may know the answer.
Thanks!
Maxime
> Thanks
>
>>
>> [0]: https://patchwork.dpdk.org/project/dpdk/list/?series=27594&state=%2A&archive=both
>> [1]: https://lore.kernel.org/lkml/CACGkMEtgrxN3PPwsDo4oOsnsSLJfEmBEZ0WvjGRr3whU+QasUg@mail.gmail.com/T/
>>
>> Maxime Coquelin (2):
>> vduse: validate block features only with block devices
>> vduse: enable Virtio-net device type
>>
>> drivers/vdpa/vdpa_user/vduse_dev.c | 11 +++++++----
>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> --
>> 2.39.2
>>
>
Powered by blists - more mailing lists