Message-ID: <67924594-c70e-390e-ce2e-dda41a94ada1@ti.com>
Date: Wed, 16 Sep 2020 17:17:32 +0530
From: Kishon Vijay Abraham I <kishon@...com>
To: Jason Wang <jasowang@...hat.com>, Cornelia Huck <cohuck@...hat.com>
CC: "Michael S. Tsirkin" <mst@...hat.com>,
Ohad Ben-Cohen <ohad@...ery.com>,
Bjorn Andersson <bjorn.andersson@...aro.org>,
Jon Mason <jdmason@...zu.us>,
Dave Jiang <dave.jiang@...el.com>,
Allen Hubbe <allenbh@...il.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-remoteproc@...r.kernel.org>, <linux-ntb@...glegroups.com>,
<linux-pci@...r.kernel.org>, <kvm@...r.kernel.org>,
<virtualization@...ts.linux-foundation.org>,
<netdev@...r.kernel.org>
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC
communication
Hi Jason,
On 16/09/20 8:40 am, Jason Wang wrote:
>
>> On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 15/09/20 1:48 pm, Jason Wang wrote:
>>> Hi Kishon:
>>>
>>> On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:
>>>>> Then you need something that is functionally equivalent to virtio PCI,
>>>>> which is actually the concept of vDPA (e.g. vDPA provides alternatives
>>>>> if the queue_sel is hard in the EP implementation).
>>>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>>>> vhost_config_ops' (introduced in [RFC PATCH 03/22] vhost: Add ops for
>>>> the VHOST driver to configure VHOST device).
>>>>
>>>> struct vdpa_config_ops {
>>>>     /* Virtqueue ops */
>>>>     int (*set_vq_address)(struct vdpa_device *vdev, u16 idx,
>>>>                           u64 desc_area, u64 driver_area,
>>>>                           u64 device_area);
>>>>     void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>>>     void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>>>     void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>>>                       struct vdpa_callback *cb);
>>>>     void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>>>     bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>>>     int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>                         const struct vdpa_vq_state *state);
>>>>     int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>                         struct vdpa_vq_state *state);
>>>>     struct vdpa_notification_area
>>>>     (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>>     /* vq irq is not expected to be changed once DRIVER_OK is set */
>>>>     int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>>>
>>>>     /* Device ops */
>>>>     u32 (*get_vq_align)(struct vdpa_device *vdev);
>>>>     u64 (*get_features)(struct vdpa_device *vdev);
>>>>     int (*set_features)(struct vdpa_device *vdev, u64 features);
>>>>     void (*set_config_cb)(struct vdpa_device *vdev,
>>>>                           struct vdpa_callback *cb);
>>>>     u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>>>     u32 (*get_device_id)(struct vdpa_device *vdev);
>>>>     u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>>>     u8 (*get_status)(struct vdpa_device *vdev);
>>>>     void (*set_status)(struct vdpa_device *vdev, u8 status);
>>>>     void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>                        void *buf, unsigned int len);
>>>>     void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>                        const void *buf, unsigned int len);
>>>>     u32 (*get_generation)(struct vdpa_device *vdev);
>>>>
>>>>     /* DMA ops */
>>>>     int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>>>     int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>>>                    u64 pa, u32 perm);
>>>>     int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>>>
>>>>     /* Free device resources */
>>>>     void (*free)(struct vdpa_device *vdev);
>>>> };
>>>>
>>>> +struct vhost_config_ops {
>>>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>>>> +                      unsigned int num_bufs,
>>>> +                      struct vhost_virtqueue *vqs[],
>>>> +                      vhost_vq_callback_t *callbacks[],
>>>> +                      const char * const names[]);
>>>> +    void (*del_vqs)(struct vhost_dev *vdev);
>>>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>>>> +                 int len);
>>>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src,
>>>> +                int len);
>>>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>>>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>>>> +    u8 (*get_status)(struct vhost_dev *vdev);
>>>> +};
>>>> +
>>>> struct virtio_config_ops
>>>> I think there's some overlap here and some of the ops try to do the
>>>> same thing.
>>>>
>>>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>>>> [create_vqs() introduced in struct vhost_config_ops provides
>>>> complementary functionality to (*find_vqs)() in struct
>>>> virtio_config_ops. It seemingly encapsulates the functionality of
>>>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>>>
>>>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>>>> set_vq_address() directly provides the virtqueue address to the vdpa
>>>> device, but create_vqs() only provides the parameters of the virtqueue
>>>> (like the number of virtqueues and the number of buffers) and does not
>>>> directly provide the address. IMO the backend client drivers (like net
>>>> or vhost) shouldn't/cannot by themselves know how to access the vring
>>>> created on the virtio front-end. The vdpa device/vhost device should
>>>> have logic for that. That will help the client drivers work with
>>>> different types of vdpa device/vhost device and access the vring
>>>> created by virtio irrespective of whether the vring is accessed via
>>>> MMIO, kernel space or user space.
>>>>
>>>> I think vdpa always works with client drivers in userspace and provides
>>>> a userspace address for the vring.
>>>
>>> Sorry for being unclear. What I meant is not replacing vDPA with the
>>> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
>>> with vDPA in:
>> Okay, so the virtio back-end still uses vhost and the front end should use
>> vDPA. I see. So the host side PCI driver for EPF should populate
>> vdpa_config_ops and invoke vdpa_register_device().
>
>
> Yes.
>
>
>>> My question is basically for the part of virtio_pci_epf_send_command(),
>>> so it looks to me you have a vendor specific API to replace the
>>> virtio-pci layout of the BAR:
>> Even when we use vDPA, we have to use some sort of
>> virtio_pci_epf_send_command() to communicate with the virtio backend, right?
>
>
> Right.
>
>
>>
>> Right, the layout is slightly different from the standard layout.
>>
>> This is the layout:
>> struct epf_vhost_reg_queue {
>>     u8 cmd;
>>     u8 cmd_status;
>>     u16 status;
>>     u16 num_buffers;
>>     u16 msix_vector;
>>     u64 queue_addr;
>
>
> What's the meaning of queue_addr here?
Using queue_addr, the virtio front-end communicates the address of the
memory allocated for the virtqueue to the virtio back-end.
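
To illustrate, here is a minimal sketch of how the front-end could program
this register (not taken from the series; VQ_REG_OFF() and the function
name are made up for illustration):

    /* Hypothetical sketch: the front-end publishes the vring address to
     * the EP through the epf_vhost_reg_queue entry in the BAR. Offsets
     * and helper names are illustrative only.
     */
    static void epf_front_publish_vq(void __iomem *reg_base, u16 idx,
                                     dma_addr_t vring_addr, u16 num_bufs)
    {
        void __iomem *vq_reg = reg_base + VQ_REG_OFF(idx); /* assumed macro */

        writew(num_bufs, vq_reg +
               offsetof(struct epf_vhost_reg_queue, num_buffers));
        /* queue_addr is 64-bit; split into two 32-bit writes */
        writel(lower_32_bits(vring_addr), vq_reg +
               offsetof(struct epf_vhost_reg_queue, queue_addr));
        writel(upper_32_bits(vring_addr), vq_reg +
               offsetof(struct epf_vhost_reg_queue, queue_addr) + 4);
    }

The back-end would then map this bus address through the EP's address
translation before touching the vring.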
>
> Doesn't that mean the device expects a contiguous memory region for the
> desc/avail/used rings?
It's contiguous memory. Isn't this similar to the other virtio transports
(both the PCI legacy and modern interfaces)?
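
For reference, a rough sketch of how such a contiguous allocation looks
with the split ring (illustrative only; 'dev' and VRING_ALIGN are
placeholders):

    /* The split-ring desc/avail/used areas live in one contiguous block;
     * vring_init() only computes the avail/used pointers inside it.
     */
    unsigned int num = 256;                      /* example queue size */
    size_t size = vring_size(num, VRING_ALIGN);  /* desc + avail + pad + used */
    dma_addr_t queue_addr;
    struct vring vr;
    void *queue;

    queue = dma_alloc_coherent(dev, size, &queue_addr, GFP_KERNEL);
    if (queue)
        vring_init(&vr, num, queue, VRING_ALIGN);

    /* queue_addr is what would be written to the queue_addr register above */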
>
>
>> } __packed;
>>
>> struct epf_vhost_reg {
>>     u64 host_features;
>>     u64 guest_features;
>>     u16 msix_config;
>>     u16 num_queues;
>>     u8 device_status;
>>     u8 config_generation;
>>     u32 isr;
>>     u8 cmd;
>>     u8 cmd_status;
>>     struct epf_vhost_reg_queue vq[MAX_VQS];
>> } __packed;
>>>
>>> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
>>> +                                       u32 command)
>>> +{
>>> +    struct virtio_pci_epf *pci_epf;
>>> +    void __iomem *ioaddr;
>>> +    ktime_t timeout;
>>> +    bool timedout;
>>> +    int ret = 0;
>>> +    u8 status;
>>> +
>>> +    pci_epf = to_virtio_pci_epf(vp_dev);
>>> +    ioaddr = vp_dev->ioaddr;
>>> +
>>> +    mutex_lock(&pci_epf->lock);
>>> +    writeb(command, ioaddr + HOST_CMD);
>>> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
>>> +    while (1) {
>>> +        timedout = ktime_after(ktime_get(), timeout);
>>> +        status = readb(ioaddr + HOST_CMD_STATUS);
>>> +
>>>
>>> Several questions:
>>>
>>> - It's not clear to me how the synchronization is done between the RC
>>> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
>> The HOST_CMD (commands sent to the EP) is serialized by using a mutex.
>> Once the EP reads the command, it resets the value in HOST_CMD. So
>> HOST_CMD is less likely to be an issue.
>
>
> Here's my understanding of the protocol:
>
> 1) RC write to HOST_CMD
> 2) RC wait for HOST_CMD_STATUS to be HOST_CMD_STATUS_OKAY
That's right!
>
> It looks to me what the EP should do is
>
> 1) EP resets HOST_CMD after reading the new command
That's right! It does.
>
> And it looks to me EP should also reset HOST_CMD_STATUS here?
Yeah, that would require the RC to send another command to reset the status.
I didn't see it required in the normal scenario, but it would be good to add this.
>
> (I thought there should be a patch to handle stuff like this, but I didn't
> find it in this series)
This is added in [RFC PATCH 19/22] "PCI: endpoint: Add EP function driver
to provide VHOST interface".
pci_epf_vhost_cmd_handler() gets commands from the RC using "reg->cmd". On
the EP side, this is a local memory access (the memory backing the BAR
exposed to the host) and hence is done using structure member access.
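
As a rough sketch of that flow (simplified; HOST_CMD_QUEUE_ACTIVE and
HOST_CMD_STATUS_ERROR are illustrative names, and the real handler in the
patch does more):

    /* Simplified sketch of an EP-side command handler. The BAR backing
     * memory is local to the EP, so the registers are plain structure
     * member accesses.
     */
    static void epf_vhost_cmd_handler_sketch(struct epf_vhost_reg *reg)
    {
        u8 cmd = reg->cmd;

        if (!cmd)
            return;

        reg->cmd = 0;   /* reset HOST_CMD once the command has been read */

        switch (cmd) {
        case HOST_CMD_QUEUE_ACTIVE:     /* illustrative command */
            /* ... act on the command ... */
            reg->cmd_status = HOST_CMD_STATUS_OKAY;
            break;
        default:
            reg->cmd_status = HOST_CMD_STATUS_ERROR;
            break;
        }
    }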
>
>
>>
>> A sufficiently large time (1 sec) is given for the EP to complete its
>> operation, during which the EP provides the status in HOST_CMD_STATUS.
>> After it expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS.
>> There could be a case where the EP updates HOST_CMD_STATUS after the RC
>> writes HOST_CMD_STATUS_NONE, but by then the host has already detected
>> this as a failure and errored out.
>>
>>> If you still want to introduce a new transport, a virtio spec patch
>>> would be helpful for us to understand the device API.
>> Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?
>
>
> Yes.
>
>
>>> - You have your vendor specific layout (according to
>>> virtio_pci_epb_table()), so I guess it's better to have a vendor
>>> specific vDPA driver instead.
>> Okay, with vDPA, we are free to define our own layouts.
>
>
> Right, but vDPA has other requirements. E.g. it requires the device to have
> the ability to save/restore the state (e.g. the last_avail_idx).
>
> So it actually depends on what you want. If you don't care about
> userspace drivers and want to have a standard transport, you can still
> go virtio.
okay.
>
>
>>> - The advantage of a vendor specific vDPA driver is that it can 1) have
>>> less code and 2) support userspace drivers through vhost-vDPA (instead
>>> of inventing new APIs, since we can't use vfio-pci here).
>> I see there's an additional level of indirection from virtio to vDPA and
>> probably no need for a spec update, but I don't exactly see how it'll
>> reduce code.
>
>
> AFAIK you don't need to implement your own setup_vq and del_vq.
>
There should still be some entity that allocates memory for the virtqueues
and then communicates this address to the backend.
Maybe I have to look into this further.
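
For instance, with a vendor specific vDPA driver the vring allocation would
be done by the virtio/vDPA core, and the EPF driver's set_vq_address()
would only publish the addresses to the EP. A hedged sketch (epf_vdpa,
vdpa_to_epf() and QUEUE_ADDR_OFF are made-up names):

    /* Hypothetical vendor vDPA op: the core has already allocated the
     * vring, so this only hands the addresses to the EP. The proposed
     * layout has a single queue_addr register, so only the descriptor
     * area is published here; a real driver would need registers for the
     * driver (avail) and device (used) areas too.
     */
    static int epf_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx,
                                       u64 desc_area, u64 driver_area,
                                       u64 device_area)
    {
        struct epf_vdpa *epf = vdpa_to_epf(vdev); /* assumed container_of() helper */

        writel(lower_32_bits(desc_area), epf->vq_reg[idx] + QUEUE_ADDR_OFF);
        writel(upper_32_bits(desc_area), epf->vq_reg[idx] + QUEUE_ADDR_OFF + 4);
        return 0;
    }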
>
>>
>> For 2, isn't vhost-vdpa supposed to run on the virtio backend?
>
>
> Not currently, vDPA is a superset of virtio (e.g. it supports virtqueue
> state save/restore). But it should probably be possible in the future.
>
>
>>
>> From a high level, I think I should be able to use vDPA for
>> virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
>> ([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
>> functionality).
>
>
> I think it's your call. If you
>
> 1) want a well-defined standard virtio transport,
> 2) are willing to finalize and maintain the spec, and
> 3) don't care about userspace drivers,
IIUC, we can use vDPA (virtio_vdpa.c) but still don't need userspace
drivers, right?
>
> You can go with virtio, otherwise vDPA.
Okay, let me see. Thanks for your inputs.
Best Regards,
Kishon