netdev - Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <181ae83d-edeb-9406-27cc-1195fe29ae95@ti.com>
Date:   Tue, 15 Sep 2020 21:17:51 +0530
From:   Kishon Vijay Abraham I <kishon@...com>
To:     Jason Wang <jasowang@...hat.com>, Cornelia Huck <cohuck@...hat.com>
CC:     "Michael S. Tsirkin" <mst@...hat.com>,
        Ohad Ben-Cohen <ohad@...ery.com>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Jon Mason <jdmason@...zu.us>,
        Dave Jiang <dave.jiang@...el.com>,
        Allen Hubbe <allenbh@...il.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Stefano Garzarella <sgarzare@...hat.com>,
        <linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linux-remoteproc@...r.kernel.org>, <linux-ntb@...glegroups.com>,
        <linux-pci@...r.kernel.org>, <kvm@...r.kernel.org>,
        <virtualization@...ts.linux-foundation.org>,
        <netdev@...r.kernel.org>
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC
 communication

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:
> Hi Kishon:
> 
> On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:
>>> Then you need something that is functional equivalent to virtio PCI
>>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>>> the queue_sel is hard in the EP implementation).
>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>> the VHOST driver to configure VHOST device).
>>
>> struct vdpa_config_ops {
>>     /* Virtqueue ops */
>>     int (*set_vq_address)(struct vdpa_device *vdev,
>>                   u16 idx, u64 desc_area, u64 driver_area,
>>                   u64 device_area);
>>     void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>     void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>     void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>               struct vdpa_callback *cb);
>>     void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>     bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>     int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                 const struct vdpa_vq_state *state);
>>     int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                 struct vdpa_vq_state *state);
>>     struct vdpa_notification_area
>>     (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>     /* vq irq is not expected to be changed once DRIVER_OK is set */
>>     int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>
>>     /* Device ops */
>>     u32 (*get_vq_align)(struct vdpa_device *vdev);
>>     u64 (*get_features)(struct vdpa_device *vdev);
>>     int (*set_features)(struct vdpa_device *vdev, u64 features);
>>     void (*set_config_cb)(struct vdpa_device *vdev,
>>                   struct vdpa_callback *cb);
>>     u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>     u32 (*get_device_id)(struct vdpa_device *vdev);
>>     u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>     u8 (*get_status)(struct vdpa_device *vdev);
>>     void (*set_status)(struct vdpa_device *vdev, u8 status);
>>     void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>                void *buf, unsigned int len);
>>     void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>                const void *buf, unsigned int len);
>>     u32 (*get_generation)(struct vdpa_device *vdev);
>>
>>     /* DMA ops */
>>     int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>     int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>                u64 pa, u32 perm);
>>     int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>
>>     /* Free device resources */
>>     void (*free)(struct vdpa_device *vdev);
>> };
>>
>> +struct vhost_config_ops {
>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>> +              vhost_vq_callback_t *callbacks[],
>> +              const char * const names[]);
>> +    void (*del_vqs)(struct vhost_dev *vdev);
>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>> int len);
>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>> len);
>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>> +    u8 (*get_status)(struct vhost_dev *vdev);
>> +};
>> +
>> struct virtio_config_ops
>> I think there's some overlap here and some of the ops tries to do the
>> same thing.
>>
>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>> [create_vqs() introduced in struct vhost_config_ops provides
>> complimentary functionality to (*find_vqs)() in struct
>> virtio_config_ops. It seemingly encapsulates the functionality of
>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>
>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>> set_vq_address() directly provides the virtqueue address to the vdpa
>> device but create_vqs() only provides the parameters of the virtqueue
>> (like the number of virtqueues, number of buffers) but does not directly
>> provide the address. IMO the backend client drivers (like net or vhost)
>> shouldn't/cannot by itself know how to access the vring created on
>> virtio front-end. The vdpa device/vhost device should have logic for
>> that. That will help the client drivers to work with different types of
>> vdpa device/vhost device and can access the vring created by virtio
>> irrespective of whether the vring can be accessed via mmio or kernel
>> space or user space.
>>
>> I think vdpa always works with client drivers in userspace and providing
>> userspace address for vring.
> 
> 
> Sorry for being unclear. What I meant is not replacing vDPA with the
> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
> with vDPA in:

Okay, so the virtio back-end still use vhost and front end should use
vDPA. I see. So the host side PCI driver for EPF should populate
vdpa_config_ops and invoke vdpa_register_device().
> 
> My question is basically for the part of virtio_pci_epf_send_command(),
> so it looks to me you have a vendor specific API to replace the
> virtio-pci layout of the BAR:

Even when we use vDPA, we have to use some sort of
virtio_pci_epf_send_command() to communicate with virtio backend right?

Right, the layout is slightly different from the standard layout.

This is the layout
struct epf_vhost_reg_queue {
        u8 cmd;
        u8 cmd_status;
        u16 status;
        u16 num_buffers;
        u16 msix_vector;
        u64 queue_addr;
} __packed;

struct epf_vhost_reg {
        u64 host_features;
        u64 guest_features;
        u16 msix_config;
        u16 num_queues;
        u8 device_status;
        u8 config_generation;
        u32 isr;
        u8 cmd;
        u8 cmd_status;
        struct epf_vhost_reg_queue vq[MAX_VQS];
} __packed;
> 
> 
> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
> +                       u32 command)
> +{
> +    struct virtio_pci_epf *pci_epf;
> +    void __iomem *ioaddr;
> +    ktime_t timeout;
> +    bool timedout;
> +    int ret = 0;
> +    u8 status;
> +
> +    pci_epf = to_virtio_pci_epf(vp_dev);
> +    ioaddr = vp_dev->ioaddr;
> +
> +    mutex_lock(&pci_epf->lock);
> +    writeb(command, ioaddr + HOST_CMD);
> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
> +    while (1) {
> +        timedout = ktime_after(ktime_get(), timeout);
> +        status = readb(ioaddr + HOST_CMD_STATUS);
> +
> 
> Several questions:
> 
> - It's not clear to me how the synchronization is done between the RC
> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.

The HOST_CMD (commands sent to the EP) is serialized by using mutex.
Once the EP reads the command, it resets the value in HOST_CMD. So
HOST_CMD is less likely an issue.

A sufficiently large time is given for the EP to complete it's operation
(1 Sec) where the EP provides the status in HOST_CMD_STATUS. After it
expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could
be case where EP updates HOST_CMD_STATUS after RC writes
HOST_CMD_STATUS_NONE, but by then HOST has already detected this as
failure and error-ed out.
 
> If you still want to introduce a new transport, a virtio spec patch
> would be helpful for us to understand the device API.

Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?
> - You have you vendor specific layout (according to
> virtio_pci_epb_table()), so I guess you it's better to have a vendor
> specific vDPA driver instead

Okay, with vDPA, we are free to define our own layouts.
> - The advantage of vendor specific vDPA driver is that it can 1) have
> less codes 2) support userspace drivers through vhost-vDPA (instead of
> inventing new APIs since we can't use vfio-pci here).

I see there's an additional level of indirection from virtio to vDPA and
probably no need for spec update but don't exactly see how it'll reduce
code.

For 2, Isn't vhost-vdpa supposed to run on virtio backend?

>From a high level, I think I should be able to use vDPA for
virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
functionality).

Thanks
Kishon