linux-kernel - Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ee0aa81d-064b-d7a7-86bb-79a3f4d3dd11@redhat.com>
Date:   Wed, 16 Sep 2020 11:10:50 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Kishon Vijay Abraham I <kishon@...com>,
        Cornelia Huck <cohuck@...hat.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        Ohad Ben-Cohen <ohad@...ery.com>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Jon Mason <jdmason@...zu.us>,
        Dave Jiang <dave.jiang@...el.com>,
        Allen Hubbe <allenbh@...il.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Stefano Garzarella <sgarzare@...hat.com>,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-remoteproc@...r.kernel.org, linux-ntb@...glegroups.com,
        linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC
 communication


On 2020/9/15 下午11:47, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 15/09/20 1:48 pm, Jason Wang wrote:
>> Hi Kishon:
>>
>> On 2020/9/14 下午3:23, Kishon Vijay Abraham I wrote:
>>>> Then you need something that is functional equivalent to virtio PCI
>>>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>>>> the queue_sel is hard in the EP implementation).
>>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>>> the VHOST driver to configure VHOST device).
>>>
>>> struct vdpa_config_ops {
>>>      /* Virtqueue ops */
>>>      int (*set_vq_address)(struct vdpa_device *vdev,
>>>                    u16 idx, u64 desc_area, u64 driver_area,
>>>                    u64 device_area);
>>>      void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>>      void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>>      void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>>                struct vdpa_callback *cb);
>>>      void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>>      bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>>      int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>                  const struct vdpa_vq_state *state);
>>>      int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>                  struct vdpa_vq_state *state);
>>>      struct vdpa_notification_area
>>>      (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>      /* vq irq is not expected to be changed once DRIVER_OK is set */
>>>      int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>>
>>>      /* Device ops */
>>>      u32 (*get_vq_align)(struct vdpa_device *vdev);
>>>      u64 (*get_features)(struct vdpa_device *vdev);
>>>      int (*set_features)(struct vdpa_device *vdev, u64 features);
>>>      void (*set_config_cb)(struct vdpa_device *vdev,
>>>                    struct vdpa_callback *cb);
>>>      u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>>      u32 (*get_device_id)(struct vdpa_device *vdev);
>>>      u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>>      u8 (*get_status)(struct vdpa_device *vdev);
>>>      void (*set_status)(struct vdpa_device *vdev, u8 status);
>>>      void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>>                 void *buf, unsigned int len);
>>>      void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>>                 const void *buf, unsigned int len);
>>>      u32 (*get_generation)(struct vdpa_device *vdev);
>>>
>>>      /* DMA ops */
>>>      int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>>      int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>>                 u64 pa, u32 perm);
>>>      int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>>
>>>      /* Free device resources */
>>>      void (*free)(struct vdpa_device *vdev);
>>> };
>>>
>>> +struct vhost_config_ops {
>>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>>> +              vhost_vq_callback_t *callbacks[],
>>> +              const char * const names[]);
>>> +    void (*del_vqs)(struct vhost_dev *vdev);
>>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>>> int len);
>>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>>> len);
>>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>>> +    u8 (*get_status)(struct vhost_dev *vdev);
>>> +};
>>> +
>>> struct virtio_config_ops
>>> I think there's some overlap here and some of the ops tries to do the
>>> same thing.
>>>
>>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>>> [create_vqs() introduced in struct vhost_config_ops provides
>>> complimentary functionality to (*find_vqs)() in struct
>>> virtio_config_ops. It seemingly encapsulates the functionality of
>>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>>
>>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>>> set_vq_address() directly provides the virtqueue address to the vdpa
>>> device but create_vqs() only provides the parameters of the virtqueue
>>> (like the number of virtqueues, number of buffers) but does not directly
>>> provide the address. IMO the backend client drivers (like net or vhost)
>>> shouldn't/cannot by itself know how to access the vring created on
>>> virtio front-end. The vdpa device/vhost device should have logic for
>>> that. That will help the client drivers to work with different types of
>>> vdpa device/vhost device and can access the vring created by virtio
>>> irrespective of whether the vring can be accessed via mmio or kernel
>>> space or user space.
>>>
>>> I think vdpa always works with client drivers in userspace and providing
>>> userspace address for vring.
>>
>> Sorry for being unclear. What I meant is not replacing vDPA with the
>> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
>> with vDPA in:
> Okay, so the virtio back-end still use vhost and front end should use
> vDPA. I see. So the host side PCI driver for EPF should populate
> vdpa_config_ops and invoke vdpa_register_device().


Yes.


>> My question is basically for the part of virtio_pci_epf_send_command(),
>> so it looks to me you have a vendor specific API to replace the
>> virtio-pci layout of the BAR:
> Even when we use vDPA, we have to use some sort of
> virtio_pci_epf_send_command() to communicate with virtio backend right?


Right.


>
> Right, the layout is slightly different from the standard layout.
>
> This is the layout
> struct epf_vhost_reg_queue {
>          u8 cmd;
>          u8 cmd_status;
>          u16 status;
>          u16 num_buffers;
>          u16 msix_vector;
>          u64 queue_addr;


What's the meaning of queue_addr here?

Does not mean the device expects a contiguous memory for avail/desc/used 
ring?


> } __packed;
>
> struct epf_vhost_reg {
>          u64 host_features;
>          u64 guest_features;
>          u16 msix_config;
>          u16 num_queues;
>          u8 device_status;
>          u8 config_generation;
>          u32 isr;
>          u8 cmd;
>          u8 cmd_status;
>          struct epf_vhost_reg_queue vq[MAX_VQS];
> } __packed;
>>
>> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
>> +                       u32 command)
>> +{
>> +    struct virtio_pci_epf *pci_epf;
>> +    void __iomem *ioaddr;
>> +    ktime_t timeout;
>> +    bool timedout;
>> +    int ret = 0;
>> +    u8 status;
>> +
>> +    pci_epf = to_virtio_pci_epf(vp_dev);
>> +    ioaddr = vp_dev->ioaddr;
>> +
>> +    mutex_lock(&pci_epf->lock);
>> +    writeb(command, ioaddr + HOST_CMD);
>> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
>> +    while (1) {
>> +        timedout = ktime_after(ktime_get(), timeout);
>> +        status = readb(ioaddr + HOST_CMD_STATUS);
>> +
>>
>> Several questions:
>>
>> - It's not clear to me how the synchronization is done between the RC
>> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
> The HOST_CMD (commands sent to the EP) is serialized by using mutex.
> Once the EP reads the command, it resets the value in HOST_CMD. So
> HOST_CMD is less likely an issue.


Here's my understanding of the protocol:

1) RC write to HOST_CMD
2) RC wait for HOST_CMD_STATUS to be HOST_CMD_STATUS_OKAY

It looks to me what EP should do is

1) EP reset HOST_CMD after reading new command

And it looks to me EP should also reset HOST_CMD_STATUS here?

(I thought there should be patch to handle stuffs like this but I didn't 
find it in this series)


>
> A sufficiently large time is given for the EP to complete it's operation
> (1 Sec) where the EP provides the status in HOST_CMD_STATUS. After it
> expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could
> be case where EP updates HOST_CMD_STATUS after RC writes
> HOST_CMD_STATUS_NONE, but by then HOST has already detected this as
> failure and error-ed out.
>   
>> If you still want to introduce a new transport, a virtio spec patch
>> would be helpful for us to understand the device API.
> Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?


Yes.


>> - You have you vendor specific layout (according to
>> virtio_pci_epb_table()), so I guess you it's better to have a vendor
>> specific vDPA driver instead
> Okay, with vDPA, we are free to define our own layouts.


Right, but vDPA have other requirements. E.g it requires the device have 
the ability to save/restore the state (e.g the last_avail_idx).

So it actually depends on what you want. If you don't care about 
userspace drivers and want to have a standard transport, you can still 
go virtio.


>> - The advantage of vendor specific vDPA driver is that it can 1) have
>> less codes 2) support userspace drivers through vhost-vDPA (instead of
>> inventing new APIs since we can't use vfio-pci here).
> I see there's an additional level of indirection from virtio to vDPA and
> probably no need for spec update but don't exactly see how it'll reduce
> code.


AFAIK you don't need to implement your own setup_vq and del_vq.


>
> For 2, Isn't vhost-vdpa supposed to run on virtio backend?


Not currently, vDPA is a superset of virtio (e.g it support virtqueue 
state save/restore). This it should be possible in the future probably.


>
>  From a high level, I think I should be able to use vDPA for
> virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
> ([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
> functionality).


I think it's your call. If you want

1) a well-defined standard virtio transport
2) willing to finalize d and maintain the spec
3) doesn't care about userspace drivers

You can go with virtio, otherwise vDPA.

Thanks


>
> Thanks
> Kishon
>