Message-ID: <20200828123409.4cd2a812.cohuck@redhat.com>
Date: Fri, 28 Aug 2020 12:34:09 +0200
From: Cornelia Huck <cohuck@...hat.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Kishon Vijay Abraham I <kishon@...com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Ohad Ben-Cohen <ohad@...ery.com>,
Bjorn Andersson <bjorn.andersson@...aro.org>,
Jon Mason <jdmason@...zu.us>,
Dave Jiang <dave.jiang@...el.com>,
Allen Hubbe <allenbh@...il.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-remoteproc@...r.kernel.org, linux-ntb@...glegroups.com,
linux-pci@...r.kernel.org, kvm@...r.kernel.org,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC
communication
On Thu, 9 Jul 2020 14:26:53 +0800
Jason Wang <jasowang@...hat.com> wrote:
[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very
confused about the background here, so let me apologize beforehand for
any confusion I might spread.]
> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
> > Hi Jason,
> >
> > On 7/8/2020 4:52 PM, Jason Wang wrote:
> >> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> >>> Hi Jason,
> >>>
> >>> On 7/7/2020 3:17 PM, Jason Wang wrote:
> >>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> >>>>> Hi Jason,
> >>>>>
> >>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
> >>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
> >>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> >>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
> >>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>>
> >>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
> >>>>>>>>>> rpmsg communication between two SoCs connected to each other
> >>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >>>>>>>>>> between two SoCs connected via NTB
> >>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>
> >>>>>>>>>> UseCase1 :
> >>>>>>>>>>
> >>>>>>>>>> VHOST RPMSG VIRTIO RPMSG
> >>>>>>>>>> + +
> >>>>>>>>>> | |
> >>>>>>>>>> | |
> >>>>>>>>>> | |
> >>>>>>>>>> | |
> >>>>>>>>>> +-----v------+ +------v-------+
> >>>>>>>>>> | Linux | | Linux |
> >>>>>>>>>> | Endpoint | | Root Complex |
> >>>>>>>>>> | <-----------------> |
> >>>>>>>>>> | | | |
> >>>>>>>>>> | SOC1 | | SOC2 |
> >>>>>>>>>> +------------+ +--------------+
> >>>>>>>>>>
> >>>>>>>>>> UseCase 2:
> >>>>>>>>>>
> >>>>>>>>>> VHOST RPMSG VIRTIO RPMSG
> >>>>>>>>>> + +
> >>>>>>>>>> | |
> >>>>>>>>>> | |
> >>>>>>>>>> | |
> >>>>>>>>>> | |
> >>>>>>>>>> +------v------+ +------v------+
> >>>>>>>>>> | | | |
> >>>>>>>>>> | HOST1 | | HOST2 |
> >>>>>>>>>> | | | |
> >>>>>>>>>> +------^------+ +------^------+
> >>>>>>>>>> | |
> >>>>>>>>>> | |
> >>>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>> | +------v------+ +------v------+ |
> >>>>>>>>>> | | | | | |
> >>>>>>>>>> | | EP | | EP | |
> >>>>>>>>>> | | CONTROLLER1 | | CONTROLLER2 | |
> >>>>>>>>>> | | <-----------------------------------> | |
> >>>>>>>>>> | | | | | |
> >>>>>>>>>> | | | | | |
> >>>>>>>>>> | | | SoC With Multiple EP Instances | | |
> >>>>>>>>>> | | | (Configured using NTB Function) | | |
> >>>>>>>>>> | +-------------+ +-------------+ |
> >>>>>>>>>> +---------------------------------------------------------------------+
First of all, to clarify the terminology:
Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
virtqueues + the existing vhost interfaces?
> >>>>>>>>>>
> >>>>>>>>>> Software Layering:
> >>>>>>>>>>
> >>>>>>>>>> The high-level SW layering should look something like below. This series
> >>>>>>>>>> adds support only for RPMSG VHOST; however, something similar should be
> >>>>>>>>>> done for net and scsi. With that, any vhost device (PCI, NTB, platform
> >>>>>>>>>> device, user) can use any of the vhost client drivers.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> +----------------+ +-----------+ +------------+ +----------+
> >>>>>>>>>> | RPMSG VHOST | | NET VHOST | | SCSI VHOST | | X |
> >>>>>>>>>> +-------^--------+ +-----^-----+ +-----^------+ +----^-----+
> >>>>>>>>>> | | | |
> >>>>>>>>>> | | | |
> >>>>>>>>>> | | | |
> >>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>>> | VHOST CORE |
> >>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>> | | | |
> >>>>>>>>>> | | | |
> >>>>>>>>>> | | | |
> >>>>>>>>>> +--------v-------+ +----v------+ +----------v----------+ +----v-----+
> >>>>>>>>>> | PCI EPF VHOST | | NTB VHOST | |PLATFORM DEVICE VHOST| | X |
> >>>>>>>>>> +----------------+ +-----------+ +---------------------+ +----------+
So, the upper half is basically various functionality types, e.g. a net
device. What is the lower half, a hardware interface? Would it be
equivalent to e.g. a normal PCI device?
> >>>>>>>>>>
> >>>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>>
> >>>>>>>>>> [1] ->
> >>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
> >>>>>>>>> I find this very interesting. A huge patchset so will take a bit
> >>>>>>>>> to review, but I certainly plan to do that. Thanks!
> >>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
> >>>>>>> I've pushed the branch
> >>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
> >>>>>>>> work is
> >>>>>>>> duplicated with vDPA (e.g. the epf transport or vhost bus).
> >>>>>>> This is about connecting two different HW systems both running Linux and
> >>>>>>> doesn't necessarily involve virtualization.
> >>>>>> Right, this is something similar to VOP
> >>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
> >>>>>> hardware, I guess, and VOP uses a userspace application to implement the device.
> >>>>> I'd also like to point out that this series tries to have communication
> >>>>> between two SoCs in a vendor-agnostic way. Since this series solves two
> >>>>> use cases (PCIe RC<->EP and NTB), for the NTB case it directly plugs into
> >>>>> the NTB framework, and any of the HW in NTB below should be able to use
> >>>>> virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/ntb/hw/
> >>>>> amd epf idt intel mscc
> >>>>>
> >>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
> >>>>> function driver, and hence any SoC that supports a configurable PCIe endpoint can
> >>>>> use virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/pci/controller/dwc/*ep*
> >>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>>> Thanks for those backgrounds.
> >>>>
> >>>>
> >>>>>>> So there is no guest or host as in virtualization, but two entirely
> >>>>>>> different systems connected via a PCIe cable, one acting as guest and
> >>>>>>> one as host. So one system will provide virtio functionality, reserving
> >>>>>>> memory for virtqueues, and the other provides vhost functionality,
> >>>>>>> providing a way to access the virtqueues in virtio memory. One is the
> >>>>>>> source and the other is the sink, and there is no intermediate entity.
> >>>>>>> (vhost was probably an intermediate entity in virtualization?)
> >>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
> >>>>>> me since it was used for implementing the virtio backend for userspace drivers. I
> >>>>>> guess "vringh" could be better.
> >>>>> Initially I had named this vringh but later decided to choose vhost instead of
> >>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
> >>>>> now resides in an entirely different system. Whatever virtio is for a frontend
> >>>>> system, vhost can be that for a backend system. vring can be for accessing
> >>>>> a virtqueue and can be used either in the frontend or the backend.
I guess that clears up at least some of my questions from above...
> >>>> Ok.
> >>>>
> >>>>
> >>>>>>>> Have you considered to implement these through vDPA?
> >>>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
> >>>>>>> driver
> >>>>>>> or vhost net driver is not provided.
> >>>>>>>
> >>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
> >>>>>>> (usecase2 above),
> >>>>>> I see.
> >>>>>>
> >>>>>>
> >>>>>>> all the boards run Linux. The middle board provides NTB
> >>>>>>> functionality and board on either side provides virtio/vhost
> >>>>>>> functionality and
> >>>>>>> transfer data using rpmsg.
This setup looks really interesting (sometimes, it's really hard to
imagine this in the abstract.)
> >>>>>> So I wonder whether it's worthwhile to have a new bus. Can we use
> >>>>>> the existing virtio bus/drivers? It might work, as, except for
> >>>>>> the epf transport, we can introduce an epf "vhost" transport
> >>>>>> driver.
> >>>>> IMHO we'll need two buses, one for the frontend and the other for the
> >>>>> backend, because the two components can then co-operate/interact
> >>>>> with each other to provide a functionality. Though both will
> >>>>> seemingly provide similar callbacks, they provide
> >>>>> symmetrical or complementary functionality and need not be the same
> >>>>> or identical.
> >>>>>
> >>>>> Having the same bus can also create sequencing issues.
> >>>>>
> >>>>> If you look at virtio_dev_probe() of virtio_bus
> >>>>>
> >>>>> device_features = dev->config->get_features(dev);
> >>>>>
> >>>>> Now if we use the same bus for both front-end and back-end, both
> >>>>> will try to get_features when there has been no set_features.
> >>>>> Ideally the vhost device should be initialized first with the set
> >>>>> of features it supports. Vhost and virtio should use "status"
> >>>>> and "features" complementarily and not identically.
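[For reference, the feature handshake in virtio_dev_probe() goes roughly
like this (paraphrased from drivers/virtio/virtio.c; legacy and transport
feature handling is elided, so treat it as a sketch only):

        /* ask the transport what the device offers */
        device_features = dev->config->get_features(dev);

        /* collect the bits the matched driver advertises */
        driver_features = 0;
        for (i = 0; i < drv->feature_table_size; i++)
                driver_features |= (1ULL << drv->feature_table[i]);

        /* negotiate on the intersection and tell the transport */
        dev->features = driver_features & device_features;
        err = dev->config->finalize_features(dev);

So yes, anything sitting on the virtio bus is expected to be able to
answer get_features() at probe time.]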
> >>>> Yes, but there's no need for doing status/features passthrough
> >>>> in epf vhost drivers.
> >>>>
> >>>>
> >>>>> The virtio device (or frontend) cannot be initialized before the vhost
> >>>>> device (or backend) gets initialized with data such as
> >>>>> features. Similarly, vhost (backend)
> >>>>> cannot access virtqueues or buffers before virtio (frontend) sets
> >>>>> VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there
> >>>>> for virtio, as the physical memory for virtqueues is created by
> >>>>> virtio (frontend).
> >>>> epf vhost drivers need to implement two devices: vhost(vringh)
> >>>> device and virtio device (which is a mediated device). The
> >>>> vhost(vringh) device is doing feature negotiation with the
> >>>> virtio device via RC/EP or NTB. The virtio device is doing
> >>>> feature negotiation with local virtio drivers. If there's a
> >>>> feature mismatch, epf vhost drivers can do mediation between
> >>>> them.
> >>> Here epf vhost should be initialized with a set of features for
> >>> it to negotiate either as a vhost device or a virtio device, no? Where
> >>> should the initial feature set for epf vhost come from?
> >>
> >> I think it can work as:
> >>
> >> 1) Having an initial feature set X (hard coded in the code) in
> >>    epf vhost
> >> 2) Using this X for both the virtio device and the vhost(vringh)
> >>    device
> >> 3) local virtio driver will negotiate with the virtio device with
> >>    feature set Y
> >> 4) remote virtio driver will negotiate with the vringh device with
> >>    feature set Z
> >> 5) mediate between feature Y and feature Z since both Y and Z are
> >>    a subset of X
> >>
> >>
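If I follow steps 1) to 5) correctly, the mediation boils down to an
intersection of three sets. A minimal sketch, just to check my
understanding (all names below are made up):

        /* X: hard-coded superset the epf vhost driver supports */
        u64 supported = EPF_VHOST_FEATURES;             /* hypothetical */

        /* Y: what the local virtio driver accepted from the virtio device */
        u64 local = local_vdev->features;

        /* Z: what the remote virtio driver accepted via the vringh device */
        u64 remote = epf_vhost_remote_features(epf);    /* hypothetical */

        /* Y and Z are both subsets of X; mediate on the common bits */
        u64 negotiated = supported & local & remote;

Is that the idea?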
> > okay. I'm also thinking if we could have configfs for configuring
> > this. Anyway, we could find different approaches for configuring
> > this.
>
>
> Yes, and I think some management API is needed even in the design of
> your "Software Layering". In that figure, rpmsg vhost needs some
> pre-set or hard-coded features.
When I saw the Plumbers talk, my first idea was "this needs to be a new
transport". You have some hard-coded or pre-configured features, and
then features are negotiated via a transport-specific means in the
usual way. There's basically an extra/extended layer for this (and
status, and whatever).
Does that make any sense?
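To give an idea of what I mean: such a transport would mainly have to
provide its own virtio_config_ops and register the device, with feature
and status handling backed by whatever the EP/NTB hardware offers. A very
rough sketch (the callback names come from include/linux/virtio_config.h,
all epf_virtio_* functions are made up):

        static const struct virtio_config_ops epf_virtio_config_ops = {
                .get_features           = epf_virtio_get_features,
                .finalize_features      = epf_virtio_finalize_features,
                .get                    = epf_virtio_get,
                .set                    = epf_virtio_set,
                .get_status             = epf_virtio_get_status,
                .set_status             = epf_virtio_set_status,
                .reset                  = epf_virtio_reset,
                .find_vqs               = epf_virtio_find_vqs,
                .del_vqs                = epf_virtio_del_vqs,
        };

        /* ...point vdev->config at epf_virtio_config_ops and call
         * register_virtio_device(vdev) once the backend side has been
         * set up with its features.
         */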
>
>
> >>>>>> It will have virtqueues, but they are only used for the communication
> >>>>>> between itself and the
> >>>>>> upper virtio driver. And it will have vringh queues which
> >>>>>> will be probed by virtio epf transport drivers. And it needs to
> >>>>>> do data copy between virtqueue and
> >>>>>> vringh queues.
> >>>>>>
> >>>>>> It works like:
> >>>>>>
> >>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
> >>>>>> vringh queue/epf>
> >>>>>>
> >>>>>> The advantage is that there's no need for writing new buses
> >>>>>> and drivers.
> >>>>> I think this will work; however, there is an additional copy
> >>>>> between vringh queue and virtqueue,
> >>>> I think not? E.g. in use case 1), if we stick to the virtio bus, we
> >>>> will have:
> >>>>
> >>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
> >>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
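To make sure I understand the data copy step: the epf vhost driver would
basically pull a buffer out of one ring and push it into the other, e.g.
via the in-kernel vringh API. A sketch only (the vringh_* calls are from
include/linux/vringh.h; local_vrh/remote_vrh are made-up names, most
error handling is elided, and for the PCIe case the plain memcpy would
presumably be replaced by DMA):

        /* shuffle one buffer from virtio ring (1) to virtio ring (2);
         * both rings are accessed from the device side via vringh
         */
        static void epf_vhost_copy_one(struct vringh *local_vrh,
                                       struct vringh *remote_vrh)
        {
                struct vringh_kiov riov, wiov;
                u16 rhead, whead;
                char buf[256];          /* bounce buffer, arbitrary size */
                ssize_t len;

                vringh_kiov_init(&riov, NULL, 0);
                vringh_kiov_init(&wiov, NULL, 0);

                /* take a filled descriptor off virtio ring (1), local side */
                if (vringh_getdesc_kern(local_vrh, &riov, NULL, &rhead,
                                        GFP_KERNEL) != 1)
                        goto out;
                len = vringh_iov_pull_kern(&riov, buf, sizeof(buf));
                if (len < 0)
                        goto out;

                /* place it into an empty descriptor of virtio ring (2) */
                if (vringh_getdesc_kern(remote_vrh, NULL, &wiov, &whead,
                                        GFP_KERNEL) != 1)
                        goto out;
                vringh_iov_push_kern(&wiov, buf, len);

                /* retire the descriptors on both sides */
                vringh_complete_kern(local_vrh, rhead, 0);
                vringh_complete_kern(remote_vrh, whead, len);
        out:
                vringh_kiov_cleanup(&riov);
                vringh_kiov_cleanup(&wiov);
        }

Or did I get the direction of the copies wrong?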
> >>> IIUC epf vhost driver (EP) will access virtio ring(2) using
> >>> vringh?
> >>
> >> Yes.
> >>
> >>
> >>> And virtio
> >>> ring(2) is created by virtio pci (RC).
> >>
> >> Yes.
> >>
> >>
> >>>> What the epf vhost driver does is read the buffer len and addr
> >>>> from virtio ring(1) and then DMA to Linux (RC)?
> >>> okay, I made some optimization here where vhost-rpmsg using a
> >>> helper writes a buffer from rpmsg's upper layer directly to
> >>> remote Linux (RC) as against here where it has to be first written
> >>> to virtio ring (1).
> >>>
> >>> Thinking about how this would look for NTB:
> >>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
> >>> NTB(HOST2) <- virtio ring(2) -> virtio-rpmsg (HOST2)
> >>>
> >>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> >>
> >> Yes, and I think it needs to use vring to access virtio ring (1) as
> >> well.
> > NTB(HOST1) and virtio ring(1) will be in the same system. So it
> > doesn't have to use vring. virtio ring(1) is created by the virtio device
> > the NTB(HOST1) creates.
>
>
> Right.
>
>
> >>
> >>> Do you also think this will work seamlessly with virtio_net.c,
> >>> virtio_blk.c?
> >>
> >> Yes.
> > okay, I haven't looked at this but the backend of virtio_blk should
> > access an actual storage device, no?
>
>
> Good point, for a non-peer device like storage. There's probably no
> need for it to be registered on the virtio bus, and it might be better
> to behave as you proposed.
I might be missing something; but if you expose something as a block
device, it should have something it can access with block reads/writes,
shouldn't it? Of course, that can be a variety of things.
>
> Just to make sure I understand the design, how is VHOST SCSI expected
> to work in your proposal? Does it have a device or file as a backend?
>
>
> >>
> >>> I'd like to get clarity on two things in the approach you
> >>> suggested, one is features (since epf vhost should ideally be
> >>> transparent to any virtio driver)
> >>
> >> We can have an array of pre-defined features indexed by
> >> virtio device id in the code.
> >>
> >>
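A sketch of what such an array could look like, indexed by the ids from
include/uapi/linux/virtio_ids.h (the feature bits per device are just
placeholders I picked for illustration, not a suggestion):

        static const u64 epf_vhost_features[] = {
                [VIRTIO_ID_NET]         = BIT_ULL(VIRTIO_F_VERSION_1) |
                                          BIT_ULL(VIRTIO_NET_F_MAC),
                [VIRTIO_ID_RPMSG]       = BIT_ULL(VIRTIO_F_VERSION_1),
        };

That would at least keep the policy in one place.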
> >>> and the other is how certain inputs to the virtio device, such as the
> >>> number of buffers, should be determined.
> >>
> >> We can start from a hard-coded value like 256, or introduce some
> >> API for the user to change the value.
> >>
> >>
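The simplest form of such an API would probably be a module parameter
(a sketch, assuming a hypothetical epf vhost module; configfs, as Kishon
mentioned, would be the more flexible option):

        /* default number of buffers per virtqueue, overridable at load time */
        static unsigned int num_bufs = 256;
        module_param(num_bufs, uint, 0444);
        MODULE_PARM_DESC(num_bufs, "Number of buffers per virtqueue");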
> >>> Thanks again for your suggestions!
> >>
> >> You're welcome.
> >>
> >> Note that I just want to check whether or not we can reuse the
> >> virtio bus/driver. It's something similar to what you proposed in
> >> Software Layering but we just replace "vhost core" with "virtio
> >> bus" and move the vhost core below epf/ntb/platform transport.
> > Got it. My initial design was based on my understanding of your
> > comments [1].
>
>
> Yes, but that's just for a networking device. If we want something
> more generic, it may require more thought (bus etc).
I believe that we indeed need something bus-like to be able to support
a variety of devices.
>
>
> >
> > I'll try to create something based on your proposed design here.
>
>
> Sure, but for coding, we'd better wait for others' opinions here.
Please tell me if my thoughts above make any sense... I have just
started looking at that, so I might be completely off.