Message-ID: <0ccf9cc6-4916-7815-9ce2-990dc7884849@linux.ibm.com>
Date: Wed, 3 Aug 2022 16:27:54 -0400
From: Matthew Rosato <mjrosato@...ux.ibm.com>
To: Tony Lu <tonylu@...ux.alibaba.com>, kgraul@...ux.ibm.com,
wenjia@...ux.ibm.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com
Cc: netdev@...r.kernel.org, linux-s390@...r.kernel.org,
zmlcc@...ux.alibaba.com, hans@...ux.alibaba.com,
zhiyuan2048@...ux.alibaba.com, herongguang@...ux.alibaba.com
Subject: Re: [RFC net-next 1/1] net/smc: SMC for inter-VM communication
On 7/20/22 1:00 PM, Tony Lu wrote:
> Hi all,
>
> # Background
>
> We (Alibaba Cloud) already use SMC in our cloud environment to
> transparently accelerate TCP applications with ERDMA [1]. A common
> scenario today is deploying containers (whose runtime is based on
> lightweight virtual machines) on ECS (Elastic Compute Service), where
> the containers may want to be scheduled on the same host to get higher
> network performance, for example AI, big data, or other workloads that
> are sensitive to bandwidth and latency. Currently, inter-VM
> performance is poor and CPU resources are wasted (see #Benchmark
> virtio). This scenario has been discussed many times, but a general
> solution for applications is still missing [2] [3] [4].
>
> # Design
>
> In the inter-VM scenario, we use ivshmem (Inter-VM shared memory
> device), which is modeled by QEMU [5]. With it, multiple VMs can access
> one shared memory region. This shared memory device is statically
> created by the host and shared with the desired guests. The shared
> memory is exposed through a PCI BAR, and the device can interrupt its
> peers (ivshmem-doorbell).
>
> To use ivshmem in SMC, we wrote a draft device driver, named ivpci, as
> a bridge between SMC and the ivshmem PCI device (see patch #1). To keep
> things simple, this driver acts like an SMC-D device so that it fits
> into SMC without modifying the SMC code.
>
> ┌───────────────────────────────────────┐
> │ ┌───────────────┐   ┌───────────────┐ │
> │ │      VM1      │   │      VM2      │ │
> │ │┌─────────────┐│   │┌─────────────┐│ │
> │ ││ Application ││   ││ Application ││ │
> │ │├─────────────┤│   │├─────────────┤│ │
> │ ││     SMC     ││   ││     SMC     ││ │
> │ │├─────────────┤│   │├─────────────┤│ │
> │ ││    ivpci    ││   ││    ivpci    ││ │
> │ └└─────────────┘┘   └└─────────────┘┘ │
> │        x  *               x  *        │
> │        x  ****************x* *        │
> │        x  xxxxxxxxxxxxxxxxx* *        │
> │        x  x                * *        │
> │ ┌───────────────┐   ┌───────────────┐ │
> │ │shared memories│   │ivshmem-server │ │
> │ └───────────────┘   └───────────────┘ │
> │                HOST A                 │
> └───────────────────────────────────────┘
> *********** Control flow (interrupt)
> xxxxxxxxxxx Data flow (memory access)
>
> The ivpci driver implements almost all of the operations of an SMC-D
> device. It can be divided into two parts:
>
> - control flow: most of it is the same as in SMC-D; ivshmem is used to
> trigger interrupts in ivpci and to process the CDC flow.
>
> - data flow: the shared memory of each connection is one large region
> divided into two parts, one for the local RMB and one for the remote
> RMB. Every write syscall copies data to sndbuf and calls ISM's
> move_data() to move the data to the remote RMB in ivshmem and
> interrupt the remote side. The reader then receives the interrupt,
> checks the CDC message, and consumes data if the cursor has been
> updated.
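
The data flow described above can be modeled in user space as a rough sketch. This is not the ivpci code: `SharedRegion`, `Peer`, `recv()` and the `pending_irq` flag are hypothetical names, the doorbell interrupt is reduced to a boolean, the CDC message is reduced to a pair of byte counters, and the sndbuf copy is skipped. Only the split of one region into two RMBs and the move_data() + interrupt + cursor check sequence are taken from the description.

```python
class SharedRegion:
    """Hypothetical stand-in for one connection's ivshmem region."""

    def __init__(self, rmb_size):
        self.rmb_size = rmb_size
        # One large region, split into two halves: one RMB per peer.
        self.mem = bytearray(2 * rmb_size)
        # Per-RMB cursors, kept as running byte counts (assumed layout).
        self.prod = [0, 0]  # bytes written into RMB i so far
        self.cons = [0, 0]  # bytes consumed from RMB i so far


class Peer:
    """One side of the connection; index selects which half is its RMB."""

    def __init__(self, region, index):
        self.region = region
        self.index = index
        self.peer = None          # set after both peers exist
        self.pending_irq = False  # models the ivshmem doorbell

    def move_data(self, payload):
        """Copy payload into the remote RMB, then 'interrupt' the remote."""
        r = self.region
        remote = 1 - self.index
        free = r.rmb_size - (r.prod[remote] - r.cons[remote])
        n = min(len(payload), free)
        base = remote * r.rmb_size
        for i in range(n):
            # The RMB is used as a ring: offsets wrap at rmb_size.
            r.mem[base + (r.prod[remote] + i) % r.rmb_size] = payload[i]
        r.prod[remote] += n
        if n:
            self.peer.pending_irq = True
        return n

    def recv(self):
        """On interrupt, check the cursor and consume whatever is new."""
        if not self.pending_irq:
            return b""
        self.pending_irq = False
        r = self.region
        me = self.index
        n = r.prod[me] - r.cons[me]
        base = me * r.rmb_size
        data = bytes(r.mem[base + (r.cons[me] + i) % r.rmb_size]
                     for i in range(n))
        r.cons[me] += n
        return data


if __name__ == "__main__":
    region = SharedRegion(rmb_size=8)
    vm1, vm2 = Peer(region, 0), Peer(region, 1)
    vm1.peer, vm2.peer = vm2, vm1
    vm1.move_data(b"hello")
    print(vm2.recv())
```

In this model a writer that finds the remote RMB full simply gets a short write back; how the real driver handles that case is not covered by the description above.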
>
> # Benchmark
>
> The current POC of ivpci is unstable and only works for a single SMC
> connection. Here is some brief data:
>
> Items           Latency (pingpong)   Throughput (64KB)
> TCP (virtio)    19.3 us               3794.185 MBps
> TCP (SR-IOV)    13.2 us               3948.792 MBps
> SMC (ivshmem)    6.3 us              11900.269 MBps
>
> Test environments:
>
> - CPU Intel Xeon Platinum 8 core, mem 32 GiB
> - NIC Mellanox CX4 with 2 VFs in two different guests
> - using virsh to setup virtio-net + vhost
> - using sockperf and single connection
> - SMC + ivshmem throughput uses one-copy (userspace -> kernel copy)
> with intrusive modification of SMC (see patch #1); latency (pingpong)
> uses two-copy (user -> kernel and move_data() copy, patch version).
>
> In this comparison, SMC with ivshmem gets roughly 3x the bandwidth and
> one-third to one-half of the latency.
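
For reference, the ratios behind that summary can be recomputed directly from the table. The snippet below is only a convenience sketch; the `RESULTS` dictionary and the `compare()` helper are names made up here, and the numbers are copied from the benchmark table as posted. With these figures the throughput gain comes out at roughly 3.0x to 3.1x, and the SMC latency at roughly a third to a half of the TCP latency.

```python
# Figures copied from the benchmark table:
# name -> (pingpong latency in us, 64KB throughput in MBps)
RESULTS = {
    "TCP (virtio)": (19.3, 3794.185),
    "TCP (SR-IOV)": (13.2, 3948.792),
    "SMC (ivshmem)": (6.3, 11900.269),
}


def compare(baseline, candidate="SMC (ivshmem)"):
    """Return (throughput speedup, latency ratio) of candidate vs. baseline."""
    base_lat, base_tput = RESULTS[baseline]
    cand_lat, cand_tput = RESULTS[candidate]
    return cand_tput / base_tput, cand_lat / base_lat


if __name__ == "__main__":
    for baseline in ("TCP (virtio)", "TCP (SR-IOV)"):
        speedup, lat_ratio = compare(baseline)
        print(f"vs {baseline}: {speedup:.2f}x throughput, "
              f"{lat_ratio:.0%} of the latency")
```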
>
> TCP + virtio is the most commonly used solution for guests, yet it
> delivers lower performance. Moreover, it consumes an extra host thread
> that fully occupies a CPU core to transfer data, wasting more CPU
> resources. If the host is very busy, performance gets worse.
>
Hi Tony,

Quite interesting! FWIW, for s390x we are also looking at passthrough of
host ISM devices to enable SMC-D in QEMU guests:

https://lore.kernel.org/kvm/20220606203325.110625-1-mjrosato@linux.ibm.com/
https://lore.kernel.org/kvm/20220606203614.110928-1-mjrosato@linux.ibm.com/

But it seems to me an 'emulated ISM' of sorts could still be interesting
even on s390x, e.g. for scenarios where host device passthrough is not
possible/desired.

Out of curiosity I tried this ivpci module on s390x, but the device won't
probe -- this is possibly an issue with the s390x PCI emulation layer in
QEMU; I'll have to look into that.

> # Discussion
>
> This RFC and solution are still at an early stage, so we want to bring
> them up as early as possible and discuss them fully with IBM and the
> community. We have some topics to put on the table:
>
> 1. SMC officially supports this scenario.
>
> SMC + ivshmem shows a huge improvement for inter-VM communication.
> SMC-D with a mocked ISM device might not be the official solution;
> maybe it should be another extension of SMC besides SMC-R and SMC-D. So
> we are wondering whether SMC would accept this idea to address this
> scenario. Are there any other possibilities?

I am curious about ivshmem and its current state though -- e.g. looking
around I see mention of v2, which you also referenced, but don't see any
activity on it for a few years? And as for v1 ivshmem -- server "not
for production use", etc.

Thanks,
Matt

>
> 2. Implementation of SMC for inter-VM.
>
> SMC is used in container and cloud environments; maybe we can propose
> a new device and a new protocol, if possible, for these new scenarios
> to solve this problem.
>
> 3. Standardize this new protocol and device.
>
> SMC-R has an open RFC 7609, so can this new device or protocol, like
> SMC-D, be standardized? One possible option is to propose a new device
> model in the QEMU + virtio ecosystem, with SMC supporting this standard
> virtio device, like [6].
>
> If there are any problems, please point them out.
>
> Hope to hear from you, thank you.
>
> [1] https://lwn.net/Articles/879373/
> [2] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
> [3] https://dl.acm.org/doi/10.1145/2847562
> [4] https://hal.archives-ouvertes.fr/hal-00368622/document
> [5] https://github.com/qemu/qemu/blob/master/docs/specs/ivshmem-spec.txt
> [6] https://github.com/siemens/jailhouse/blob/master/Documentation/ivshmem-v2-specification.md
>
> Signed-off-by: Tony Lu <tonylu@...ux.alibaba.com>