Message-ID: <hloqtbnyooawma2fhfvtblgabiebonskfkoky2invqasikhg42@gwvmreq2ysy6>
Date: Thu, 23 May 2024 12:46:44 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Xuewei Niu <niuxuewei97@...il.com>
Cc: stefanha@...hat.com, mst@...hat.com, davem@...emloft.net, 
	kvm@...r.kernel.org, virtualization@...ts.linux.dev, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Xuewei Niu <niuxuewei.nxw@...group.com>
Subject: Re: [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices

Hi,
thanks for this RFC!

On Fri, May 17, 2024 at 10:46:02PM GMT, Xuewei Niu wrote:
># Motivation
>
>Vsock is a lightweight and widely used data exchange mechanism between host
>and guest. Kata Containers, a secure container runtime, leverages the
>capability to exchange control data between the shim and the kata-agent.
>
>The Linux kernel only supports one vsock device for virtio-vsock transport,
>resulting in the following limitations:
>
>* Poor performance isolation: All vsock connections share the same
>virtqueue.

This might be fixed if we implement multi-queue in virtio-vsock.

>* Cannot enable more than one backend: Virtio-vsock, vhost-vsock, and
>vhost-user-vsock cannot be enabled simultaneously on the transport.
>
>We’d like to transfer networking data, such as TSI (Transparent Socket
>Impersonation), over vsock via the vhost-user protocol to reduce overhead.
>However, by default, the vsock device is occupied by the kata-agent.
>
># Usages
>
>Principle: **Supporting virtio-vsock multi-devices while also being
>compatible with existing ones.**
>
>## Connection from Guest to Host
>
>There are two important questions to address:
>
>1. How to be compatible with the existing usages?
>2. How do we specify a virtio-vsock device?
>
>### Question 1
>
>Before we delve into question 1, I'd like to provide a piece of pseudocode
>as an example of one of the existing use cases from the guest's
>perspective.
>
>Assume there is one virtio-vsock device with CID 4. One existing way to
>connect to the host is shown below.
>
>```
>fd = socket(AF_VSOCK);
>connect(fd, 2, 1234);
>n = write(fd, buffer);
>```
>
>The result is that a connection is established from the guest (4, ?) to the
>host (2, 1234), where "?" denotes a random port.
>
>In the context of multi-devices, there are two or more devices. If the
>user doesn’t specify a CID explicitly, the kernel cannot tell which
>device to use. The new implementation should be compatible with the
>old one.
>
>We expanded the virtio-vsock specification to address this issue. The
>specification now includes a new field called "order".
>
>```
>struct virtio_vsock_config {
>  __le64 guest_cid;
>  __le64 order;
>} __attribute__((packed));
>```
>
>During virtio-vsock driver probing, the guest kernel reads the order of
>each device from the VMM. **We stipulate that the device with the
>smallest order is regarded as the default device** (this mechanism
>functions like a 'default gateway' in networking).
>
>Assuming there are three virtio-vsock devices: device1 (CID=3), device2
>(CID=4), and device3 (CID=5). The arrangement of the list is as follows
>from the perspective of the guest kernel:
>
>```
>virtio_vsock_list =
>virtio_vsock { cid: 4, order: 0 } -> virtio_vsock { cid: 3, order: 1 } -> virtio_vsock { cid: 5, order: 10 }
>```
>
>At this point, the guest kernel knows that device2 (CID=4) is the
>default device. Execute the same code as before.
>
>```
>fd = socket(AF_VSOCK);
>connect(fd, 2, 1234);
>n = write(fd, buffer);
>```
>
>A connection will be established from the guest (4, ?) to the host (2, 1234).

It seems that only the one with order 0 is used here though, so what is 
the ordering for?
Wouldn't it suffice to simply indicate the default device (e.g., like 
the default gateway for networking)?
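
To make the question concrete, here is a minimal sketch (plain userspace C, not kernel code) of picking the default device as the one with the smallest order. The struct and function names are made up for illustration; only the cid/order values come from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest-side view of one virtio-vsock device. */
struct vsock_dev {
    uint64_t cid;
    uint64_t order;
};

/* Return the device with the smallest order, or NULL for an empty list. */
const struct vsock_dev *pick_default(const struct vsock_dev *devs, size_t n)
{
    const struct vsock_dev *best = NULL;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || devs[i].order < best->order)
            best = &devs[i];
    }
    return best;
}
```

If only the minimum is ever consulted, as in this sketch, a single "default" flag per device would carry the same information as the full ordering.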

>
>### Question 2
>
>Now, the user wants to specify a device instead of the default one. An
>explicit bind operation is required.
>
>Use the device (CID=3). Here “-1” represents any port, so the kernel will

We have a macro: VMADDR_PORT_ANY (which is -1)

>search for an available port automatically.
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 3, -1);
>connect(fd, 2, 1234);
>n = write(fd, buffer);
>```
>
>Use the device (CID=4).
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 4, -1);
>connect(fd, 2, 1234);
>n = write(fd, buffer);
>```
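
For reference, the pseudocode above could look roughly like this with the real socket API. This is only a sketch: the helper names are hypothetical, and it assumes that binding to a local CID selects the device, which is what this RFC proposes and not what current kernels do. AF_VSOCK, struct sockaddr_vm, VMADDR_PORT_ANY and VMADDR_CID_HOST are the existing definitions from <linux/vm_sockets.h>.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill a sockaddr_vm for the given CID and port. */
void vsock_addr(struct sockaddr_vm *addr, unsigned int cid, unsigned int port)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof(*addr));
    addr->svm_family = AF_VSOCK;
    addr->svm_cid = cid;
    addr->svm_port = port;
}

/*
 * Bind to local_cid with any port, then connect to the host.
 * Returns a connected fd, or -1 on error.
 * Assumption: bind() on a CID picks the device, as the RFC proposes.
 */
int vsock_connect_from(unsigned int local_cid, unsigned int host_port)
{
    struct sockaddr_vm local, remote;
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    vsock_addr(&local, local_cid, VMADDR_PORT_ANY);
    vsock_addr(&remote, VMADDR_CID_HOST, host_port);

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this helper, the two cases above would be vsock_connect_from(3, 1234) and vsock_connect_from(4, 1234).
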
>
>## Connection from Host to Guest
>
>Connection from host to guest is quite similar to the existing usages. The
>device’s CID is specified by the bind operation.
>
>Listen on port 10000 of the device (CID=3).
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 3, 10000);
>listen(fd);
>new_fd = accept(fd, &host_cid, &host_port);
>n = write(new_fd, buffer);
>```
>
>Listen on port 10000 of the device (CID=4).
>
>```
>fd = socket(AF_VSOCK);
>bind(fd, 4, 10000);
>listen(fd);
>new_fd = accept(fd, &host_cid, &host_port);
>n = write(new_fd, buffer);
>```
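
A rough C equivalent of the listening side, again only as a sketch with hypothetical helper names. It assumes the RFC semantics where the CID passed to bind() identifies the device. Data is sent on the socket returned by accept(), not on the listening socket.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Create a listening vsock socket on (cid, port); returns fd or -1. */
int vsock_listen_on(unsigned int cid, unsigned int port)
{
    struct sockaddr_vm addr;
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.svm_family = AF_VSOCK;
    addr.svm_cid = cid;   /* assumption: selects the device, per the RFC */
    addr.svm_port = port;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Accept one peer, send buf on the accepted socket, then close it.
 * The peer's CID and port are stored in *peer.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t vsock_serve_once(int listen_fd, const void *buf, size_t len,
                         struct sockaddr_vm *peer)
{
    socklen_t peer_len = sizeof(*peer);
    ssize_t n;
    int conn;

    assert(peer != NULL);
    conn = accept(listen_fd, (struct sockaddr *)peer, &peer_len);
    if (conn < 0)
        return -1;

    n = write(conn, buf, len);
    close(conn);
    return n;
}
```
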
>
># Use Cases
>
>We've completed a POC with Kata Containers, Ztunnel, which is a
>purpose-built per-node proxy for Istio ambient mesh, and TSI. Please refer
>to the following link for more details.
>
>Link: https://bit.ly/4bdPJbU

Thank you for this RFC. I left several comments in the patches; we still
have some work to do, but I think it is something we can support :-)

Here I summarize the things that I think we need to fix:
1. Avoid adding transport-specific things in af_vsock.c
    We need to have a generic API to allow other transports to implement
    the same functionality.
2. We need to add negotiation of a new feature in virtio/vhost transports
    We need to enable or disable support depending on whether the
    feature is negotiated, since guest and host may not support it.
3. Re-work the patch order for bisectability (more detail on patches 3/4)
4. Do we really need the order or just a default device?
5. Check if we can add some tests in tools/testing/vsock
6. When we agree on the RFC, we should discuss the spec changes in the
    virtio ML before sending a non-RFC series on Linux
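
To illustrate point 2, a minimal sketch of how the feature check could look. The feature bit name and number are placeholders (the real ones would be defined in the virtio spec), and the helpers are hypothetical, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder feature bit, NOT defined by the virtio spec. */
#define VSOCK_F_MULTI_DEV_SKETCH 3

/* Negotiated features are the intersection of what both sides offer. */
uint64_t negotiate(uint64_t device_features, uint64_t driver_features)
{
    return device_features & driver_features;
}

/* Multi-device support is used only if the bit was negotiated. */
bool multi_dev_enabled(uint64_t negotiated)
{
    assert(VSOCK_F_MULTI_DEV_SKETCH < 64);
    return (negotiated >> VSOCK_F_MULTI_DEV_SKETCH) & 1;
}
```

If either side does not offer the bit, the driver would fall back to the current single-device behaviour.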

These are the main things, but I left other comments in the patches.

Thanks,
Stefano

