Message-Id: <20240517144607.2595798-1-niuxuewei.nxw@antgroup.com>
Date: Fri, 17 May 2024 22:46:02 +0800
From: Xuewei Niu <niuxuewei97@...il.com>
To: stefanha@...hat.com,
	sgarzare@...hat.com
Cc: mst@...hat.com,
	davem@...emloft.net,
	kvm@...r.kernel.org,
	virtualization@...ts.linux.dev,
	netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Xuewei Niu <niuxuewei.nxw@...group.com>
Subject: [RFC PATCH 0/5] vsock/virtio: Add support for multi-devices

# Motivation

Vsock is a lightweight and widely used data exchange mechanism between host
and guest. Kata Containers, a secure container runtime, uses it to exchange
control data between the shim and the kata-agent.

The Linux kernel supports only one vsock device for the virtio-vsock
transport, resulting in the following limitations:

* Poor performance isolation: All vsock connections share the same
virtqueue.
* Cannot enable more than one backend: Virtio-vsock, vhost-vsock, and
vhost-user-vsock cannot be enabled simultaneously on the transport.

We’d like to carry networking traffic, such as that of TSI (Transparent
Socket Impersonation), over vsock via the vhost-user protocol to reduce
overhead. However, by default, the only vsock device is already occupied by
the kata-agent.

# Usage

Principle: **Support multiple virtio-vsock devices while remaining
compatible with existing usage.**

## Connection from Guest to Host

There are two questions worth discussing:

1. How do we stay compatible with existing usage?
2. How do we specify a virtio-vsock device?

### Question 1

Before we delve into question 1, here is a piece of pseudocode showing one
existing use case from the guest's perspective.

Assume there is one virtio-vsock device with CID 4. One existing way to
connect to the host is as follows.

```
fd = socket(AF_VSOCK);
connect(fd, 2, 1234);
n = write(fd, buffer);
```

The result is that a connection is established from the guest (4, ?) to the
host (2, 1234), where "?" denotes a random port.
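
For reference, the pseudocode above might look as follows in C with the
existing AF_VSOCK API. This is only a sketch: the helper names
`vsock_connect_host` and `vsock_local_addr` are hypothetical, and
SOCK_STREAM is assumed as the socket type. `getsockname()` is used to read
back the local (CID, port) pair that the kernel picked, i.e. the "(4, ?)"
above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/*
 * Connect to `port` on the host (CID 2) without binding first, so the
 * kernel chooses the local CID and port. Returns a connected fd or -1.
 * Hypothetical helper, for illustration only.
 */
int vsock_connect_host(uint32_t port)
{
    struct sockaddr_vm host;
    int fd;

    assert(port != VMADDR_PORT_ANY); /* the remote port must be concrete */

    memset(&host, 0, sizeof(host));
    host.svm_family = AF_VSOCK;
    host.svm_cid = VMADDR_CID_HOST;
    host.svm_port = port;

    fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&host, sizeof(host)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read back the local (CID, port) assigned by the kernel. Returns 0 or -1. */
int vsock_local_addr(int fd, struct sockaddr_vm *local)
{
    socklen_t len = sizeof(*local);

    memset(local, 0, sizeof(*local));
    return getsockname(fd, (struct sockaddr *)local, &len);
}
```

With a single device of CID 4, calling `vsock_connect_host(1234)` and then
`vsock_local_addr()` on the returned fd would be expected to report
`svm_cid == 4` and a kernel-chosen `svm_port`.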

With multi-device support, a guest may have more than one device. If the
user doesn’t specify a CID explicitly, the kernel cannot tell which device
to use. The new implementation must nevertheless stay compatible with the
old one.

To address this, we extend the virtio-vsock specification with a new
config field called "order".

```
struct virtio_vsock_config {
  __le64 guest_cid;
  __le64 order;
} __attribute__((packed));
```

While probing each virtio-vsock device, the guest kernel reads the device's
order from the config space provided by the VMM. **We stipulate that the
device with the smallest order is regarded as the default device** (this
mechanism is analogous to a 'default gateway' in networking).
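
As an illustration of the layout only, the sketch below decodes the two
little-endian 64-bit fields from a raw 16-byte config buffer in userspace.
The names `vsock_config`, `read_le64`, and `vsock_config_parse` are
hypothetical and are not taken from the patches; the real driver would read
these fields through the virtio config accessors instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian view of the proposed config space (hypothetical type). */
struct vsock_config {
    uint64_t guest_cid;
    uint64_t order;
};

/* Decode one little-endian 64-bit value from a byte buffer. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Packed layout: guest_cid at offset 0, order at offset 8. */
struct vsock_config vsock_config_parse(const uint8_t raw[16])
{
    struct vsock_config cfg;

    assert(raw != NULL);
    cfg.guest_cid = read_le64(raw);
    cfg.order = read_le64(raw + 8);
    return cfg;
}
```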

Assume there are three virtio-vsock devices: device1 (CID=3), device2
(CID=4), and device3 (CID=5). From the guest kernel's perspective, the list
is sorted by ascending order:

```
virtio_vsock_list =
virtio_vsock { cid: 4, order: 0 } -> virtio_vsock { cid: 3, order: 1 } -> virtio_vsock { cid: 5, order: 10 }
```

At this point, the guest kernel knows that device2 (CID=4) is the default
device. Now execute the same code as before.

```
fd = socket(AF_VSOCK);
connect(fd, 2, 1234);
n = write(fd, buffer);
```

A connection will be established from the guest (4, ?) to the host (2, 1234).
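
The default-device rule can be sketched in plain userspace C as a list kept
sorted by ascending order, whose head is the default device. All names here
(`vsock_dev`, `vsock_list_insert`, `vsock_default_dev`, `vsock_find_dev`)
are hypothetical and do not mirror the data structures in the patches.
Keeping insertion order for devices with equal order values is also an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a probed virtio-vsock device (hypothetical). */
struct vsock_dev {
    uint64_t cid;
    uint64_t order;
    struct vsock_dev *next;
};

/* Insert dev so that the list stays sorted by ascending order. */
void vsock_list_insert(struct vsock_dev **head, struct vsock_dev *dev)
{
    struct vsock_dev **pos = head;

    assert(head != NULL && dev != NULL);
    /* Skip entries whose order is not larger; ties keep insertion order. */
    while (*pos != NULL && (*pos)->order <= dev->order)
        pos = &(*pos)->next;
    dev->next = *pos;
    *pos = dev;
}

/* The head of the sorted list is the default device (NULL if empty). */
const struct vsock_dev *vsock_default_dev(const struct vsock_dev *head)
{
    return head;
}

/* Look up a device by CID, as an explicit bind to that CID would need. */
const struct vsock_dev *vsock_find_dev(const struct vsock_dev *head,
                                       uint64_t cid)
{
    for (; head != NULL; head = head->next)
        if (head->cid == cid)
            return head;
    return NULL;
}
```

Inserting the three devices from the example in any sequence yields the
list CID 4, CID 3, CID 5, so an unbound connect() would use CID 4.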

### Question 2

Now suppose the user wants to use a specific device instead of the default
one. This requires an explicit bind operation.

Use the device with CID=3. Here “-1” means any port: the kernel picks an
available port automatically.

```
fd = socket(AF_VSOCK);
bind(fd, 3, -1);
connect(fd, 2, 1234);
n = write(fd, buffer);
```

Use the device with CID=4.

```
fd = socket(AF_VSOCK);
bind(fd, 4, -1);
connect(fd, 2, 1234);
n = write(fd, buffer);
```
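
In real C, the bind-then-connect sequence might look like the sketch below.
The helper names `vsock_fill_addr` and `vsock_connect_from` are
hypothetical. `VMADDR_PORT_ANY` is the existing constant that corresponds
to the “-1” port above. That binding to a given local CID selects the
matching device is the behavior proposed by this series, not something
the sketch itself enforces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill an AF_VSOCK address with the given CID and port. */
void vsock_fill_addr(struct sockaddr_vm *addr, uint32_t cid, uint32_t port)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof(*addr));
    addr->svm_family = AF_VSOCK;
    addr->svm_cid = cid;
    addr->svm_port = port;
}

/*
 * Bind to local_cid with any local port, then connect to the remote
 * (CID, port). Returns a connected fd or -1 on error.
 */
int vsock_connect_from(uint32_t local_cid, uint32_t remote_cid,
                       uint32_t remote_port)
{
    struct sockaddr_vm local, remote;
    int fd;

    vsock_fill_addr(&local, local_cid, VMADDR_PORT_ANY);
    vsock_fill_addr(&remote, remote_cid, remote_port);

    fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For example, `vsock_connect_from(3, VMADDR_CID_HOST, 1234)` corresponds to
the first pseudocode block in this section.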

## Connection from Host to Guest

Connections from host to guest work much as they do today. The guest
selects the device by passing its CID to the bind operation.

Listen on port 10000 of the device with CID=3.

```
fd = socket(AF_VSOCK);
bind(fd, 3, 10000);
listen(fd);
new_fd = accept(fd, &host_cid, &host_port);
n = write(new_fd, buffer);
```

Listen on port 10000 of the device with CID=4.

```
fd = socket(AF_VSOCK);
bind(fd, 4, 10000);
listen(fd);
new_fd = accept(fd, &host_cid, &host_port);
n = write(new_fd, buffer);
```
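
A C sketch of the listening side, under the same assumptions as the
pseudocode: `vsock_listen_on` and `vsock_accept_peer` are hypothetical
helper names and SOCK_STREAM is assumed. Data is written to the fd returned
by accept(), not to the listening fd.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Bind to (cid, port) and start listening. Returns the fd or -1. */
int vsock_listen_on(uint32_t cid, uint32_t port, int backlog)
{
    struct sockaddr_vm local;
    int fd;

    memset(&local, 0, sizeof(local));
    local.svm_family = AF_VSOCK;
    local.svm_cid = cid;
    local.svm_port = port;

    fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Accept one connection and report the peer's CID and port.
 * Returns the connected fd or -1; outputs are untouched on error.
 */
int vsock_accept_peer(int listen_fd, uint32_t *peer_cid, uint32_t *peer_port)
{
    struct sockaddr_vm peer;
    socklen_t len = sizeof(peer);
    int conn_fd;

    assert(peer_cid != NULL && peer_port != NULL);

    memset(&peer, 0, sizeof(peer));
    conn_fd = accept(listen_fd, (struct sockaddr *)&peer, &len);
    if (conn_fd < 0)
        return -1;
    *peer_cid = peer.svm_cid;
    *peer_port = peer.svm_port;
    return conn_fd;
}
```

For example, `vsock_listen_on(3, 10000, 1)` followed by
`vsock_accept_peer()` matches the first listing in this section.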

# Use Cases

We've completed a PoC with Kata Containers, Ztunnel (a purpose-built
per-node proxy for Istio ambient mesh), and TSI. Please refer to the
following link for more details.

Link: https://bit.ly/4bdPJbU

Xuewei Niu (5):
  vsock/virtio: Extend virtio-vsock spec with an "order" field
  vsock/virtio: Add support for multi-devices
  vsock/virtio: can_msgzerocopy adapts to multi-devices
  vsock: seqpacket_allow adapts to multi-devices
  vsock: Add an ioctl request to get all CIDs

 include/linux/virtio_vsock.h            |   2 +-
 include/net/af_vsock.h                  |  25 ++-
 include/uapi/linux/virtio_vsock.h       |   1 +
 include/uapi/linux/vm_sockets.h         |  14 ++
 net/vmw_vsock/af_vsock.c                | 116 +++++++++--
 net/vmw_vsock/virtio_transport.c        | 255 ++++++++++++++++++------
 net/vmw_vsock/virtio_transport_common.c |  16 +-
 net/vmw_vsock/vsock_loopback.c          |   4 +-
 8 files changed, 352 insertions(+), 81 deletions(-)

-- 
2.34.1

