[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d6ad85649fd56d3e12e59085836326a09885593b.camel@infradead.org>
Date: Wed, 23 Jun 2021 17:12:16 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Jason Wang <jasowang@...hat.com>, netdev@...r.kernel.org
Cc: Eugenio Pérez <eperezma@...hat.com>
Subject: Re: [PATCH v2 4/4] vhost_net: Add self test with tun device
On Wed, 2021-06-23 at 12:02 +0800, Jason Wang wrote:
> 在 2021/6/23 上午12:15, David Woodhouse 写道:
> > From: David Woodhouse <dwmw@...zon.co.uk>
> >
> > This creates a tun device and brings it up, then finds out the link-local
> > address the kernel automatically assigns to it.
> >
> > It sends a ping to that address, from a fake LL address of its own, and
> > then waits for a response.
> >
> > If the virtio_net_hdr stuff is all working correctly, it gets a response
> > and manages to understand it.
>
>
> I wonder whether it worth to bother the dependency like ipv6 or kernel
> networking stack.
>
> How about simply use packet socket that is bound to tun to receive and
> send packets?
>
I pondered that but figured that using the kernel's network stack
wasn't too much of an additional dependency. We *could* use an
AF_PACKET socket on the tun device and then drive both ends, but given
that the kernel *automatically* assigns a link-local address when we
bring the device up anyway, it seemed simple enough just to use ICMP.
I also happened to have the ICMP generation/checking code lying around
anyway in the same emacs instance, so it was reduced to a previously
solved problem.
We *should* eventually expand this test case to attach an AF_PACKET
device to the vhost-net, instead of using a tun device as the back end.
(Although I don't really see *why* vhost is limited to AF_PACKET. Why
*can't* I attach anything else, like an AF_UNIX socket, to vhost-net?)
> > + /*
> > + * I just want to map the *whole* of userspace address space. But
> > + * from userspace I don't know what that is. On x86_64 it would be:
> > + *
> > + * vmem->regions[0].guest_phys_addr = 4096;
> > + * vmem->regions[0].memory_size = 0x7fffffffe000;
> > + * vmem->regions[0].userspace_addr = 4096;
> > + *
> > + * For now, just ensure we put everything inside a single BSS region.
> > + */
> > + vmem->regions[0].guest_phys_addr = (uint64_t)&rings;
> > + vmem->regions[0].userspace_addr = (uint64_t)&rings;
> > + vmem->regions[0].memory_size = sizeof(rings);
>
>
> Instead of doing tricks like this, we can do it in another way:
>
> 1) enable device IOTLB
> 2) wait for the IOTLB miss request (iova, len) and update identity
> mapping accordingly
>
> This should work for all the archs (with some performance hit).
Ick. For my actual application (OpenConnect) I'm either going to suck
it up and put in the arch-specific limits like in the comment above, or
I'll fix things to do the VHOST_F_IDENTITY_MAPPING thing we're talking
about elsewhere. (Probably the former, since if I'm requiring kernel
changes then I have grander plans around extending AF_TLS to do DTLS,
then hooking that directly up to the tun socket via BPF and a sockmap
without the data frames ever going to userspace at all.)
For this test case, a hard-coded single address range in BSS is fine.
I've now added !IFF_NO_PI support to the test case, but as noted it
fails just like the other ones I'd already marked with #if 0, which is
because vhost-net pulls some value for 'sock_hlen' out of its posterior
based on some assumption around the vhost features. And then expects
sock_recvmsg() to return precisely that number of bytes more than the
value it peeks in the skb at the head of the sock's queue.
I think I can fix *all* those test cases by making tun_get_socket()
take an extra 'int *' argument, and use that to return the *actual*
value of sock_hlen. Here's the updated test case in the meantime:
From cf74e3fc80b8fd9df697a42cfc1ff3887de18f78 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@...zon.co.uk>
Date: Wed, 23 Jun 2021 16:38:56 +0100
Subject: [PATCH] test_vhost_net: add test cases with tun_pi header
These fail too, for the same reason as the previous tests were
guarded with #if 0: vhost-net pulls 'sock_hlen' out of its
posterior and just assumes it's 10 bytes. And then barfs when
a sock_recvmsg() doesn't return precisely ten bytes more than
it peeked in the head skb:
[1296757.531103] Discarded rx packet: len 78, expected 74
Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
---
.../testing/selftests/vhost/test_vhost_net.c | 97 +++++++++++++------
1 file changed, 65 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/vhost/test_vhost_net.c b/tools/testing/selftests/vhost/test_vhost_net.c
index fd4a2b0e42f0..734b3015a5bd 100644
--- a/tools/testing/selftests/vhost/test_vhost_net.c
+++ b/tools/testing/selftests/vhost/test_vhost_net.c
@@ -48,7 +48,7 @@ static unsigned char hexchar(char *hex)
return (hexnybble(hex[0]) << 4) | hexnybble(hex[1]);
}
-int open_tun(int vnet_hdr_sz, struct in6_addr *addr)
+int open_tun(int vnet_hdr_sz, int pi, struct in6_addr *addr)
{
int tun_fd = open("/dev/net/tun", O_RDWR);
if (tun_fd == -1)
@@ -56,7 +56,9 @@ int open_tun(int vnet_hdr_sz, struct in6_addr *addr)
struct ifreq ifr = { 0 };
- ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
+ ifr.ifr_flags = IFF_TUN;
+ if (!pi)
+ ifr.ifr_flags |= IFF_NO_PI;
if (vnet_hdr_sz)
ifr.ifr_flags |= IFF_VNET_HDR;
@@ -249,11 +251,18 @@ static inline uint16_t csum_finish(uint32_t sum)
return htons((uint16_t)(~sum));
}
-static int create_icmp_echo(unsigned char *data, struct in6_addr *dst,
+static int create_icmp_echo(unsigned char *data, int pi, struct in6_addr *dst,
struct in6_addr *src, uint16_t id, uint16_t seq)
{
const int icmplen = ICMP_MINLEN + sizeof(ping_payload);
- const int plen = sizeof(struct ip6_hdr) + icmplen;
+ int plen = sizeof(struct ip6_hdr) + icmplen;
+
+ if (pi) {
+ struct tun_pi *pi = (void *)data;
+ data += sizeof(*pi);
+ plen += sizeof(*pi);
+ pi->proto = htons(ETH_P_IPV6);
+ }
struct ip6_hdr *iph = (void *)data;
struct icmp6_hdr *icmph = (void *)(data + sizeof(*iph));
@@ -312,8 +321,21 @@ static int create_icmp_echo(unsigned char *data, struct in6_addr *dst,
}
-static int check_icmp_response(unsigned char *data, uint32_t len, struct in6_addr *dst, struct in6_addr *src)
+static int check_icmp_response(unsigned char *data, uint32_t len, int pi,
+ struct in6_addr *dst, struct in6_addr *src)
{
+ if (pi) {
+ struct tun_pi *pi = (void *)data;
+ if (len < sizeof(*pi))
+ return 0;
+
+ if (pi->proto != htons(ETH_P_IPV6))
+ return 0;
+
+ data += sizeof(*pi);
+ len -= sizeof(*pi);
+ }
+
struct ip6_hdr *iph = (void *)data;
return ( len >= 41 && (ntohl(iph->ip6_flow) >> 28)==6 /* IPv6 header */
&& iph->ip6_nxt == IPPROTO_ICMPV6 /* IPv6 next header field = ICMPv6 */
@@ -337,7 +359,7 @@ static int check_icmp_response(unsigned char *data, uint32_t len, struct in6_add
#endif
-int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
+int test_vhost(int vnet_hdr_sz, int pi, int xdp, uint64_t features)
{
int call_fd = eventfd(0, EFD_CLOEXEC|EFD_NONBLOCK);
int kick_fd = eventfd(0, EFD_CLOEXEC|EFD_NONBLOCK);
@@ -353,7 +375,7 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
/* Pick up the link-local address that the kernel
* assigns to the tun device. */
struct in6_addr tun_addr;
- tun_fd = open_tun(vnet_hdr_sz, &tun_addr);
+ tun_fd = open_tun(vnet_hdr_sz, pi, &tun_addr);
if (tun_fd < 0)
goto err;
@@ -387,18 +409,18 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
local_addr.s6_addr16[0] = htons(0xfe80);
local_addr.s6_addr16[7] = htons(1);
+
/* Set up RX and TX descriptors; the latter with ping packets ready to
* send to the kernel, but don't actually send them yet. */
for (int i = 0; i < RING_SIZE; i++) {
struct pkt_buf *pkt = &rings[1].pkts[i];
- int plen = create_icmp_echo(&pkt->data[vnet_hdr_sz], &tun_addr,
- &local_addr, 0x4747, i);
+ int plen = create_icmp_echo(&pkt->data[vnet_hdr_sz], pi,
+ &tun_addr, &local_addr, 0x4747, i);
rings[1].desc[i].addr = vio64((uint64_t)pkt);
rings[1].desc[i].len = vio32(plen + vnet_hdr_sz);
rings[1].avail_ring[i] = vio16(i);
-
pkt = &rings[0].pkts[i];
rings[0].desc[i].addr = vio64((uint64_t)pkt);
rings[0].desc[i].len = vio32(sizeof(*pkt));
@@ -438,9 +460,10 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
return -1;
if (check_icmp_response((void *)(addr + vnet_hdr_sz), len - vnet_hdr_sz,
- &local_addr, &tun_addr)) {
+ pi, &local_addr, &tun_addr)) {
ret = 0;
- printf("Success (%d %d %llx)\n", vnet_hdr_sz, xdp, (unsigned long long)features);
+ printf("Success (hdr %d, xdp %d, pi %d, features %llx)\n",
+ vnet_hdr_sz, xdp, pi, (unsigned long long)features);
goto err;
}
@@ -466,51 +489,61 @@ int test_vhost(int vnet_hdr_sz, int xdp, uint64_t features)
return ret;
}
-
-int main(void)
+/* Perform the given test with all four combinations of XDP/PI */
+int test_four(int vnet_hdr_sz, uint64_t features)
{
- int ret;
-
- ret = test_vhost(0, 0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
- (1ULL << VIRTIO_F_VERSION_1)));
+ int ret = test_vhost(vnet_hdr_sz, 0, 0, features);
if (ret && ret != KSFT_SKIP)
return ret;
- ret = test_vhost(0, 1, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
- (1ULL << VIRTIO_F_VERSION_1)));
+ ret = test_vhost(vnet_hdr_sz, 0, 1, features);
if (ret && ret != KSFT_SKIP)
return ret;
-
- ret = test_vhost(0, 0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
+#if 0 /* These don't work *either* for the same reason as the #if 0 later */
+ ret = test_vhost(vnet_hdr_sz, 1, 0, features);
if (ret && ret != KSFT_SKIP)
return ret;
- ret = test_vhost(0, 1, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
+ ret = test_vhost(vnet_hdr_sz, 1, 1, features);
if (ret && ret != KSFT_SKIP)
return ret;
+#endif
+}
- ret = test_vhost(10, 0, 0);
- if (ret && ret != KSFT_SKIP)
- return ret;
+int main(void)
+{
+ int ret;
- ret = test_vhost(10, 1, 0);
+ ret = test_four(10, 0);
if (ret && ret != KSFT_SKIP)
return ret;
-#if 0 /* These ones will fail */
- ret = test_vhost(0, 0, 0);
+ ret = test_four(0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
+ (1ULL << VIRTIO_F_VERSION_1)));
if (ret && ret != KSFT_SKIP)
return ret;
- ret = test_vhost(0, 1, 0);
+ ret = test_four(0, ((1ULL << VHOST_NET_F_VIRTIO_NET_HDR)));
if (ret && ret != KSFT_SKIP)
return ret;
- ret = test_vhost(12, 0, 0);
+
+#if 0
+ /*
+ * These ones will fail, because right now vhost *assumes* that the
+ * underlying (tun, etc.) socket will be doing a header of precisely
+ * sizeof(struct virtio_net_hdr), if vhost isn't doing so itself due
+ * to VHOST_NET_F_VIRTIO_NET_HDR.
+ *
+ * That assumption breaks both tun with no IFF_VNET_HDR, and also
+ * presumably raw sockets. So leave these test cases disabled for
+ * now until it's fixed.
+ */
+ ret = test_four(0, 0);
if (ret && ret != KSFT_SKIP)
return ret;
- ret = test_vhost(12, 1, 0);
+ ret = test_four(12, 0);
if (ret && ret != KSFT_SKIP)
return ret;
#endif
--
2.31.1
Download attachment "smime.p7s" of type "application/x-pkcs7-signature" (5174 bytes)
Powered by blists - more mailing lists