Message-ID: <994e315b-fdb7-1467-553e-290d4434d853@gmail.com>
Date: Wed, 1 Dec 2021 15:32:36 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: David Ahern <dsahern@...il.com>, io-uring@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Jakub Kicinski <kuba@...nel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Willem de Bruijn <willemb@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
David Ahern <dsahern@...nel.org>, Jens Axboe <axboe@...nel.dk>
Subject: Re: [RFC 00/12] io_uring zerocopy send
On 12/1/21 03:10, David Ahern wrote:
> On 11/30/21 8:18 AM, Pavel Begunkov wrote:
>> Early proof of concept for zerocopy send via io_uring. This is just
>> an RFC, there are details yet to be figured out, but hope to gather
>> some feedback.
>>
>> Benchmarking udp (65435 bytes) with a dummy net device (mtu=0xffff):
>> The best case io_uring=116079 MB/s vs msg_zerocopy=47421 MB/s,
>> or 2.44 times faster.
>>
>> № | test: | BW (MB/s) | speedup
>> 1 | msg_zerocopy (non-zc) | 18281 | 0.38
>> 2 | msg_zerocopy -z (baseline) | 47421 | 1
>> 3 | io_uring (@flush=false, nr_reqs=1) | 96534 | 2.03
>> 4 | io_uring (@flush=true, nr_reqs=1) | 89310 | 1.88
>> 5 | io_uring (@flush=false, nr_reqs=8) | 116079 | 2.44
>> 6 | io_uring (@flush=true, nr_reqs=8) | 109722 | 2.31
>>
>> Based on selftests/.../msg_zerocopy but more limited. You can use
>> msg_zerocopy -r as usual for receive side.
>>
> ...
>
> Can you state the exact command lines you are running for all of the
> commands? I tried this set (and commands referenced below) and my
Sure. First, for dummy I set the MTU by hand in the driver. I'm not sure
whether it can be done from userspace, can it? Without it,
__ip_append_data() falls into the non-zerocopy path.
diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index f82ad7419508..5c5aeacdabd5 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -132,7 +132,8 @@ static void dummy_setup(struct net_device *dev)
 	eth_hw_addr_random(dev);
 	dev->min_mtu = 0;
-	dev->max_mtu = 0;
+	dev->mtu = 0xffff;
+	dev->max_mtu = 0xffff;
 }
# dummy configuration
modprobe dummy numdummies=1
ip link set dummy0 up
# force requests to <dummy_ip_addr> go through the dummy device
ip route add <dummy_ip_addr> dev dummy0
With dummy I was just sinking the traffic into the dummy device, which
was good enough for me. Omitting "taskset" and "nice", the command is:
send-zc -4 -D <dummy_ip_addr> -t 10 udp
Similarly with msg_zerocopy:
<kernel>/tools/testing/selftests/net/msg_zerocopy -4 -p 6666 -D <dummy_ip_addr> -t 10 -z udp
For loopback testing, zerocopy is not allowed, as Willem explained in the
original MSG_ZEROCOPY cover letter, so I used a hack to bypass the restriction:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ebb12a7d386d..42df33b175ce 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2854,9 +2854,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 /* Frags must be orphaned, even if refcounted, if skb might loop to rx path */
 static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
 {
-	if (likely(!skb_zcopy(skb)))
-		return 0;
-	return skb_copy_ubufs(skb, gfp_mask);
+	return skb_orphan_frags(skb, gfp_mask);
 }
 /**
Then I ran the two commands below in parallel and looked at the numbers
the sender reports. The results were in favor of io_uring for me, but I
don't remember the exact figures. perf shows that "send-zc" spends a lot
of time receiving, so I wasn't testing its performance after some point.
msg_zerocopy -r -v -4 -t 20 udp
send-zc -4 -D 127.0.0.1 -t 10 udp
> mileage varies quite a bit.
Interesting, any brief notes on the setup and the results? Dummy
or something real? io_uring doesn't show whether it was really zerocopied
or not, but I assume you checked it (e.g. with perf/bpftrace).
I expected that @flush=true might be worse with real devices, and
there is one spot to be patched, but apart from that and
cycles spent in a real LLD offsetting the overhead, I didn't
anticipate any problems. I'll see once I try a real device.
> Also, have you run this proposed change (and with TCP) across nodes
> (ie., not just local process to local process via dummy interface)?
Not yet. I tried dummy and localhost UDP as described above, and
similarly TCP. I just need to grab a server with a proper NIC and will
try it out soon.
>> Benchmark:
>> https://github.com/isilence/liburing.git zc_v1
>>
>> or this file in particular:
>> https://github.com/isilence/liburing/blob/zc_v1/test/send-zc.c
>>
>> To run the benchmark:
>> ```
>> cd <liburing_dir> && make && cd test
>> # ./send-zc -4 [-p <port>] [-s <payload_size>] -D <destination> udp
>> ./send-zc -4 -D 127.0.0.1 udp
>> ```
>>
>> msg_zerocopy can be used for the server side, e.g.
>> ```
>> cd <linux-kernel>/tools/testing/selftests/net && make
>> ./msg_zerocopy -4 -r [-p <port>] [-t <sec>] udp
>> ```
--
Pavel Begunkov