Message-ID: <Zs_YY8RO_SQZv7nF@mini-arch>
Date: Wed, 28 Aug 2024 19:09:39 -0700
From: Stanislav Fomichev <sdf@...ichev.me>
To: Naman Gulati <namangulati@...gle.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, skhawaja@...gle.com,
Joe Damato <jdamato@...tly.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH] Add provision to busyloop for events in ep_poll.
On 08/28, Naman Gulati wrote:
> NAPI busypolling in ep_busy_loop loops on napi_poll and checks for new
> epoll events after every napi poll. Checking just for epoll events in a
> tight loop in the kernel context delivers latency gains to applications
> that are not interested in napi busypolling with epoll.
>
> This patch adds an option to loop just for new events inside
> ep_busy_loop, guarded by the EPIOCSPARAMS ioctl that controls epoll napi
> busypolling.
>
> A comparison with neper tcp_rr shows that busylooping for events in
> epoll_wait boosted throughput by ~3-7% and reduced median latency by
> ~10%.
>
> To demonstrate the latency and throughput improvements, a comparison was
> made of neper tcp_rr running with:
> 1. (baseline) No busylooping
> 2. (epoll busylooping) enabling the epoll busy looping on all epoll
> fd's
> 3. (userspace busylooping) looping on epoll_wait in userspace
> with timeout=0
>
> Stats for two machines with 100Gbps NICs running tcp_rr with 5 threads
> and varying flows:
>
> Type                Flows  Throughput  Latency (μs)
>                            (B/s)       P50    P90    P99    P99.9  P99.99
> baseline            15     272145      57.2   71.9   91.4   100.6  111.6
> baseline            30     464952      66.8   78.8   98.1   113.4  122.4
> baseline            60     695920      80.9   118.5  143.4  161.8  174.6
> epoll busyloop      15     301751      44.7   70.6   84.3   95.4   106.5
> epoll busyloop      30     508392      58.9   76.9   96.2   109.3  118.5
> epoll busyloop      60     745731      77.4   106.2  127.5  143.1  155.9
> userspace busyloop  15     279202      55.4   73.1   85.2   98.3   109.6
> userspace busyloop  30     472440      63.7   78.2   96.5   112.2  120.1
> userspace busyloop  60     720779      77.9   113.5  134.9  152.6  165.7
>
> Per the above data epoll busyloop outperforms baseline and userspace
> busylooping in both throughput and latency. As the density of flows per
> thread increased, the median latency of all three epoll mechanisms
> converges. However, epoll busylooping is better at capturing the tail
> latencies at high flow counts.
Any idea why timeout=0 is not performing as well as looping inside the
kernel? Can we cut this overhead out? Or is it pure syscall overhead? (usecs?)