Message-ID: <890db004-4dfe-7f77-61ee-1ac0d7d2a24c@gmail.com>
Date: Wed, 18 Apr 2018 09:56:22 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH net-next 2/2] udp: implement and use per cpu rx skbs cache
On 04/18/2018 03:22 AM, Paolo Abeni wrote:
> This changeset extends the idea behind commit c8c8b127091b ("udp:
> under rx pressure, try to condense skbs"), trading more BH cpu
> time and memory bandwidth to decrease the load on the user space
> receiver.
>
> At boot time we allocate a limited amount of skbs with small
> data buffer, storing them in per cpu arrays. Such skbs are never
> freed.
>
> At run time, under rx pressure, the BH tries to copy the current
> skb contents into the cache - if the current cache skb is available,
> and the ingress skb is small enough and without any head states.
>
> When using the cache skb, the ingress skb is dropped by the BH
> - while still hot on cache - and the cache skb is inserted into
> the rx queue, after increasing its usage count. Also, the cache
> array index is moved to the next entry.
>
> The receive side is unmodified: in udp_recvmsg() the skb
> usage count is decreased and the skb is _not_ freed - since the
> cache keeps usage > 0. Since skb->usage is hot in the cache of the
> receiver at consume time - the receiver has just read skb->data,
> which lies in the same cacheline - the whole skb_consume_udp() becomes
> really cheap.
>
> UDP receive performance under flood improves as follows:
>
> NR RX queues    Kpps      Kpps      Delta (%)
>                 Before    After
>
> 1               2252      2305      2
> 2               2151      2569      19
> 4               2033      2396      17
> 8               1969      2329      18
>
> Overall performance of the knotd DNS server under real traffic flood
> improves as follows:
>
> Kpps      Kpps      Delta (%)
> Before    After
>
> 3777      3981      5
It might be time for the knotd DNS server to finally use SO_REUSEPORT instead of
adding this bloat to the kernel?
Sorry, a 5% improvement, when you can easily get a 300% improvement with no kernel
change, is not appealing to me :/