Message-ID: <9ce73c04-ee8f-4aa5-ab15-c31dbe409401@kernel.org>
Date: Fri, 8 Mar 2024 21:53:47 -0700
From: David Ahern <dsahern@...nel.org>
To: Leone Fernando <leone4fernando@...il.com>, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, willemb@...gle.com
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next 0/4] net: route: improve route hinting
On 3/7/24 10:11 AM, Leone Fernando wrote:
> In 2017, Paolo Abeni introduced the hinting mechanism [1] to the routing
> sub-system. The hinting optimization improves performance by reusing
> previously found dsts instead of looking them up for each skb.
>
> This patch series introduces a generalized version of the hinting mechanism that
> can "remember" a larger number of dsts. This reduces the number of dst
> lookups for frequently encountered daddrs.
>
> Before diving into the code and the benchmarking results, it's important
> to address the deletion of the old route cache [2] and why
> this solution is different. The original cache was complicated,
> vulnerable to DoS attacks, and had unstable performance.
>
> The new input dst_cache is much simpler thanks to its lazy approach,
> improving performance without the overhead of the removed cache
> implementation. Instead of using timers and GC, the deletion of invalid
> entries is performed lazily during their lookups.
> The dsts are stored in a simple, lightweight, static hash table. This
> keeps the lookup times fast yet stable, preventing DoS upon cache misses.
> The new input dst_cache implementation is built on top of the existing
> dst_cache code, which provides fast, lockless, per-CPU operation.
>
> I tested this patch using UDP floods with different numbers of daddrs.
> The benchmarking setup comprises 3 machines: a sender, a forwarder and
> a receiver. I measured the PPS received by the receiver while the
> forwarder was running either the mainline kernel or the patched kernel,
> and compared the results. The dst_cache I tested in this benchmark used
> a total of 512 hash table entries, split into buckets of 4 entries
> each.
>
> These are the results:
> UDP conns   mainline    patched     delta
> per cpu     Kpps        Kpps        %
>        1    274.0255    269.2205    -1.75
>        2    257.3748    268.0947     4.17
>       15    241.3513    258.8016     7.23
>      100    238.3419    258.4939     8.46
>      500    238.5390    252.6425     5.91
>     1000    238.7570    242.1820     1.43
>     2000    238.7780    236.2640    -1.05
>     4000    239.0440    233.5320    -2.31
>     8000    239.3248    232.5680    -2.82
>
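[Editorial note: the bucketed, lazily-invalidated cache described in the
cover letter above can be sketched in userspace C. This is an illustrative
model only, not the kernel code from the series: the names (cache_entry,
cache_lookup, cache_insert, cache_invalidate), the hash function, and the
replacement policy are all assumptions; the real implementation builds on
the kernel's per-CPU dst_cache infrastructure and dst validity checks.]

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Model of a 512-entry cache split into buckets of 4, as in the
 * benchmark setup above: 128 buckets x 4 slots. */
#define CACHE_SLOTS 512
#define BUCKET_SIZE 4
#define NUM_BUCKETS (CACHE_SLOTS / BUCKET_SIZE)

struct cache_entry {
    uint32_t daddr; /* destination IPv4 address (the key) */
    void *dst;      /* cached route; NULL means the slot is empty */
    bool valid;     /* cleared when the route becomes stale */
};

static struct cache_entry cache[NUM_BUCKETS][BUCKET_SIZE];

static unsigned int bucket_of(uint32_t daddr)
{
    /* Simple multiplicative hash; the kernel would use jhash or similar. */
    return (daddr * 2654435761u) % NUM_BUCKETS;
}

/* Lookup with lazy invalidation: a stale entry is dropped at the moment
 * it is encountered, instead of by a timer or GC pass. The scan is
 * bounded by BUCKET_SIZE, so lookup cost stays constant even when an
 * attacker floods the cache with misses. */
void *cache_lookup(uint32_t daddr)
{
    struct cache_entry *b = cache[bucket_of(daddr)];

    for (int i = 0; i < BUCKET_SIZE; i++) {
        if (b[i].dst && b[i].daddr == daddr) {
            if (!b[i].valid) {
                b[i].dst = NULL; /* lazy eviction on lookup */
                return NULL;
            }
            return b[i].dst;
        }
    }
    return NULL; /* miss: caller falls back to a full route lookup */
}

void cache_insert(uint32_t daddr, void *dst)
{
    struct cache_entry *b = cache[bucket_of(daddr)];
    int slot = 0;

    /* Prefer an empty or stale slot; otherwise overwrite slot 0
     * (a deliberately trivial replacement policy for this sketch). */
    for (int i = 0; i < BUCKET_SIZE; i++) {
        if (!b[i].dst || !b[i].valid) {
            slot = i;
            break;
        }
    }
    b[slot] = (struct cache_entry){ .daddr = daddr, .dst = dst, .valid = true };
}

void cache_invalidate(uint32_t daddr)
{
    struct cache_entry *b = cache[bucket_of(daddr)];

    for (int i = 0; i < BUCKET_SIZE; i++)
        if (b[i].dst && b[i].daddr == daddr)
            b[i].valid = false; /* actual removal is deferred to lookup */
}
```

The fixed bucket scan is what gives the "fast yet stable" lookup claim:
a miss costs at most BUCKET_SIZE comparisons, unlike the old unbounded
route cache chains.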
I have looked at all of the sets sent. I cannot convince myself this is
a good idea, but at the same time I do not have constructive feedback on
why it is not acceptable. The gains are modest at best.