Message-ID: <20250306042445.69938-1-kuniyu@amazon.com>
Date: Wed, 5 Mar 2025 20:24:39 -0800
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <edumazet@...gle.com>
CC: <davem@...emloft.net>, <eric.dumazet@...il.com>, <horms@...nel.org>,
<kernelxing@...cent.com>, <kuba@...nel.org>, <kuniyu@...zon.com>,
<ncardwell@...gle.com>, <netdev@...r.kernel.org>, <pabeni@...hat.com>
Subject: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
From: Eric Dumazet <edumazet@...gle.com>
Date: Wed, 5 Mar 2025 03:45:49 +0000
> In order to speed up __inet_hash_connect(), we want to ensure hash values
> for <source address, port X, destination address, destination port>
> are not randomly spread, but monotonically increasing.
>
> The goal is to allow __inet_hash_connect() to derive the hash value
> of a candidate 4-tuple with a single addition in the following
> patch in the series.
>
> Given:
> hash_0 = inet_ehashfn(saddr, 0, daddr, dport)
> hash_sport = inet_ehashfn(saddr, sport, daddr, dport)
>
> Then (hash_sport == hash_0 + sport) for all sport values.
>
> As far as I know, there is no security implication with this change.
>
> After this patch, when __inet_hash_connect() has to try XXXX candidates,
> the hash table buckets are contiguous and packed, allowing
> a better use of cpu caches and hardware prefetchers.
>
> Tested:
>
> Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
> Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before this patch:
>
> utime_start=0.271607
> utime_end=3.847111
> stime_start=18.407684
> stime_end=1997.485557
> num_transactions=1350742
> latency_min=0.014131929
> latency_max=17.895073144
> latency_mean=0.505675853
> latency_stddev=2.125164772
> num_samples=307884
> throughput=139866.80
>
> perf top on client:
>
> 56.86% [kernel] [k] __inet6_check_established
> 17.96% [kernel] [k] __inet_hash_connect
> 13.88% [kernel] [k] inet6_ehashfn
> 2.52% [kernel] [k] rcu_all_qs
> 2.01% [kernel] [k] __cond_resched
> 0.41% [kernel] [k] _raw_spin_lock
>
> After this patch:
>
> utime_start=0.286131
> utime_end=4.378886
> stime_start=11.952556
> stime_end=1991.655533
> num_transactions=1446830
> latency_min=0.001061085
> latency_max=12.075275028
> latency_mean=0.376375302
> latency_stddev=1.361969596
> num_samples=306383
> throughput=151866.56
>
> perf top:
>
> 50.01% [kernel] [k] __inet6_check_established
> 20.65% [kernel] [k] __inet_hash_connect
> 15.81% [kernel] [k] inet6_ehashfn
> 2.92% [kernel] [k] rcu_all_qs
> 2.34% [kernel] [k] __cond_resched
> 0.50% [kernel] [k] _raw_spin_lock
> 0.34% [kernel] [k] sched_balance_trigger
> 0.24% [kernel] [k] queued_spin_lock_slowpath
>
> There is indeed an increase in throughput (139866.80 -> 151866.56,
> about 8.6%) and a reduction in mean latency (0.5057 -> 0.3764,
> about 26%).
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@...zon.com>
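
For readers following along, the relation stated in the quoted commit
message (hash_sport == hash_0 + sport) can be illustrated with a small
userspace sketch. This is not the kernel code: toy_mix() is a
hypothetical stand-in for the keyed hash, and toy_ehashfn(),
toy_bucket() and check_relation() are names made up for this example.
The only point is that the source port is added after hashing the
other three fields, so consecutive ports map to consecutive buckets.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's keyed hash of
 * (saddr, daddr, dport). This is NOT jhash; it only needs to
 * be deterministic for the illustration.
 */
static uint32_t toy_mix(uint32_t saddr, uint32_t daddr, uint16_t dport,
			uint32_t secret)
{
	uint32_t h = secret;

	h ^= saddr;
	h *= 0x9E3779B1u;
	h ^= daddr;
	h *= 0x85EBCA6Bu;
	h ^= dport;
	h *= 0xC2B2AE35u;
	h ^= h >> 16;
	return h;
}

/*
 * Assumed shape of the change: hash the tuple with source port 0,
 * then add the real source port (arithmetic is modulo 2^32).
 */
static uint32_t toy_ehashfn(uint32_t saddr, uint16_t sport,
			    uint32_t daddr, uint16_t dport,
			    uint32_t secret)
{
	return (uint32_t)sport + toy_mix(saddr, daddr, dport, secret);
}

/* Bucket index for a power-of-two table; mask is (table size - 1). */
static uint32_t toy_bucket(uint32_t hash, uint32_t mask)
{
	return hash & mask;
}

/* Return 1 if hash_sport == hash_0 + sport holds for every port. */
static int check_relation(uint32_t saddr, uint32_t daddr, uint16_t dport,
			  uint32_t secret)
{
	uint32_t hash_0 = toy_ehashfn(saddr, 0, daddr, dport, secret);
	uint32_t sport;

	for (sport = 0; sport <= 0xFFFF; sport++) {
		if (toy_ehashfn(saddr, (uint16_t)sport, daddr, dport,
				secret) != hash_0 + sport)
			return 0;
	}
	return 1;
}
```

With this shape, a caller that scans candidate source ports could
compute hash_0 once and derive each candidate's hash with one addition.
Neighbouring ports then land in neighbouring buckets, wrapping at the
table size.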