Message-ID: <CANn89iLpVW5bs7y8Hr5b07_7CAV2XkOgC9E7goCWpjCaiEKj6A@mail.gmail.com>
Date: Sun, 2 Mar 2025 08:11:33 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Neal Cardwell <ncardwell@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, Simon Horman <horms@...nel.org>, netdev@...r.kernel.org,
eric.dumazet@...il.com
Subject: Re: [PATCH net-next] tcp: use RCU in __inet{6}_check_established()
On Sun, Mar 2, 2025 at 1:17 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > When __inet_hash_connect() has to try many 4-tuples before
> > finding an available one, we see a high spinlock cost from
> > __inet_check_established() and/or __inet6_check_established().
> >
> > This patch adds an RCU lookup to avoid the spinlock
> > acquisition if the 4-tuple is found in the hash table.
> >
> > Note that there are still spin_lock_bh() calls in
> > __inet_hash_connect() to protect inet_bind_hashbucket,
> > this will be fixed in a future patch.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>
> Reviewed-by: Jason Xing <kerneljasonxing@...il.com>
>
> The extra RCU lookup can add some overhead in the common case, since
> it only pays off when the 4-tuple is already present in the hash
> table; I'm not sure how often that case occurs in practice. Given
> that a lockless lookup is cheap, it looks good to me; that overhead
> is the only thing I'm slightly worried about.
>
> As you said, it greatly mitigates the spinlock contention in the
> case mentioned earlier, where available ports are becoming scarce.
> We've seen that situation cause high CPU load before. Thanks for
> the optimization!
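The pattern under discussion, roughly: try a lockless RCU lookup
first, and fall back to the locked path only when the 4-tuple is not
found. A minimal sketch, ignoring the TCP_TIME_WAIT reuse special
case; the helper and type names (struct bucket, find_tuple_rcu() and
friends) are hypothetical, the real logic lives in
__inet_check_established() and __inet6_check_established():

static int check_established_sketch(struct bucket *head,
				    const struct tuple *t)
{
	int ret = 0;

	/* Fast path: lockless RCU lookup. If the 4-tuple is already
	 * in the hash table, fail without touching the spinlock.
	 */
	rcu_read_lock();
	if (find_tuple_rcu(head, t)) {
		rcu_read_unlock();
		return -EADDRNOTAVAIL;
	}
	rcu_read_unlock();

	/* Slow path: take the bucket lock and re-check, because the
	 * tuple may have been inserted between the RCU lookup and
	 * the lock acquisition.
	 */
	spin_lock_bh(&head->lock);
	if (find_tuple_locked(head, t))
		ret = -EADDRNOTAVAIL;
	else
		insert_tuple_locked(head, t);
	spin_unlock_bh(&head->lock);

	return ret;
}

On a miss, the RCU lookup is pure extra work done before the lock is
taken anyway, which is the trade-off mentioned above; it pays off when
colliding tuples are common, e.g. under port exhaustion.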

The addition of bhash2 in 6.1 introduced a major regression; this is
the reason I started working on this. I will send the whole series
later today; with it I get a ~200% increase in performance. I will
provide numbers in the cover letter.

neper/tcp_crr can be used to measure the gains.
Both server and client have 240 cores / 480 hyperthreads (Intel(R)
Xeon(R) 6985P-C).

Server:
ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog

Client:
ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog -c -H server
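
(For reference: -6 selects IPv6, -T200 runs 200 threads, -F20000 opens
20000 flows, --nolog disables neper's sample log files, and -c/-H run
the second instance as a client pointed at the server.)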

Before this first patch:
utime_start=0.210641
utime_end=1.704755
stime_start=11.842697
stime_end=1997.341498
nvcsw_start=18518
nvcsw_end=18672
nivcsw_start=26
nivcsw_end=14828
num_transactions=615906
latency_min=0.051826868
latency_max=12.015396087
latency_mean=0.642949344
latency_stddev=1.860316922
num_samples=207534
correlation_coefficient=1.00
throughput=62524.04

After this patch:
utime_start=0.185656
utime_end=2.436602
stime_start=11.470889
stime_end=1980.679087
nvcsw_start=17327
nvcsw_end=17514
nivcsw_start=48
nivcsw_end=77724
num_transactions=821025
latency_min=0.025097789
latency_max=11.581610596
latency_mean=0.475903462
latency_stddev=1.597439931
num_samples=206556
time_end=173.321207377
correlation_coefficient=1.00
throughput=84387.19
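
For this first patch alone, throughput rises from 62524.04 to 84387.19
(84387.19 / 62524.04 ~= 1.35, i.e. a ~35% gain), consistent with
num_transactions growing from 615906 to 821025; the ~200% figure above
presumably refers to the complete series.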