Message-ID: <20250304005114.64041-1-kuniyu@amazon.com>
Date: Mon, 3 Mar 2025 16:51:14 -0800
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <edumazet@...gle.com>
CC: <davem@...emloft.net>, <eric.dumazet@...il.com>, <horms@...nel.org>,
<kerneljasonxing@...il.com>, <kuba@...nel.org>, <kuniyu@...zon.com>,
<ncardwell@...gle.com>, <netdev@...r.kernel.org>, <pabeni@...hat.com>
Subject: Re: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
From: Eric Dumazet <edumazet@...gle.com>
Date: Sun, 2 Mar 2025 12:42:37 +0000
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> the many spin_lock_bh(&head->lock) performed in its loop.
>
> This patch adds an RCU lookup to avoid the spinlock cost.
>
> check_established() gets a new @rcu_lookup argument.
> First reason is to not make any changes while head->lock
> is not held.
> Second reason is to not make this RCU lookup a second time
> after the spinlock has been acquired.
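
The two-phase idea described above (a read-only probe first, and the lock only for
candidates that look free) can be sketched in user-space C. This is a rough
illustration, not the kernel code: `struct bucket`, `check_tuple()` and
`pick_tuple()` are hypothetical names, a pthread mutex stands in for head->lock,
and C11 atomics on a fixed slot array stand in for the RCU-protected chain walk.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BUCKET_SLOTS 8

/* Hypothetical stand-in for one hash bucket: a lock plus a few slots. */
struct bucket {
	pthread_mutex_t lock;                 /* stands in for head->lock */
	_Atomic uint32_t slots[BUCKET_SLOTS]; /* 0 means "empty" */
};

/*
 * lockless == true:  read-only probe; must not modify the bucket.
 * lockless == false: caller holds b->lock; re-check, then insert.
 */
static int check_tuple(struct bucket *b, uint32_t tuple, bool lockless)
{
	int free_slot = -1;

	for (int i = 0; i < BUCKET_SLOTS; i++) {
		uint32_t v = atomic_load_explicit(&b->slots[i],
						  memory_order_acquire);
		if (v == tuple)
			return -EADDRNOTAVAIL;   /* already in use */
		if (v == 0 && free_slot < 0)
			free_slot = i;
	}
	if (lockless)
		return 0;                        /* looks free; no changes made */
	if (free_slot < 0)
		return -ENOMEM;                  /* sketch only: bucket is full */
	atomic_store_explicit(&b->slots[free_slot], tuple,
			      memory_order_release);
	return 0;
}

/* Try candidates in order; take the lock only when the probe says "free". */
static int pick_tuple(struct bucket *b, const uint32_t *cand, int n,
		      int *locks_taken)
{
	for (int i = 0; i < n; i++) {
		if (check_tuple(b, cand[i], true))
			continue;                /* busy: skip without locking */

		pthread_mutex_lock(&b->lock);
		(*locks_taken)++;
		int err = check_tuple(b, cand[i], false);
		pthread_mutex_unlock(&b->lock);

		if (!err)
			return i;                /* index of the tuple we claimed */
	}
	return -1;
}
```

In this sketch, candidates that are already taken cost only a read-only scan, so
the lock is acquired roughly once per successful pick rather than once per
candidate. The locked pass still re-checks, because another thread may have
claimed the same tuple between the probe and the lock acquisition.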
>
> Tested:
>
> Server:
>
> ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
>
> Client:
>
> ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server
>
> Before series:
>
> utime_start=0.288582
> utime_end=1.548707
> stime_start=20.637138
> stime_end=2002.489845
> num_transactions=484453
> latency_min=0.156279245
> latency_max=20.922042756
> latency_mean=1.546521274
> latency_stddev=3.936005194
> num_samples=312537
> throughput=47426.00
>
> perf top on the client:
>
> 49.54% [kernel] [k] _raw_spin_lock
> 25.87% [kernel] [k] _raw_spin_lock_bh
> 5.97% [kernel] [k] queued_spin_lock_slowpath
> 5.67% [kernel] [k] __inet_hash_connect
> 3.53% [kernel] [k] __inet6_check_established
> 3.48% [kernel] [k] inet6_ehashfn
> 0.64% [kernel] [k] rcu_all_qs
>
> After this series:
>
> utime_start=0.271607
> utime_end=3.847111
> stime_start=18.407684
> stime_end=1997.485557
> num_transactions=1350742
> latency_min=0.014131929
> latency_max=17.895073144
> latency_mean=0.505675853 # Nice reduction of latency metrics
> latency_stddev=2.125164772
> num_samples=307884
> throughput=139866.80 # 190 % increase
>
> perf top on client:
>
> 56.86% [kernel] [k] __inet6_check_established
> 17.96% [kernel] [k] __inet_hash_connect
> 13.88% [kernel] [k] inet6_ehashfn
> 2.52% [kernel] [k] rcu_all_qs
> 2.01% [kernel] [k] __cond_resched
> 0.41% [kernel] [k] _raw_spin_lock
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Thanks for the great optimisation!
Reviewed-by: Kuniyuki Iwashima <kuniyu@...zon.com>