Message-Id: <20220712235310.1935121-1-joannelkoong@gmail.com>
Date: Tue, 12 Jul 2022 16:53:07 -0700
From: Joanne Koong <joannelkoong@...il.com>
To: netdev@...r.kernel.org
Cc: edumazet@...gle.com, kafai@...com, kuba@...nel.org,
davem@...emloft.net, pabeni@...hat.com,
Joanne Koong <joannelkoong@...il.com>
Subject: [PATCH net-next v2 0/3] Add a second bind table hashed by port + address
Currently, there is one bind hashtable (bhash) that hashes by port only.
This patchset adds a second bind table (bhash2) that hashes by port and address.
The motivation for adding bhash2 is to expedite bind requests when the port's
bhash table entry holds many sockets (e.g. a large number of sockets bound to
different addresses on the same port). Checking for bind conflicts is then
costly, especially since we hold the table entry spinlock while doing so,
which can cause softirq cpu lockups and can prevent new tcp connections.
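To illustrate why the second table helps, here is a minimal userspace sketch
of the two hashing schemes. It is a simplified model under stated assumptions
and not the kernel implementation: struct bind_table, bucket_of(),
entries_walked() and the other names are hypothetical, the hash function is
an arbitrary multiplicative mix, and each bound socket is flattened into a
single chain entry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024u

/* One bound (address, port) pair; a stand-in for a bound socket. */
struct bind_entry {
	uint32_t addr;
	uint16_t port;
	struct bind_entry *next;
};

/* Hypothetical table: by_addr == 0 models bhash, by_addr == 1 models bhash2. */
struct bind_table {
	struct bind_entry *buckets[NBUCKETS];
	int by_addr;
};

/* Pick a bucket from the port alone, or from the port and the address. */
static uint32_t bucket_of(const struct bind_table *t, uint32_t addr, uint16_t port)
{
	uint32_t h = port;

	if (t->by_addr)
		h ^= addr * 2654435761u; /* illustrative mix, not the kernel's hash */
	return h % NBUCKETS;
}

/* Insert at the head of the bucket chain. Returns 0 on success, -1 on OOM. */
static int bind_table_insert(struct bind_table *t, uint32_t addr, uint16_t port)
{
	struct bind_entry *e;
	uint32_t b;

	assert(t != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	b = bucket_of(t, addr, port);
	e->addr = addr;
	e->port = port;
	e->next = t->buckets[b];
	t->buckets[b] = e;
	return 0;
}

/* Bind nsocks consecutive addresses, starting at 10.0.0.1, to one port. */
static int populate(struct bind_table *t, uint32_t nsocks, uint16_t port)
{
	uint32_t i;

	for (i = 0; i < nsocks; i++)
		if (bind_table_insert(t, 0x0a000001u + i, port))
			return -1;
	return 0;
}

/*
 * Number of chain entries a conflict check for (addr, port) has to visit.
 * In the kernel this walk happens with the bucket spinlock held.
 */
static size_t entries_walked(const struct bind_table *t, uint32_t addr, uint16_t port)
{
	const struct bind_entry *e;
	size_t n = 0;

	for (e = t->buckets[bucket_of(t, addr, port)]; e; e = e->next) {
		n++;
		if (e->addr == addr && e->port == port)
			break; /* exact conflict found */
	}
	return n;
}

/* Release every entry in the table. */
static void bind_table_free(struct bind_table *t)
{
	uint32_t b;

	for (b = 0; b < NBUCKETS; b++) {
		while (t->buckets[b]) {
			struct bind_entry *e = t->buckets[b];

			t->buckets[b] = e->next;
			free(e);
		}
	}
}
```

In this model, after populate() binds 4096 addresses to port 443, checking a
new, unbound address with entries_walked() visits every entry when the table
hashes by port only, but only the few entries that share its bucket when the
table hashes by port and address.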
We ran into this problem at Meta, where the traffic team binds a large number
of IPs to port 443. The bind() calls took long enough to cause cpu softirq
lockups, which in turn caused packet drops and other failures on the machine.
When testing this experimentally on a local server with ~24k sockets bound to
the port, the results were:
ipv4:
before - 0.002317 seconds
with bhash2 - 0.000020 seconds
ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds
The additions to the initial bhash2 submission [1] are:
* Updating bhash2 in the cases where a socket's rcv saddr changes after it has
  been bound
* Adding locks for bhash2 hashbuckets
[1] https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
---
Changelog
v1 -> v2
v1:
https://lore.kernel.org/netdev/20220623234242.2083895-2-joannelkoong@gmail.com/
* Drop formatting change to sk_add_bind_node()
Joanne Koong (3):
net: Add a bhash2 table hashed by port + address
selftests/net: Add test for timing a bind request to a port with a
populated bhash entry
selftests/net: Add sk_bind_sendto_listen test
include/net/inet_connection_sock.h | 3 +
include/net/inet_hashtables.h | 80 ++++-
include/net/sock.h | 14 +
net/dccp/ipv4.c | 24 +-
net/dccp/ipv6.c | 12 +
net/dccp/proto.c | 34 ++-
net/ipv4/af_inet.c | 31 +-
net/ipv4/inet_connection_sock.c | 279 ++++++++++++++----
net/ipv4/inet_hashtables.c | 277 +++++++++++++++--
net/ipv4/tcp.c | 11 +-
net/ipv4/tcp_ipv4.c | 21 +-
net/ipv6/tcp_ipv6.c | 12 +
tools/testing/selftests/net/.gitignore | 4 +-
tools/testing/selftests/net/Makefile | 4 +
tools/testing/selftests/net/bind_bhash.c | 119 ++++++++
tools/testing/selftests/net/bind_bhash.sh | 23 ++
.../selftests/net/sk_bind_sendto_listen.c | 80 +++++
17 files changed, 924 insertions(+), 104 deletions(-)
create mode 100644 tools/testing/selftests/net/bind_bhash.c
create mode 100755 tools/testing/selftests/net/bind_bhash.sh
create mode 100644 tools/testing/selftests/net/sk_bind_sendto_listen.c
--
2.30.2