Message-ID: <dd4ff273-2227-4e5a-ba11-2ca79035b811@linux.alibaba.com>
Date: Mon, 23 Sep 2024 16:40:03 +0800
From: Philo Lu <lulie@...ux.alibaba.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: netdev@...r.kernel.org, willemdebruijn.kernel@...il.com,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org,
antony.antony@...unet.com, steffen.klassert@...unet.com,
linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
jakub@...udflare.com, fred.cc@...baba-inc.com,
yubing.qiuyubing@...baba-inc.com
Subject: Re: [RFC PATCH net-next] net/udp: Add 4-tuple hash for connected socket
Hi Eric, sorry for the late response.
On 2024/9/13 19:49, Eric Dumazet wrote:
> On Fri, Sep 13, 2024 at 12:09 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
>>
>> This RFC patch introduces 4-tuple hash for connected udp sockets, to
>> make udp lookup faster. It is a tentative proposal and any comment is
>> welcome.
>>
>> Currently, the udp_table has two hash tables, the port hash and portaddr
>> hash. But for UDP server, all sockets have the same local port and addr,
>> so they are all on the same hash slot within a reuseport group. And the
>> target sock is selected by scoring.
>>
>> In some applications, the UDP server uses connect() for each incoming
>> client, and then the socket (fd) is used exclusively by the client. In
>> such scenarios, current scoring method can be inefficient with a large
>> number of connections, resulting in high softirq overhead.
>>
>> To solve the problem, a 4-tuple hash list is added to udp_table, and is
>> updated when calling connect(). Then __udp4_lib_lookup() firstly
>> searches the 4-tuple hash list, and return directly if success. A new
>> sockopt UDP_HASH4 is added to enable it. So the usage is:
>> 1. socket()
>> 2. bind()
>> 3. setsockopt(UDP_HASH4)
>> 4. connect()
>>
>> AFAICT the patch (if useful) can be further improved by:
>> (a) Support disable with sockopt UDP_HASH4. Now it cannot be disabled
>> once turned on until the socket closed.
>> (b) Better interact with hash2/reuseport. Now hash4 hardly affects other
>> mechanisms, but maintaining sockets in both hash4 and hash2 lists seems
>> unnecessary.
>> (c) Support early demux and ipv6.
>>
>> Signed-off-by: Philo Lu <lulie@...ux.alibaba.com>
>
> Adding a 4-tuple hash for UDP has been discussed in the past.
>
> Main issue is that this is adding one cache line miss per incoming packet.
>
Thanks to Dust's idea, we can add a new field to hslot2 (or define a new
struct for hslot2) that indicates whether any connected sockets exist
in this hslot (i.e., for this local port and local address), and run the
hash4 lookup only when it is set. Since the lookup path reads hslot2
anyway, servers without connected sockets would see no additional cache
line miss.
The detailed patch is attached below.
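Separately from the kernel patch, the gating idea can be modeled in
userspace. The following is only a rough sketch under stated assumptions:
every name in it (struct conn, struct table, table_connect(),
table_lookup4(), NSLOTS, the toy hash functions) is hypothetical and not a
kernel API, and locking/RCU is left out entirely. It shows a per-2-tuple
counter deciding whether the 4-tuple table is searched at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 16 /* illustrative table size, must be a power of two */

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct conn {
    uint32_t laddr, raddr; /* local and remote IPv4 address */
    uint16_t lport, rport; /* local and remote port */
    struct conn *next;     /* chain within one 4-tuple bucket */
};

struct table {
    unsigned int hash4_cnt[NSLOTS]; /* per 2-tuple slot: connected sockets under it */
    struct conn *slots4[NSLOTS];    /* 4-tuple buckets */
    unsigned long hash4_probes;     /* how often the 4-tuple table was searched */
};

/* Toy hash functions, chosen only to keep the sketch short. */
static unsigned int hash2(uint32_t laddr, uint16_t lport)
{
    return (laddr ^ lport) & (NSLOTS - 1);
}

static unsigned int hash4(uint32_t laddr, uint16_t lport,
                          uint32_t raddr, uint16_t rport)
{
    return (laddr ^ raddr ^ ((uint32_t)lport << 16) ^ rport) & (NSLOTS - 1);
}

/* Models connect(): add to the 4-tuple table and bump the 2-tuple gate. */
static void table_connect(struct table *t, struct conn *c)
{
    unsigned int h4 = hash4(c->laddr, c->lport, c->raddr, c->rport);

    c->next = t->slots4[h4];
    t->slots4[h4] = c;
    t->hash4_cnt[hash2(c->laddr, c->lport)]++;
}

/* Models unhash: unlink from the 4-tuple table and drop the gate counter. */
static void table_disconnect(struct table *t, struct conn *c)
{
    unsigned int h2 = hash2(c->laddr, c->lport);
    struct conn **pp = &t->slots4[hash4(c->laddr, c->lport, c->raddr, c->rport)];

    while (*pp && *pp != c)
        pp = &(*pp)->next;
    if (!*pp)
        return; /* was never hashed */
    *pp = c->next;
    c->next = NULL;
    assert(t->hash4_cnt[h2] > 0);
    t->hash4_cnt[h2]--;
}

/* Search the 4-tuple table only if the 2-tuple slot has connected sockets. */
static struct conn *table_lookup4(struct table *t, uint32_t laddr, uint16_t lport,
                                  uint32_t raddr, uint16_t rport)
{
    struct conn *c;

    if (!t->hash4_cnt[hash2(laddr, lport)])
        return NULL; /* gate closed: caller falls back to the 2-tuple lookup */

    t->hash4_probes++;
    for (c = t->slots4[hash4(laddr, lport, raddr, rport)]; c; c = c->next)
        if (c->laddr == laddr && c->lport == lport &&
            c->raddr == raddr && c->rport == rport)
            return c;
    return NULL;
}
```

In this model, a table with no connected sockets never increments
hash4_probes, which corresponds to the "no extra work for unconnected
servers" property the counter is meant to provide.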
> Most heavy duty UDP servers (DNS, QUIC), use non connected sockets,
> because having one million UDP sockets has huge kernel memory cost,
> not counting poor cache locality.
Some of our applications do use connected UDP sockets (~10,000 conns),
and would benefit significantly from hash4. We use connect() to
separate receiving sockets from listening ones, which makes them easier
to manage (just like TCP), especially during live upgrades such as an
nginx reload. Besides, I believe hash4 is harmless to servers
without connected sockets.
Suggestions are always welcome, and I'll keep improving this patch.
Thanks.
---
include/net/udp.h | 3 +++
net/ipv4/udp.c | 17 ++++++++++++-----
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/include/net/udp.h b/include/net/udp.h
index a05d79d35fbba..bec04c0e753d0 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -54,11 +54,14 @@ struct udp_skb_cb {
*
* @head: head of list of sockets
* @count: number of sockets in 'head' list
+ * @hash4_cnt: number of sockets in 'hash4' table of the same (local port, local address),
+ * Only used by hash2.
* @lock: spinlock protecting changes to head/count
*/
struct udp_hslot {
struct hlist_head head;
int count;
+ u32 hash4_cnt;
spinlock_t lock;
} __attribute__((aligned(2 * sizeof(long))));
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aac0251ff6fac..dfa8b3c091def 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -511,14 +511,16 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
struct udp_hslot *hslot2;
struct sock *result, *sk;
- result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum, dif, sdif, udptable);
- if (result)
- return result;
-
hash2 = ipv4_portaddr_hash(net, daddr, hnum);
slot2 = hash2 & udptable->mask;
hslot2 = &udptable->hash2[slot2];
+ if (hslot2->hash4_cnt) {
+ result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum, dif, sdif, udptable);
+ if (result)
+ return result;
+ }
+
/* Lookup connected or non-wildcard socket */
result = udp4_lib_lookup2(net, saddr, sport,
daddr, hnum, dif, sdif,
@@ -1961,7 +1963,7 @@ EXPORT_SYMBOL(udp_pre_connect);
/* call with sock lock */
static void udp4_hash4(struct sock *sk)
{
- struct udp_hslot *hslot, *hslot4;
+ struct udp_hslot *hslot, *hslot2, *hslot4;
struct net *net = sock_net(sk);
struct udp_table *udptable;
unsigned int hash;
@@ -1975,6 +1977,7 @@ static void udp4_hash4(struct sock *sk)
udptable = net->ipv4.udp_table;
hslot = udp_hashslot(udptable, net, udp_sk(sk)->udp_port_hash);
+ hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
hslot4 = udp_hashslot4(udptable, hash);
udp_sk(sk)->udp_lrpa_hash = hash;
@@ -1985,6 +1988,7 @@ static void udp4_hash4(struct sock *sk)
spin_lock(&hslot4->lock);
hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &hslot4->head);
hslot4->count++;
+ hslot2->hash4_cnt++;
spin_unlock(&hslot4->lock);
spin_unlock_bh(&hslot->lock);
@@ -2068,6 +2072,7 @@ void udp_lib_unhash(struct sock *sk)
spin_lock(&hslot4->lock);
hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
hslot4->count--;
+ hslot2->hash4_cnt--;
spin_unlock(&hslot4->lock);
}
}
@@ -2119,11 +2124,13 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
spin_lock(&hslot4->lock);
hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
hslot4->count--;
+ hslot2->hash4_cnt--;
spin_unlock(&hslot4->lock);
spin_lock(&nhslot4->lock);
hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
nhslot4->count++;
+ nhslot2->hash4_cnt++;
spin_unlock(&nhslot4->lock);
}
--
Philo