Message-ID: <dd4ff273-2227-4e5a-ba11-2ca79035b811@linux.alibaba.com>
Date: Mon, 23 Sep 2024 16:40:03 +0800
From: Philo Lu <lulie@...ux.alibaba.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: netdev@...r.kernel.org, willemdebruijn.kernel@...il.com,
 davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org,
 antony.antony@...unet.com, steffen.klassert@...unet.com,
 linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
 jakub@...udflare.com, fred.cc@...baba-inc.com,
 yubing.qiuyubing@...baba-inc.com
Subject: Re: [RFC PATCH net-next] net/udp: Add 4-tuple hash for connected
 socket

Hi Eric, sorry for the late response.

On 2024/9/13 19:49, Eric Dumazet wrote:
> On Fri, Sep 13, 2024 at 12:09 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
>>
>> This RFC patch introduces 4-tuple hash for connected udp sockets, to
>> make udp lookup faster. It is a tentative proposal and any comment is
>> welcome.
>>
>> Currently, the udp_table has two hash tables, the port hash and portaddr
>> hash. But for UDP server, all sockets have the same local port and addr,
>> so they are all on the same hash slot within a reuseport group. And the
>> target sock is selected by scoring.
>>
>> In some applications, the UDP server uses connect() for each incoming
>> client, and then the socket (fd) is used exclusively by the client. In
>> such scenarios, the current scoring method can be inefficient with a large
>> number of connections, resulting in high softirq overhead.
>>
>> To solve the problem, a 4-tuple hash list is added to udp_table, and is
>> updated when calling connect(). Then __udp4_lib_lookup() first
>> searches the 4-tuple hash list, and returns directly on success. A new
>> sockopt UDP_HASH4 is added to enable it. So the usage is:
>> 1. socket()
>> 2. bind()
>> 3. setsockopt(UDP_HASH4)
>> 4. connect()
>>
>> AFAICT the patch (if useful) can be further improved by:
>> (a) Support disable with sockopt UDP_HASH4. Now it cannot be disabled
>> once turned on until the socket closed.
>> (b) Better interact with hash2/reuseport. Now hash4 hardly affects other
>> mechanisms, but maintaining sockets in both hash4 and hash2 lists seems
>> unnecessary.
>> (c) Support early demux and ipv6.
>>
>> Signed-off-by: Philo Lu <lulie@...ux.alibaba.com>
> 
> Adding a 4-tuple hash for UDP has been discussed in the past.
> 
> Main issue is that this is adding one cache line miss per incoming packet.
> 

Thanks to Dust's idea, we can add a new field to hslot2 (or create a new
struct for hslot2) that counts the connected sockets sharing this hslot's
(local port, local address), and run the hash4 lookup only when that count
is nonzero. Since hslot2 is already accessed for the hash2 lookup, servers
without connected sockets would see no extra cache line miss.
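
To illustrate the gating idea, here is a minimal userspace sketch (not
kernel code). All toy_* names, the table size and the hash functions are
made up for illustration; only hash4_cnt mirrors the field in the patch.
The hash4_probes counter exists only so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model; none of these names are kernel APIs. */
#define NSLOTS 8u

struct toy_sock {
	uint32_t laddr, raddr;
	uint16_t lport, rport;
	int connected;
	struct toy_sock *next2;	/* chain in the (laddr, lport) slot */
	struct toy_sock *next4;	/* chain in the 4-tuple slot */
};

struct toy_hslot2 {
	struct toy_sock *head;
	unsigned int hash4_cnt;	/* connected sockets sharing this slot */
};

struct toy_table {
	struct toy_hslot2 hash2[NSLOTS];
	struct toy_sock *hash4[NSLOTS];
	unsigned int hash4_probes;	/* times the hash4 table was touched */
};

static unsigned int toy_hash2(uint32_t laddr, uint16_t lport)
{
	return (laddr ^ lport) % NSLOTS;
}

static unsigned int toy_hash4(uint32_t laddr, uint16_t lport,
			      uint32_t raddr, uint16_t rport)
{
	return (laddr ^ lport ^ (raddr * 31u) ^ (rport * 17u)) % NSLOTS;
}

static void toy_bind(struct toy_table *t, struct toy_sock *sk)
{
	struct toy_hslot2 *s2 = &t->hash2[toy_hash2(sk->laddr, sk->lport)];

	sk->next2 = s2->head;
	s2->head = sk;
}

static void toy_connect(struct toy_table *t, struct toy_sock *sk,
			uint32_t raddr, uint16_t rport)
{
	unsigned int h4;

	assert(!sk->connected);
	sk->raddr = raddr;
	sk->rport = rport;
	sk->connected = 1;

	h4 = toy_hash4(sk->laddr, sk->lport, raddr, rport);
	sk->next4 = t->hash4[h4];
	t->hash4[h4] = sk;
	/* Record in the 2-tuple slot that a hash4 entry now exists. */
	t->hash2[toy_hash2(sk->laddr, sk->lport)].hash4_cnt++;
}

static struct toy_sock *toy_lookup(struct toy_table *t,
				   uint32_t saddr, uint16_t sport,
				   uint32_t daddr, uint16_t dport)
{
	struct toy_hslot2 *s2 = &t->hash2[toy_hash2(daddr, dport)];
	struct toy_sock *sk;

	/* Touch the hash4 table only if this slot has connected sockets. */
	if (s2->hash4_cnt) {
		t->hash4_probes++;
		for (sk = t->hash4[toy_hash4(daddr, dport, saddr, sport)];
		     sk; sk = sk->next4)
			if (sk->laddr == daddr && sk->lport == dport &&
			    sk->raddr == saddr && sk->rport == sport)
				return sk;
	}

	/* Fallback: first socket in the 2-tuple slot that accepts the packet. */
	for (sk = s2->head; sk; sk = sk->next2) {
		if (sk->laddr != daddr || sk->lport != dport)
			continue;
		if (!sk->connected ||
		    (sk->raddr == saddr && sk->rport == sport))
			return sk;
	}
	return NULL;
}
```

In this model, a table holding only unconnected sockets never increments
hash4_probes, which corresponds to the "no extra cache line miss" case.
The fallback scan is much simpler than the kernel's scoring and ignores
reuseport selection.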

The detailed patch is attached below.

> Most heavy duty UDP servers (DNS, QUIC), use non connected sockets,
> because having one million UDP sockets has huge kernel memory cost,
> not counting poor cache locality.

Some of our applications do use connected UDP sockets (~10,000 connections)
and would benefit significantly from hash4. We use connect() to separate
per-client receiving sockets from the listening ones, which makes them
easier to manage (just like TCP), especially during live upgrades such as
an nginx reload. Besides, I believe hash4 is harmless to servers without
connected sockets.

Suggestions are always welcome, and I'll keep improving this patch.

Thanks.

---
  include/net/udp.h |  3 +++
  net/ipv4/udp.c    | 17 ++++++++++++-----
  2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index a05d79d35fbba..bec04c0e753d0 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -54,11 +54,14 @@ struct udp_skb_cb {
   *
   *	@head:	head of list of sockets
   *	@count:	number of sockets in 'head' list
+ *	@hash4_cnt: number of sockets in 'hash4' table of the same (local port, local address),
+ *		    Only used by hash2.
   *	@lock:	spinlock protecting changes to head/count
   */
  struct udp_hslot {
  	struct hlist_head	head;
  	int			count;
+	u32			hash4_cnt;
  	spinlock_t		lock;
  } __attribute__((aligned(2 * sizeof(long))));

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aac0251ff6fac..dfa8b3c091def 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -511,14 +511,16 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
  	struct udp_hslot *hslot2;
  	struct sock *result, *sk;

-	result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum, dif, sdif, udptable);
-	if (result)
-		return result;
-
  	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
  	slot2 = hash2 & udptable->mask;
  	hslot2 = &udptable->hash2[slot2];

+	if (hslot2->hash4_cnt) {
+		result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum, dif, sdif, udptable);
+		if (result)
+			return result;
+	}
+
  	/* Lookup connected or non-wildcard socket */
  	result = udp4_lib_lookup2(net, saddr, sport,
  				  daddr, hnum, dif, sdif,
@@ -1961,7 +1963,7 @@ EXPORT_SYMBOL(udp_pre_connect);
  /* call with sock lock */
  static void udp4_hash4(struct sock *sk)
  {
-	struct udp_hslot *hslot, *hslot4;
+	struct udp_hslot *hslot, *hslot2, *hslot4;
  	struct net *net = sock_net(sk);
  	struct udp_table *udptable;
  	unsigned int hash;
@@ -1975,6 +1977,7 @@ static void udp4_hash4(struct sock *sk)

  	udptable = net->ipv4.udp_table;
  	hslot = udp_hashslot(udptable, net, udp_sk(sk)->udp_port_hash);
+	hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
  	hslot4 = udp_hashslot4(udptable, hash);
  	udp_sk(sk)->udp_lrpa_hash = hash;

@@ -1985,6 +1988,7 @@ static void udp4_hash4(struct sock *sk)
  	spin_lock(&hslot4->lock);
  	hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &hslot4->head);
  	hslot4->count++;
+	hslot2->hash4_cnt++;
  	spin_unlock(&hslot4->lock);

  	spin_unlock_bh(&hslot->lock);
@@ -2068,6 +2072,7 @@ void udp_lib_unhash(struct sock *sk)
  				spin_lock(&hslot4->lock);
  				hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
  				hslot4->count--;
+				hslot2->hash4_cnt--;
  				spin_unlock(&hslot4->lock);
  			}
  		}
@@ -2119,11 +2124,13 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
  				spin_lock(&hslot4->lock);
  				hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
  				hslot4->count--;
+				hslot2->hash4_cnt--;
  				spin_unlock(&hslot4->lock);

  				spin_lock(&nhslot4->lock);
  				hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
  				nhslot4->count++;
+				nhslot2->hash4_cnt++;
  				spin_unlock(&nhslot4->lock);
  			}

-- 
Philo

