lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89i+vszCZ9CBJJmatuoF+N9mPhQz8YtNTSdKH8bJRtQdKXw@mail.gmail.com>
Date: Mon, 23 Sep 2024 11:19:44 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Philo Lu <lulie@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, willemdebruijn.kernel@...il.com, 
	davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org, 
	antony.antony@...unet.com, steffen.klassert@...unet.com, 
	linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com, jakub@...udflare.com, 
	fred.cc@...baba-inc.com, yubing.qiuyubing@...baba-inc.com
Subject: Re: [RFC PATCH net-next] net/udp: Add 4-tuple hash for connected socket

On Mon, Sep 23, 2024 at 10:40 AM Philo Lu <lulie@...ux.alibaba.com> wrote:
>
> Hi Eric, sorry for the late response.
>
> On 2024/9/13 19:49, Eric Dumazet wrote:
> > On Fri, Sep 13, 2024 at 12:09 PM Philo Lu <lulie@...ux.alibaba.com> wrote:
> >>
> >> This RFC patch introduces 4-tuple hash for connected udp sockets, to
> >> make udp lookup faster. It is a tentative proposal and any comment is
> >> welcome.
> >>
> >> Currently, the udp_table has two hash table, the port hash and portaddr
> >> hash. But for UDP server, all sockets have the same local port and addr,
> >> so they are all on the same hash slot within a reuseport group. And the
> >> target sock is selected by scoring.
> >>
> >> In some applications, the UDP server uses connect() for each incoming
> >> client, and then the socket (fd) is used exclusively by the client. In
> >> such scenarios, current scoring method can be ineffcient with a large
> >> number of connections, resulting in high softirq overhead.
> >>
> >> To solve the problem, a 4-tuple hash list is added to udp_table, and is
> >> updated when calling connect(). Then __udp4_lib_lookup() firstly
> >> searches the 4-tuple hash list, and return directly if success. A new
> >> sockopt UDP_HASH4 is added to enable it. So the usage is:
> >> 1. socket()
> >> 2. bind()
> >> 3. setsockopt(UDP_HASH4)
> >> 4. connect()
> >>
> >> AFAICT the patch (if useful) can be further improved by:
> >> (a) Support disable with sockopt UDP_HASH4. Now it cannot be disabled
> >> once turned on until the socket closed.
> >> (b) Better interact with hash2/reuseport. Now hash4 hardly affects other
> >> mechanisms, but maintaining sockets in both hash4 and hash2 lists seems
> >> unnecessary.
> >> (c) Support early demux and ipv6.
> >>
> >> Signed-off-by: Philo Lu <lulie@...ux.alibaba.com>
> >
> > Adding a 4-tuple hash for UDP has been discussed in the past.
> >
> > Main issue is that this is adding one cache line miss per incoming packet.
> >
>
> Thanks to Dust's idea, we can create a new field for hslot2 (or create a
> new struct for hslot2), indicating whether there are connected sockets
> in this hslot (i.e., local port and local address), and run hash4 lookup
> only when it's true. Then there would be no cache line miss.
>
> The detailed patch is attached below.
>
> > Most heavy duty UDP servers (DNS, QUIC), use non connected sockets,
> > because having one million UDP sockets has huge kernel memory cost,
> > not counting poor cache locality.
>
> Some of our applications do use connected UDP sockets (~10,000 conns),
> and will get significant benefits from hash4. We use connect() to
> separate receiving sockets and listening ones, and then it's easier to
> manage them (just like TCP), especially during live-upgrading, such as
> nginx reload. Besides, I believe hash4 is harmless to those servers
> without connected sockets.
>
> Suggestions are always welcome, and I'll keep improving this patch.
>
> Thanks.
>
> ---
>   include/net/udp.h |  3 +++
>   net/ipv4/udp.c    | 17 ++++++++++++-----
>   2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/udp.h b/include/net/udp.h
> index a05d79d35fbba..bec04c0e753d0 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -54,11 +54,14 @@ struct udp_skb_cb {
>    *
>    *    @head:  head of list of sockets
>    *    @count: number of sockets in 'head' list
> + *     @hash4_cnt: number of sockets in 'hash4' table of the same (local
> port, local address),
> + *                 Only used by hash2.
>    *    @lock:  spinlock protecting changes to head/count
>    */
>   struct udp_hslot {
>         struct hlist_head       head;
>         int                     count;
> +       u32                     hash4_cnt;
>         spinlock_t              lock;
>   } __attribute__((aligned(2 * sizeof(long))));

This would double the size of this structure (from 16 to 32 bytes)

Perhaps add another 'struct udp_hslot_main' for the relevant hash table,
and keep the smaller 'struct udp_hslot' for the two others.

Current cumulative size of the hash tables is 2 MB.

Alternatively we could move the locks out of the structure, this is
not used in the fast path.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ