Date:   Thu, 22 Dec 2022 16:05:56 +0100
From:   Paolo Abeni <pabeni@...hat.com>
To:     Kuniyuki Iwashima <kuniyu@...zon.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Jiri Slaby <jirislaby@...nel.org>,
        Joanne Koong <joannelkoong@...il.com>,
        Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH RFC net 1/2] tcp: Add TIME_WAIT sockets in bhash2.

On Thu, 2022-12-22 at 00:12 +0900, Kuniyuki Iwashima wrote:
> Jiri Slaby reported regression of bind() with a simple repro. [0]
> 
> The repro creates a TIME_WAIT socket and tries to bind() a new socket
> with the same local address and port.  Before commit 28044fc1d495 ("net:
> Add a bhash2 table hashed by port and address"), the bind() failed with
> -EADDRINUSE, but now it succeeds.
> 
> The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
> inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
> requests if the address is not a wildcard one.

How does keeping the timewait sockets inside bhash2 affect bind
lookup performance? I fear it could completely defeat the goal of
28044fc1d495: on a fairly busy server we could have quite a few tw
sockets with the same address/port. If so, we could even consider
reverting 28044fc1d495.
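
To make the concern concrete, here is a minimal user-space sketch. It is a toy model only: the toy_* names, the bucket layout and the conflict rule are assumptions made for illustration, not the kernel's bhash2 code. It shows that once TIME_WAIT entries sit in a per-(address, port) bucket, a bind check that finds no conflict has to walk all of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these are NOT the kernel's bhash2 structures. */
enum toy_state { TOY_ESTABLISHED, TOY_TIME_WAIT };

struct toy_owner {
	enum toy_state state;
	struct toy_owner *next;
};

/* One bucket per (address, port) pair, similar in spirit to bhash2. */
struct toy_bucket {
	uint32_t addr;
	uint16_t port;
	struct toy_owner *owners;
	size_t nr_owners;
};

/* Link an owner at the head of the bucket's owner list. */
void toy_bucket_add(struct toy_bucket *b, struct toy_owner *o)
{
	assert(o->next == NULL);	/* owner must not already be linked */
	o->next = b->owners;
	b->owners = o;
	b->nr_owners++;
}

/*
 * Walk the owners and stop at the first conflict. *visited receives
 * the number of entries examined.
 *
 * Simplified rule (an assumption for this sketch): a TIME_WAIT owner
 * blocks the bind unless the caller asked for address reuse; any
 * other owner always blocks.
 */
bool toy_bind_conflict(const struct toy_bucket *b, bool reuseaddr,
		       size_t *visited)
{
	size_t n = 0;

	for (const struct toy_owner *o = b->owners; o; o = o->next) {
		n++;
		if (o->state != TOY_TIME_WAIT || !reuseaddr) {
			*visited = n;
			return true;
		}
	}
	*visited = n;
	return false;
}
```

In this model a bind without address reuse stops at the first TIME_WAIT entry, while a bind with reuse visits every entry before succeeding, so its cost grows linearly with the number of tw sockets in the bucket.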

> [0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
> 
> Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
> Reported-by: Jiri Slaby <jirislaby@...nel.org>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> ---
>  include/net/inet_timewait_sock.h |  2 ++
>  include/net/sock.h               |  5 +++--
>  net/ipv4/inet_hashtables.c       |  5 +++--
>  net/ipv4/inet_timewait_sock.c    | 31 +++++++++++++++++++++++++++++--
>  4 files changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
> index 5b47545f22d3..c46ed239ad9a 100644
> --- a/include/net/inet_timewait_sock.h
> +++ b/include/net/inet_timewait_sock.h
> @@ -44,6 +44,7 @@ struct inet_timewait_sock {
>  #define tw_bound_dev_if		__tw_common.skc_bound_dev_if
>  #define tw_node			__tw_common.skc_nulls_node
>  #define tw_bind_node		__tw_common.skc_bind_node
> +#define tw_bind2_node		__tw_common.skc_bind2_node
>  #define tw_refcnt		__tw_common.skc_refcnt
>  #define tw_hash			__tw_common.skc_hash
>  #define tw_prot			__tw_common.skc_prot
> @@ -73,6 +74,7 @@ struct inet_timewait_sock {
>  	u32			tw_priority;
>  	struct timer_list	tw_timer;
>  	struct inet_bind_bucket	*tw_tb;
> +	struct inet_bind2_bucket	*tw_tb2;
>  };
>  #define tw_tclass tw_tos
>  
> diff --git a/include/net/sock.h b/include/net/sock.h
> index dcd72e6285b2..aaec985c1b5b 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -156,6 +156,7 @@ typedef __u64 __bitwise __addrpair;
>   *	@skc_tw_rcv_nxt: (aka tw_rcv_nxt) TCP window next expected seq number
>   *		[union with @skc_incoming_cpu]
>   *	@skc_refcnt: reference count
> + *	@skc_bind2_node: bind node in the bhash2 table
>   *
>   *	This is the minimal network layer representation of sockets, the header
>   *	for struct sock and struct inet_timewait_sock.
> @@ -241,6 +242,7 @@ struct sock_common {
>  		u32		skc_window_clamp;
>  		u32		skc_tw_snd_nxt; /* struct tcp_timewait_sock */
>  	};
> +	struct hlist_node	skc_bind2_node;

I *think* it would be better to add a tw_bind2_node field to the
inet_timewait_sock struct, so that we leave the request socket
unmodified and we don't change the struct sock binary layout. Changing
that layout could hurt performance by moving hot fields onto different
cachelines.
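
A small sketch of the layout effect, using made-up toy_* structs rather than the real sock_common, struct sock or inet_timewait_sock definitions (all names and fields below are hypothetical): adding a list node to a shared header pushes every later field in the embedding struct to a higher offset, while adding it only to the timewait type leaves the other layout untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a two-pointer list node; not the kernel's hlist_node. */
struct toy_hlist_node {
	struct toy_hlist_node *next, **pprev;
};

/* Shared header embedded at the start of every socket flavour. */
struct toy_common_before {
	int refcnt;
};

/* Same header after adding the node to the shared part. */
struct toy_common_after {
	int refcnt;
	struct toy_hlist_node bind2_node;
};

/* Full socket: a hot field follows the shared header. */
struct toy_sock_before {
	struct toy_common_before common;
	long hot_field;
};

struct toy_sock_after {
	struct toy_common_after common;
	long hot_field;
};

/* Alternative: keep the header as-is, add the node to the tw type only. */
struct toy_tw_sock {
	struct toy_common_before common;
	struct toy_hlist_node bind2_node;
};

/* Growing the shared header moves hot_field to a higher offset. */
static_assert(offsetof(struct toy_sock_after, hot_field) >
	      offsetof(struct toy_sock_before, hot_field),
	      "hot_field moves when the shared header grows");

size_t toy_hot_offset_before(void)
{
	return offsetof(struct toy_sock_before, hot_field);
}

size_t toy_hot_offset_after(void)
{
	return offsetof(struct toy_sock_after, hot_field);
}
```

Whether such a shift actually crosses a cacheline boundary in the real structs depends on their current layout, which a tool such as pahole can report.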


Thanks,

Paolo
