lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 01 Sep 2022 12:57:33 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     Kuniyuki Iwashima <kuniyu@...zon.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v3 net-next 3/5] tcp: Access &tcp_hashinfo via net.

On Tue, 2022-08-30 at 12:15 -0700, Kuniyuki Iwashima wrote:
> We will soon introduce an optional per-netns ehash.
> 
> This means we cannot use tcp_hashinfo directly in most places.
> 
> Instead, access it via net->ipv4.tcp_death_row->hashinfo.
> 
> The access will be valid only while initialising tcp_hashinfo
> itself and creating/destroying each netns.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> ---
>  .../chelsio/inline_crypto/chtls/chtls_cm.c    |  5 +-
>  net/core/filter.c                             |  5 +-
>  net/ipv4/esp4.c                               |  3 +-
>  net/ipv4/inet_hashtables.c                    |  2 +-
>  net/ipv4/netfilter/nf_socket_ipv4.c           |  2 +-
>  net/ipv4/netfilter/nf_tproxy_ipv4.c           | 17 +++--
>  net/ipv4/tcp_diag.c                           | 18 +++++-
>  net/ipv4/tcp_ipv4.c                           | 63 +++++++++++--------
>  net/ipv4/tcp_minisocks.c                      |  2 +-
>  net/ipv6/esp6.c                               |  3 +-
>  net/ipv6/inet6_hashtables.c                   |  4 +-
>  net/ipv6/netfilter/nf_socket_ipv6.c           |  2 +-
>  net/ipv6/netfilter/nf_tproxy_ipv6.c           |  5 +-
>  net/ipv6/tcp_ipv6.c                           | 16 ++---
>  net/mptcp/mptcp_diag.c                        |  7 ++-
>  15 files changed, 92 insertions(+), 62 deletions(-)
> 
> diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> index ddfe9208529a..f90bfba4b303 100644
> --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> @@ -1069,8 +1069,7 @@ static void chtls_pass_accept_rpl(struct sk_buff *skb,
>  	cxgb4_l2t_send(csk->egress_dev, skb, csk->l2t_entry);
>  }
>  
> -static void inet_inherit_port(struct inet_hashinfo *hash_info,
> -			      struct sock *lsk, struct sock *newsk)
> +static void inet_inherit_port(struct sock *lsk, struct sock *newsk)
>  {
>  	local_bh_disable();
>  	__inet_inherit_port(lsk, newsk);
> @@ -1240,7 +1239,7 @@ static struct sock *chtls_recv_sock(struct sock *lsk,
>  						     ipv4.sysctl_tcp_window_scaling),
>  					   tp->window_clamp);
>  	neigh_release(n);
> -	inet_inherit_port(&tcp_hashinfo, lsk, newsk);
> +	inet_inherit_port(lsk, newsk);
>  	csk_set_flag(csk, CSK_CONN_INLINE);
>  	bh_unlock_sock(newsk); /* tcp_create_openreq_child ->sk_clone_lock */

I looks to me that the above chunks are functionally a no-op and I
think that omitting the 2 drivers from the v2:

https://lore.kernel.org/netdev/20220829161920.99409-4-kuniyu@amazon.com/

should break mlx5/nfp inside a netns. I don't understand why including
the above and skipping the latters?!? I guess is a question mostly for
Eric :)

> @@ -1728,6 +1728,7 @@ EXPORT_SYMBOL(tcp_v4_do_rcv);
>  
>  int tcp_v4_early_demux(struct sk_buff *skb)
>  {
> +	struct net *net = dev_net(skb->dev);
>  	const struct iphdr *iph;
>  	const struct tcphdr *th;
>  	struct sock *sk;
> @@ -1744,7 +1745,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
>  	if (th->doff < sizeof(struct tcphdr) / 4)
>  		return 0;
>  
> -	sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
> +	sk = __inet_lookup_established(net, net->ipv4.tcp_death_row->hashinfo,
>  				       iph->saddr, th->source,
>  				       iph->daddr, ntohs(th->dest),
>  				       skb->skb_iif, inet_sdif(skb));

/Me is thinking aloud...

I'm wondering if the above has some measurable negative effect for
large deployments using only the main netns?

Specifically, are net->ipv4.tcp_death_row and net->ipv4.tcp_death_row-
>hashinfo already into the working set data for established socket?
Would the above increase the WSS by 2 cache-lines?

Thanks!

Paolo

Powered by blists - more mailing lists