Message-ID: <f154fcd1d7e9c856c46dbf00ef4998773574a5cc.camel@redhat.com>
Date: Thu, 01 Sep 2022 12:57:33 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>
Cc: Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v3 net-next 3/5] tcp: Access &tcp_hashinfo via net.
On Tue, 2022-08-30 at 12:15 -0700, Kuniyuki Iwashima wrote:
> We will soon introduce an optional per-netns ehash.
>
> This means we cannot use tcp_hashinfo directly in most places.
>
> Instead, access it via net->ipv4.tcp_death_row->hashinfo.
>
> The access will be valid only while initialising tcp_hashinfo
> itself and creating/destroying each netns.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> ---
> .../chelsio/inline_crypto/chtls/chtls_cm.c | 5 +-
> net/core/filter.c | 5 +-
> net/ipv4/esp4.c | 3 +-
> net/ipv4/inet_hashtables.c | 2 +-
> net/ipv4/netfilter/nf_socket_ipv4.c | 2 +-
> net/ipv4/netfilter/nf_tproxy_ipv4.c | 17 +++--
> net/ipv4/tcp_diag.c | 18 +++++-
> net/ipv4/tcp_ipv4.c | 63 +++++++++++--------
> net/ipv4/tcp_minisocks.c | 2 +-
> net/ipv6/esp6.c | 3 +-
> net/ipv6/inet6_hashtables.c | 4 +-
> net/ipv6/netfilter/nf_socket_ipv6.c | 2 +-
> net/ipv6/netfilter/nf_tproxy_ipv6.c | 5 +-
> net/ipv6/tcp_ipv6.c | 16 ++---
> net/mptcp/mptcp_diag.c | 7 ++-
> 15 files changed, 92 insertions(+), 62 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> index ddfe9208529a..f90bfba4b303 100644
> --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> @@ -1069,8 +1069,7 @@ static void chtls_pass_accept_rpl(struct sk_buff *skb,
> cxgb4_l2t_send(csk->egress_dev, skb, csk->l2t_entry);
> }
>
> -static void inet_inherit_port(struct inet_hashinfo *hash_info,
> - struct sock *lsk, struct sock *newsk)
> +static void inet_inherit_port(struct sock *lsk, struct sock *newsk)
> {
> local_bh_disable();
> __inet_inherit_port(lsk, newsk);
> @@ -1240,7 +1239,7 @@ static struct sock *chtls_recv_sock(struct sock *lsk,
> ipv4.sysctl_tcp_window_scaling),
> tp->window_clamp);
> neigh_release(n);
> - inet_inherit_port(&tcp_hashinfo, lsk, newsk);
> + inet_inherit_port(lsk, newsk);
> csk_set_flag(csk, CSK_CONN_INLINE);
> bh_unlock_sock(newsk); /* tcp_create_openreq_child ->sk_clone_lock */
It looks to me that the above chunks are functionally a no-op, and I
think that omitting the 2 drivers from the v2:
https://lore.kernel.org/netdev/20220829161920.99409-4-kuniyu@amazon.com/
should break mlx5/nfp inside a netns. I don't understand why the above
is included while the latter are skipped?!? I guess it's a question
mostly for Eric :)
> @@ -1728,6 +1728,7 @@ EXPORT_SYMBOL(tcp_v4_do_rcv);
>
> int tcp_v4_early_demux(struct sk_buff *skb)
> {
> + struct net *net = dev_net(skb->dev);
> const struct iphdr *iph;
> const struct tcphdr *th;
> struct sock *sk;
> @@ -1744,7 +1745,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
> if (th->doff < sizeof(struct tcphdr) / 4)
> return 0;
>
> - sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
> + sk = __inet_lookup_established(net, net->ipv4.tcp_death_row->hashinfo,
> iph->saddr, th->source,
> iph->daddr, ntohs(th->dest),
> skb->skb_iif, inet_sdif(skb));
/Me is thinking aloud...
I'm wondering if the above has some measurable negative effect for
large deployments using only the main netns?
Specifically, are net->ipv4.tcp_death_row and
net->ipv4.tcp_death_row->hashinfo already in the working-set data for
established sockets? Would the above increase the WSS by 2 cache lines?
Thanks!
Paolo