[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEA6p_BTAit9Y2h-9XaTTBdV6h6X4g0Ct5mOy1ZHfJiLD3y_Ww@mail.gmail.com>
Date: Thu, 19 Nov 2020 15:09:48 -0800
From: Wei Wang <weiwan@...gle.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
bpf@...r.kernel.org, daniel@...earbox.net,
Martin KaFai Lau <kafai@...com>, kernel-team@...com,
Eric Dumazet <edumazet@...gle.com>, brakmo@...com,
alexanderduyck@...com
Subject: Re: [net PATCH 1/2] tcp: Allow full IP tos/IPv6 tclass to be
reflected in L3 header
On Thu, Nov 19, 2020 at 1:23 PM Alexander Duyck
<alexander.duyck@...il.com> wrote:
>
> From: Alexander Duyck <alexanderduyck@...com>
>
> An issue was recently found where DCTCP SYN/ACK packets did not have the
> ECT bit set in the L3 header. A bit of code review found that the recent
> change referenced below had gone though and added a mask that prevented the
> ECN bits from being populated in the L3 header.
>
> This patch addresses that by rolling back the mask so that it is only
> applied to the flags coming from the incoming TCP request instead of
> applying it to the socket tos/tclass field. Doing this the ECT bits were
> restored in the SYN/ACK packets in my testing.
>
> One thing that is not addressed by this patch set is the fact that
> tcp_reflect_tos appears to be incompatible with ECN based congestion
> avoidance algorithms. At a minimum the feature should likely be documented
> which it currently isn't.
>
> Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Acked-by: Wei Wang <weiwan@...gle.com>
Thanks for catching this. I was under the wrong impression that the
ECT bits were marked in tos after the tcp layer. Upon a closer look,
it seems right now, it only gets marked in inet_sock(sk)->tos from
tcp_init_congestion_control() once.
I will submit a follow-up fix to make sure we include the lower 2 bits
in the reflection case as well.
> Signed-off-by: Alexander Duyck <alexanderduyck@...com>
> ---
> net/ipv4/tcp_ipv4.c | 5 +++--
> net/ipv6/tcp_ipv6.c | 6 +++---
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index c2d5132c523c..c5f8b686aa82 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -981,7 +981,8 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
> skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb);
>
> tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
> - tcp_rsk(req)->syn_tos : inet_sk(sk)->tos;
> + tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
> + inet_sk(sk)->tos;
>
> if (skb) {
> __tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
> @@ -990,7 +991,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
> err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
> ireq->ir_rmt_addr,
> rcu_dereference(ireq->ireq_opt),
> - tos & ~INET_ECN_MASK);
> + tos);
> rcu_read_unlock();
> err = net_xmit_eval(err);
> }
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 8db59f4e5f13..3d49e8d0afee 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -530,12 +530,12 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
> rcu_read_lock();
> opt = ireq->ipv6_opt;
> tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
> - tcp_rsk(req)->syn_tos : np->tclass;
> + tcp_rsk(req)->syn_tos & ~INET_ECN_MASK :
> + np->tclass;
> if (!opt)
> opt = rcu_dereference(np->opt);
> err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt,
> - tclass & ~INET_ECN_MASK,
> - sk->sk_priority);
> + tclass, sk->sk_priority);
> rcu_read_unlock();
> err = net_xmit_eval(err);
> }
>
>
Powered by blists - more mailing lists