[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4b3ede21-27cf-bc17-be71-4c388e670f2c@kernel.org>
Date: Tue, 29 Oct 2024 23:17:40 +0200 (EET)
From: Ilpo Järvinen <ij@...nel.org>
To: Paolo Abeni <pabeni@...hat.com>
cc: chia-yu.chang@...ia-bell-labs.com, netdev@...r.kernel.org,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
dsahern@...nel.org, netfilter-devel@...r.kernel.org, kadlec@...filter.org,
coreteam@...filter.org, pablo@...filter.org, bpf@...r.kernel.org,
joel.granados@...nel.org, linux-fsdevel@...r.kernel.org, kees@...nel.org,
mcgrof@...nel.org, ncardwell@...gle.com,
koen.de_schepper@...ia-bell-labs.com, g.white@...leLabs.com,
ingemar.s.johansson@...csson.com, mirja.kuehlewind@...csson.com,
cheshire@...le.com, rs.ietf@....at, Jason_Livingood@...cast.com,
vidhi_goel@...le.com
Subject: Re: [PATCH v4 net-next 09/14] gro: prevent ACE field corruption &
better AccECN handling
On Tue, 29 Oct 2024, Paolo Abeni wrote:
> On 10/21/24 23:59, chia-yu.chang@...ia-bell-labs.com wrote:
> > From: Ilpo Järvinen <ij@...nel.org>
> >
> > There are important differences in how the CWR field behaves
> > in RFC3168 and AccECN. With AccECN, CWR flag is part of the
> > ACE counter and its changes are important so adjust the flags
> > changed mask accordingly.
> >
> > Also, if CWR is there, set the Accurate ECN GSO flag to avoid
> > corrupting CWR flag somewhere.
> >
> > Signed-off-by: Ilpo Järvinen <ij@...nel.org>
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@...ia-bell-labs.com>
> > ---
> > net/ipv4/tcp_offload.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > index 0b05f30e9e5f..f59762d88c38 100644
> > --- a/net/ipv4/tcp_offload.c
> > +++ b/net/ipv4/tcp_offload.c
> > @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
> > th2 = tcp_hdr(p);
> > flush = (__force int)(flags & TCP_FLAG_CWR);
> > flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
> > - ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
> > + ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
>
> If I read correctly, if the peer is using RFC3168 and TSO_ECN, GRO will
> now pump into the stack twice the number of packets it was doing prior
> to this patch, am I correct?
>
> That is likely causing measurable performance regressions.
Hi Paolo,
Thanks for taking a look!
While it's true on surface that this might cause some more packets with
RFC3168 (by design, as network cannot know if the sender is using RFC3168
or not), the important question is the scale how many of extra packets
will occur in practice.
First of all, RFC3168 requires CWR flag to be sent no more frequently
than once per window of data, or in other words, once per RTT. And that
means just one packet, not e.g. all packets of a super-skb (the RFC3168
signalling will lose its integrity if this is violated by the sender).
Secondly, the TCP sender uses CWR flag to indicate it just halved its
congestion window which mean it is sending half the amount of packets in
this window than in the previous window (analoguous to halving sending
rate). 2 RTTs with CWR each means two window reductions (this behavior
is spec'ed in RFC3168).
So lets say the sender was using 100 packets congestion window, this
change will add one packet to 50 packets on this next RTT. Note those are
raw numbers of packets on wire and do not tell how many packets GRO
combined into each super-skb which will wary case-by-case basis.
Regardless, I suspect the extra packet added to the half of the packets
will be hard/impossible to measure to cause a performance regression.
This change would double the number of packets only if the congestion
window is 1 or 2 packets and in that case TSO/GSO/GRO benefits will be
pretty small to begin with (or even counterproductive). Also, the
traditional TCP congestion control (RFC3168 included) has many issues
anyway with that small windows because it doesn't deal with fractional
congestion windows well.
> > flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
> > for (i = sizeof(*th); i < thlen; i += 4)
> > flush |= *(u32 *)((u8 *)th + i) ^
> > @@ -405,7 +405,7 @@ void tcp_gro_complete(struct sk_buff *skb)
> > shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
> >
> > if (th->cwr)
> > - shinfo->gso_type |= SKB_GSO_TCP_ECN;
> > + shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
>
> If this packet is forwarded, it will not leverage TSO anymore - with
> current H/W.
>
> I think we need a way to enable this feature conditionally, but I fear
> another sysctl will be ugly and the additional conditionals will not be
> good for GRO.
>
> Smarter suggestions welcome ;)
Well, it is already very selectively _conditional_, SKB_GSO_TCP_ACCECN is
only set for the skb when CWR is set. That is, once per RTT (data window)
when it comes to RFC3168.
I don't have any source for this (other than reading many many tcpdumps
in the past) but I believe the percentage of packets with CWR set (due to
RFC3168 signalling) is going to be very small overall.
Do you think that is not good enough?
To answer more generally to your suggestion on making it conditional based
on some other logic, it would mean you accept network middleboxes are
allowed to corrupt AccECN ACE field when forwarding. If RFC3168 TSO/GSO
trickery remains in use (without a middlebox explicitly tracking the
connection had negotiated RFC3168), a forwarder won't be able to reproduce
the exactly same stream of TCP packets headers thus corrupting non-RFC3168
use of CWR flag. It's not something any middlebox should be doing (I hope
we agree on this as a general principle)!
--
i.
> Cheers,
>
> Paolo
>
> > }
> > EXPORT_SYMBOL(tcp_gro_complete);
> >
>
>
Powered by blists - more mailing lists