[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170818205136.nsmlmyshaobsyukc@kafai-mba.dhcp.thefacebook.com>
Date: Fri, 18 Aug 2017 13:51:36 -0700
From: Martin KaFai Lau <kafai@...com>
To: Tom Herbert <tom@...bertland.com>
CC: Shaohua Li <shli@...nel.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH V4 net 0/2] ipv6: fix flowlabel issue for reset packet
On Fri, Aug 18, 2017 at 07:50:03AM -0700, Tom Herbert wrote:
> > We had been using the auto_flowlabels=1 (i.e. essentially enable flowlabel)
> > mainly because we want to take the benefit of dst_negative_advice() when
> > tcp_write_timeout() happens.
> >
> > During our test, our system handles quite well with changing flowlabel.
> > The only exception we have hit is the TCP_RST sent from an inet_timewait_sock.
> >
> Martin,
>
> That is interesting data. Have you determined why the middlebox has a
> problem with flow label change in TW state but not other states?
Tom,
The problem of this middle box is specific to TCP_RST with a different
flowlabel from its previous packets. Sending TCP_RST from TW state hits
this pain point.
It seems like that middle box specifically drops TCP_RST if it
does not know anything about this flow. Since the flowlabel of the TCP_RST
(sent in TW state) is always different, it always lands to a different middle
box. All of these TCP_RST cannot be delivered.
We are resilience to a small number of TCP_RST drop. However, this guarantee
flowlabel change on TCP_RST and then dropped is too much.
This flowlabel change does not look like intentional either when transitioning
from full sk to tw sk (tw->tw_flowlabel is inheriting the np->flow_label
in tcp_time_wait()). Currently, the tw_flowlabel is used in tcp_v6_timewait_ack()
but not in tcp_v6_send_reset(). Hence, shaohua is looking for a solution to solve
them together.
Thanks,
Martin
>
> Tom
>
> > If we keep the flowlabel consistent (or persistent sk_txhash), there
> > is no practical usage for us to turn on flowlabel and the problem also goes
> > away. We have it off for now.
> >
> >>
> >> > There seems to have other bug in this side. From my understanding, commit
> >> > 265f94ff54d6(net: Recompute sk_txhash on negative routing advice) tries to
> >> > select a different route. But the multipath selection code
> >> > (rt6_multipath_select) doesn't use sk_txhash or skb->hash, it does use
> >> > fl6.flowlabel, but that is the flowlabel user sets. So looks like the commit
> >> > doesn't change anything.
> >> >
> >> The routing functions typically don't use sock of skbuff, but use flow
> >> structs instead. It may be reasonable to add a hash to those.
> > The localhost's mutlipath selection is another existing issue. AFAICT,
> > it does not take the sk_txhash (or skb->hash) into account and the following
> > dst_negative_advice() will also have no effect in the route selction.
> > It is another issue to be fixed and to be figured out how to pass the
> > sk_txhash down. (1)
> >
> > Shaohua is proposing to record the 20 bits of the sk_txhash in the
> > tw_flowlabel of the 'struct inet_timewait_sock'. The tw_flowlabel could
> > potentially be used to do the multipath selection once we figured out
> > how to tackle (1).
> >
> > Thanks,
> > Martin
> >
> >
> >>
> >> > What's the 'src port for UDP encap'? I can't find the code setting skb->hash
> >> > to sk_txhash in UDP side.
> >> >
> >> udp_flow_src_port is function call by UDP encaps to set source port.
> >> This is call skb_get_hash. sk_set_txhash is function to set txhash
> >> right now to random value. skb_set_hash_from_sk set skb->hash when
> >> skbuff is owned by socket (skb_set_owner_w).
> >>
> >> Thanks,
> >> Tom
> >>
> >>
> >> > Thanks,
> >> > Shaohua
Powered by blists - more mailing lists