Message-ID: <dda364c6-3ac8-31a8-23b5-c337042b7d5d@gmail.com>
Date: Sun, 19 Jul 2020 12:43:55 -0600
From: David Ahern <dsahern@...il.com>
To: Stefano Brivio <sbrivio@...hat.com>
Cc: Florian Westphal <fw@...len.de>, netdev@...r.kernel.org,
aconole@...hat.com
Subject: Re: [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu
discovery on encap sockets
On 7/18/20 11:58 AM, Stefano Brivio wrote:
> On Sat, 18 Jul 2020 11:02:46 -0600
> David Ahern <dsahern@...il.com> wrote:
>
>> On 7/18/20 12:56 AM, Stefano Brivio wrote:
>>> On Fri, 17 Jul 2020 09:04:51 -0600
>>> David Ahern <dsahern@...il.com> wrote:
>>>
>>>> On 7/17/20 6:27 AM, Stefano Brivio wrote:
>>>>>>
>>>>>>> Note that this doesn't work as it is because of a number of reasons
>>>>>>> (skb doesn't have a dst, pkt_type is not PACKET_HOST), and perhaps we
>>>>>>> shouldn't be using icmp_send(), but at a glance that looks simpler.
>>>>>>
>>>>>> Yes, it also requires that the bridge has IP connectivity
>>>>>> to reach the inner ip, which might not be the case.
>>>>>
>>>>> If the VXLAN endpoint is a port of the bridge, that needs to be the
>>>>> case, right? Otherwise the VXLAN endpoint can't be reached.
>>>>>
>>>>>>> Another slight preference I have towards this idea is that the only
>>>>>>> known way we can break PMTU discovery right now is by using a bridge,
>>>>>>> so fixing the problem there looks more future-proof than addressing any
>>>>>>> kind of tunnel with this problem. I think FoU and GUE would hit the
>>>>>>> same problem, I don't know about IP tunnels, sticking that selftest
>>>>>>> snippet to whatever other test in pmtu.sh should tell.
>>>>>>
>>>>>> Every type of bridge port that needs to add additional header on egress
>>>>>> has this problem in the bridge scenario once the peer of the IP tunnel
>>>>>> signals a PMTU event.
>>>>>
>>>>> Yes :(
>>>>
>>>> The vxlan/tunnel device knows it is a bridge port, and it knows it is
>>>> going to push a udp and ip{v6} header. So why not use that information
>>>> in setting / updating the MTU? That's what I was getting at on Monday
>>>> with my comment about lwtunnel_headroom equivalent.
>>>
>>> If I understand correctly, you're proposing something similar to my
>>> earlier draft from:
>>>
>>> <20200713003813.01f2d5d3@...sabeth>
>>> https://lore.kernel.org/netdev/20200713003813.01f2d5d3@elisabeth/
>>>
>>> the problem with it is that it wouldn't help: the MTU is already set to
>>> the right value for both port and bridge in the case Florian originally
>>> reported.
>>
>> I am definitely hand waving; I have not had time to create a setup
>> showing the problem. Is there a reproducer using only namespaces?
>
> And I'm laser pointing: check the bottom of that email ;)
>
With this test case, the lookup fails:

[ 144.689378] vxlan: vxlan_xmit_one: dev vxlan_a 10.0.1.1/57864 -> 10.0.0.0/4789 len 5010 gw 10.0.1.2
[ 144.692755] vxlan: skb_tunnel_check_pmtu: dst dev br0 skb dev vxlan_a skb len 5010 encap_mtu 4000 headroom 50
[ 144.697682] vxlan: skb_dst_update_pmtu_no_confirm: calling ip_rt_update_pmtu+0x0/0x160/ffffffff825ee850 for dev br0 mtu 3950
[ 144.703601] IPv4: __ip_rt_update_pmtu: dev br0 mtu 3950 old_mtu 5000 192.168.2.1 -> 192.168.2.2
[ 144.708177] IPv4: __ip_rt_update_pmtu: fib_lookup failed for 192.168.2.1 -> 192.168.2.2

Because the lookup fails, __ip_rt_update_pmtu skips creating the exception.

This hack gets the lookup to succeed:

    fl4->flowi4_oif = dst->dev->ifindex;
or
    fl4->flowi4_oif = 0;

and the test passes.