[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ygnhh79yluw2.fsf@nvidia.com>
Date: Thu, 20 Jan 2022 09:38:05 +0200
From: Vlad Buslov <vladbu@...dia.com>
To: Antoine Tenart <atenart@...nel.org>
CC: <davem@...emloft.net>, <kuba@...nel.org>, <echaudro@...hat.com>,
<sbrivio@...hat.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info
when PMTU triggers an ICMP reply
Hello Antoine,
On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@...nel.org> wrote:
> When the interface is part of a bridge or an Open vSwitch port and a
> packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When
> using the external mode (collect metadata) the source and destination
> addresses are reversed, so that Open vSwitch can match the packet
> against an existing (reverse) flow.
>
> But inverting the source and destination addresses in the shared
> ip_tunnel_info will make following packets of the flow to use a wrong
> destination address (packets will be tunnelled to itself), if the flow
> isn't updated. Which happens with Open vSwitch, until the flow times
> out.
>
> Fixes this by uncloning the skb's ip_tunnel_info before inverting its
> source and destination addresses, so that the modification will only be
> made for the PTMU packet, not the following ones.
>
> Fixes: fc68c99577cc ("vxlan: Support for PMTU discovery on directly bridged links")
> Tested-by: Eelco Chaudron <echaudro@...hat.com>
> Reviewed-by: Eelco Chaudron <echaudro@...hat.com>
> Signed-off-by: Antoine Tenart <atenart@...nel.org>
> ---
> drivers/net/vxlan.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 666dd201c3d5..53dbc67e8a34 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> goto tx_error;
> } else if (err) {
> if (info) {
> + struct ip_tunnel_info *unclone;
> struct in_addr src, dst;
>
> + unclone = skb_tunnel_info_unclone(skb);
> + if (unlikely(!unclone))
> + goto tx_error;
> +
We have been getting memleaks in one of our tests that point to this
code (test deletes vxlan device while running traffic redirected by OvS
TC at the same time):
unreferenced object 0xffff8882d0114200 (size 256):
comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff .........;......
a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00 .&..............
backtrace:
[<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
[<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
[<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
[<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
[<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
[<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
[<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
[<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
[<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
[<0000000011d3f765>] tcf_classify+0x33d/0x800
[<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
[<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
[<0000000065d43bd6>] process_backlog+0x2e3/0x710
[<00000000964357ae>] __napi_poll+0x9f/0x560
[<0000000059a93cf6>] net_rx_action+0x357/0xa60
[<00000000766481bc>] __do_softirq+0x282/0x94e
Looking at the code the potential issue seems to be that
tun_dst_unclone() creates new metadata_dst instance with refcount==1,
increments the refcount with dst_hold() to value 2, then returns it.
This seems to imply that caller is expected to release one of the
references (second one if for skb), but none of the callers (including
original dev_fill_metadata_dst()) do that, so I guess I'm
misunderstanding something here.
Any tips or suggestions?
> src = remote_ip.sin.sin_addr;
> dst = local_ip.sin.sin_addr;
> - info->key.u.ipv4.src = src.s_addr;
> - info->key.u.ipv4.dst = dst.s_addr;
> + unclone->key.u.ipv4.src = src.s_addr;
> + unclone->key.u.ipv4.dst = dst.s_addr;
> }
> vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
> dst_release(ndst);
> @@ -2781,12 +2786,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> goto tx_error;
> } else if (err) {
> if (info) {
> + struct ip_tunnel_info *unclone;
> struct in6_addr src, dst;
>
> + unclone = skb_tunnel_info_unclone(skb);
> + if (unlikely(!unclone))
> + goto tx_error;
> +
> src = remote_ip.sin6.sin6_addr;
> dst = local_ip.sin6.sin6_addr;
> - info->key.u.ipv6.src = src;
> - info->key.u.ipv6.dst = dst;
> + unclone->key.u.ipv6.src = src;
> + unclone->key.u.ipv6.dst = dst;
> }
>
> vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
Powered by blists - more mailing lists