lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ygnhh79yluw2.fsf@nvidia.com>
Date:   Thu, 20 Jan 2022 09:38:05 +0200
From:   Vlad Buslov <vladbu@...dia.com>
To:     Antoine Tenart <atenart@...nel.org>
CC:     <davem@...emloft.net>, <kuba@...nel.org>, <echaudro@...hat.com>,
        <sbrivio@...hat.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info
 when PMTU triggers an ICMP reply

Hello Antoine,

On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@...nel.org> wrote:
> When the interface is part of a bridge or an Open vSwitch port and a
> packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When
> using the external mode (collect metadata) the source and destination
> addresses are reversed, so that Open vSwitch can match the packet
> against an existing (reverse) flow.
>
> But inverting the source and destination addresses in the shared
> ip_tunnel_info will make following packets of the flow to use a wrong
> destination address (packets will be tunnelled to itself), if the flow
> isn't updated. Which happens with Open vSwitch, until the flow times
> out.
>
> Fixes this by uncloning the skb's ip_tunnel_info before inverting its
> source and destination addresses, so that the modification will only be
> made for the PTMU packet, not the following ones.
>
> Fixes: fc68c99577cc ("vxlan: Support for PMTU discovery on directly bridged links")
> Tested-by: Eelco Chaudron <echaudro@...hat.com>
> Reviewed-by: Eelco Chaudron <echaudro@...hat.com>
> Signed-off-by: Antoine Tenart <atenart@...nel.org>
> ---
>  drivers/net/vxlan.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 666dd201c3d5..53dbc67e8a34 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  			goto tx_error;
>  		} else if (err) {
>  			if (info) {
> +				struct ip_tunnel_info *unclone;
>  				struct in_addr src, dst;
>  
> +				unclone = skb_tunnel_info_unclone(skb);
> +				if (unlikely(!unclone))
> +					goto tx_error;
> +

We have been getting memleaks in one of our tests that point to this
code (test deletes vxlan device while running traffic redirected by OvS
TC at the same time):

unreferenced object 0xffff8882d0114200 (size 256):
  comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff  .........;......
    a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00  .&..............
  backtrace:
    [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
    [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
    [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
    [<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
    [<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
    [<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
    [<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
    [<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
    [<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
    [<0000000011d3f765>] tcf_classify+0x33d/0x800
    [<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
    [<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
    [<0000000065d43bd6>] process_backlog+0x2e3/0x710
    [<00000000964357ae>] __napi_poll+0x9f/0x560
    [<0000000059a93cf6>] net_rx_action+0x357/0xa60
    [<00000000766481bc>] __do_softirq+0x282/0x94e

Looking at the code the potential issue seems to be that
tun_dst_unclone() creates new metadata_dst instance with refcount==1,
increments the refcount with dst_hold() to value 2, then returns it.
This seems to imply that caller is expected to release one of the
references (second one if for skb), but none of the callers (including
original dev_fill_metadata_dst()) do that, so I guess I'm
misunderstanding something here.

Any tips or suggestions?

>  				src = remote_ip.sin.sin_addr;
>  				dst = local_ip.sin.sin_addr;
> -				info->key.u.ipv4.src = src.s_addr;
> -				info->key.u.ipv4.dst = dst.s_addr;
> +				unclone->key.u.ipv4.src = src.s_addr;
> +				unclone->key.u.ipv4.dst = dst.s_addr;
>  			}
>  			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
>  			dst_release(ndst);
> @@ -2781,12 +2786,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  			goto tx_error;
>  		} else if (err) {
>  			if (info) {
> +				struct ip_tunnel_info *unclone;
>  				struct in6_addr src, dst;
>  
> +				unclone = skb_tunnel_info_unclone(skb);
> +				if (unlikely(!unclone))
> +					goto tx_error;
> +
>  				src = remote_ip.sin6.sin6_addr;
>  				dst = local_ip.sin6.sin6_addr;
> -				info->key.u.ipv6.src = src;
> -				info->key.u.ipv6.dst = dst;
> +				unclone->key.u.ipv6.src = src;
> +				unclone->key.u.ipv6.dst = dst;
>  			}
>  
>  			vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ