lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ygnhee52lg2d.fsf@nvidia.com>
Date:   Thu, 20 Jan 2022 14:58:18 +0200
From:   Vlad Buslov <vladbu@...dia.com>
To:     Antoine Tenart <atenart@...nel.org>
CC:     <davem@...emloft.net>, <kuba@...nel.org>, <echaudro@...hat.com>,
        <sbrivio@...hat.com>, <netdev@...r.kernel.org>, <pshelar@....org>
Subject: Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info
 when PMTU triggers an ICMP reply

On Thu 20 Jan 2022 at 12:27, Antoine Tenart <atenart@...nel.org> wrote:
> Hello Vlad,
>
> Quoting Vlad Buslov (2022-01-20 08:38:05)
>> On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@...nel.org> wrote:
>> >
>> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> > index 666dd201c3d5..53dbc67e8a34 100644
>> > --- a/drivers/net/vxlan.c
>> > +++ b/drivers/net/vxlan.c
>> > @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>> >                       goto tx_error;
>> >               } else if (err) {
>> >                       if (info) {
>> > +                             struct ip_tunnel_info *unclone;
>> >                               struct in_addr src, dst;
>> >  
>> > +                             unclone = skb_tunnel_info_unclone(skb);
>> > +                             if (unlikely(!unclone))
>> > +                                     goto tx_error;
>> > +
>> 
>> We have been getting memleaks in one of our tests that point to this
>> code (test deletes vxlan device while running traffic redirected by OvS
>> TC at the same time):
>> 
>> unreferenced object 0xffff8882d0114200 (size 256):
>>   comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
>>   hex dump (first 32 bytes):
>>     00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff  .........;......
>>     a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00  .&..............
>>   backtrace:
>>     [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
>>     [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
>>     [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
>>     [<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
>>     [<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
>>     [<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
>>     [<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
>>     [<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
>>     [<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
>>     [<0000000011d3f765>] tcf_classify+0x33d/0x800
>>     [<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
>>     [<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
>>     [<0000000065d43bd6>] process_backlog+0x2e3/0x710
>>     [<00000000964357ae>] __napi_poll+0x9f/0x560
>>     [<0000000059a93cf6>] net_rx_action+0x357/0xa60
>>     [<00000000766481bc>] __do_softirq+0x282/0x94e
>> 
>> Looking at the code the potential issue seems to be that
>> tun_dst_unclone() creates new metadata_dst instance with refcount==1,
>> increments the refcount with dst_hold() to value 2, then returns it.
>> This seems to imply that caller is expected to release one of the
>> references (second one if for skb), but none of the callers (including
>> original dev_fill_metadata_dst()) do that, so I guess I'm
>> misunderstanding something here.
>> 
>> Any tips or suggestions?
>
> I'd say there is no need to increase the dst refcount here after calling
> metadata_dst_alloc, as the metadata is local to the skb and the dst
> refcount was already initialized to 1. This might be an issue with
> commit fc4099f17240 ("openvswitch: Fix egress tunnel info."); I CCed
> Pravin, he might recall if there was a reason to increase the refcount.

I tried to remove the dst_hold(), but that caused underflows[0], so I
guess the current reference counting is required at least for some
use-cases.

[0]:

[  118.803011] ------------[ cut here ]------------                                    
[  118.803011] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803019] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803027] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803041] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803046] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803060] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803065] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803078] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803083] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803096] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803920] dst_release underflow                                                   
[  118.803937] WARNING: CPU: 4 PID: 0 at net/core/dst.c:173 dst_release+0x79/0x90                                                                                             
[  118.815961] Modules linked in: act_tunnel_key act_mirred act_skbedit veth vxlan ip6_udp_tunnel udp_tunnel act_gact cls_flower sch_ingress openvswitch nsh nf_conncount mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_connt
rack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill overlay rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs sunrpc ib_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core virtio_net 
net_failover failover i2c_i801 i2c_smbus kvm_intel kvm pcspkr irqbypass crc32_pclmul ghash_clmulni_intel sch_fq_codel drm i2c_core ip_tables crc32c_intel serio_raw fuse [last unloaded: mlxfw]
[  118.829464] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.16.0+ #3                                                                                                           
[  118.830567] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014                                                                                                                                                                                                                              
[  118.832541] RIP: 0010:dst_release+0x79/0x90                                         
[  118.833372] Code: 04 e8 db 14 01 00 8b 4c 24 04 85 c0 74 c7 e9 4d 60 22 00 48 c7 c7 35 00 32 82 89 4c 24 04 c6 05 c5 47 e5 00 01 e8 6f c2 1b 00 <0f> 0b 8b 4c 24 04 eb cb 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40                                                                                                                                       
[  118.836566] RSP: 0018:ffffc90000180ee0 EFLAGS: 00010282                             
[  118.837546] RAX: 0000000000000000 RBX: ffff88813a139ab0 RCX: 0000000000000000                                                                                              
[  118.838912] RDX: 0000000000000102 RSI: ffffffff82286273 RDI: 00000000ffffffff                                                                                              
[  118.840278] RBP: 0000000000000004 R08: 0000000000000015 R09: ffffc90000180e78                                                                                              
[  118.841646] R10: ffffffff825c7000 R11: 0000000000000001 R12: ffff88811d3ab480                                                                                              
[  118.843008] R13: 0000000000000170 R14: 0000000000000000 R15: ffff8882f5a2e1b8                                                                                              
[  118.844371] FS:  0000000000000000(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000                                                                                   
[  118.845968] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                              
[  118.847100] CR2: 00007fa5e5abd7e0 CR3: 0000000175408005 CR4: 0000000000770ee0                                                                                              
[  118.848461] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                                              
[  118.849834] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                                                                              
[  118.851198] PKRU: 55555554                                                          
[  118.851815] Call Trace:                                                             
[  118.852387]  <IRQ>                                                                  
[  118.852894]  dst_cache_destroy+0x33/0x60                                            
[  118.853728]  dst_destroy+0xaa/0xb0                                                  
[  118.854465]  rcu_core+0x2a3/0x960                                                   
[  118.855197]  __do_softirq+0xf0/0x2f9                                                
[  118.855976]  __irq_exit_rcu+0xcc/0x120                                              
[  118.856765]  sysvec_apic_timer_interrupt+0xa2/0xd0                                  
[  118.857746]  </IRQ>                                                                 
[  118.858270]  <TASK>                                                                 
[  118.858775]  asm_sysvec_apic_timer_interrupt+0x12/0x20                              
[  118.859801] RIP: 0010:native_safe_halt+0xb/0x10                                     
[  118.860731] Code: 7e ff ff ff 7f 5b c3 65 48 8b 04 25 c0 cb 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c3 eb 80 cc eb 07 0f 00 2d 8f f5 4f 00 fb f4 <c3> 0f 1f 40 00 eb 07 0f 00 2d 7f f5 4f 00 f4 c3 cc cc cc cc cc 0f                                                                                                                                       
[  118.864175] RSP: 0018:ffffc9000009fef0 EFLAGS: 00000206                             
[  118.865223] RAX: 0000000000027f6c RBX: 0000000000000004 RCX: 0000000000000000                                                                                              
[  118.866504] RDX: ffff88817c090d50 RSI: ffffffff82286273 RDI: ffffffff8225bf7f                                                                                              
[  118.867774] RBP: ffff8881009d2e80 R08: 0000000000027f6b R09: 0000000000000000                                                                                              
[  118.869051] R10: 00000000fffd3b17 R11: 0000000000000001 R12: 0000000000000000                                                                                              
[  118.870326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000                                                                                              
[  118.871601]  default_idle+0xa/0x10                                                  
[  118.872300]  default_idle_call+0x33/0xe0                                            
[  118.873081]  do_idle+0x208/0x270                                                    
[  118.873741]  cpu_startup_entry+0x19/0x20                                            
[  118.874562]  secondary_startup_64_no_verify+0xc3/0xcb                               
[  118.875604]  </TASK>                                                                
[  118.876140] ---[ end trace dec5061c76371ce7 ]---   

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ