Open Source and information security mailing list archives
 
Message-ID: <1492807076414.83805@Dell.com>
Date:   Fri, 21 Apr 2017 20:37:56 +0000
From:   <Joe.Ghalam@...l.com>
To:     <maheshb@...gle.com>
Cc:     <herbert@...dor.apana.org.au>, <davem@...emloft.net>,
        <Clifford.Wichmann@...l.com>, <netdev@...r.kernel.org>
Subject: Re: macvlan: Fix device ref leak when purging bc_queue

________________________________________
> From: Mahesh Bandewar (महेश बंडेवार) <maheshb@...gle.com>
> Sent: Friday, April 21, 2017 12:23 PM
> To: Ghalam, Joe
> Cc: herbert@...dor.apana.org.au; David Miller; Wichmann, Clifford; linux-netdev
> Subject: Re: macvlan: Fix device ref leak when purging bc_queue

> Maybe the system is busy and the snapshot is too small, and
> process_broadcast() should eventually get called. Deleting a slave does
> nothing to cancel the work-queue, so it would happen eventually.

> The change that Herbert proposed is correct. When packets are enqueued
> for processing later, a dev reference is taken, and it is released when
> the packet is processed once the work gets scheduled. The backlog is per
> port, so it makes sense to drop the reference(s) before purging the queue
> prior to deleting the port.

I included only the relevant portion of the logs. The system in question was left in that state for hours without process_broadcast() ever being called. And yes, I did check the CPU load: the system was running at around 20%, so I don't think that's the case. I would suggest taking a closer look at the code in macvlan_dellink(), where it performs the unlink and unregister:

void macvlan_dellink(struct net_device *dev, struct list_head *head)
{
	struct macvlan_dev *vlan = netdev_priv(dev);
	list_del_rcu(&vlan->list);
	unregister_netdevice_queue(dev, head);
	netdev_upper_dev_unlink(vlan->lowerdev, dev);
}

As I stated in my initial reply to Herbert, the code change he suggested is correct and needed, but not sufficient. We tested with his change and observed the same behavior. I can guarantee that the change to macvlan_port_destroy() has no effect on this issue, since macvlan_port_destroy() is not even called during this operation.
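To make the reference-counting rule from the quoted explanation concrete, here is a small user-space model in C. It is only a sketch under stated assumptions, not the kernel code: `model_dev`, `model_port`, `model_hold()`/`model_put()` and the other names are hypothetical stand-ins for a net_device, the per-port bc_queue, and dev_hold()/dev_put(). Each deferred packet pins its source device, and whoever consumes the packet (the deferred worker or a purge) has to drop that pin.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model, not kernel code. model_hold()/model_put()
 * only imitate dev_hold()/dev_put(); model_port imitates a per-port backlog. */
struct model_dev {
	int refcnt;
};

struct model_pkt {
	struct model_dev *src;   /* device pinned while the packet waits */
	struct model_pkt *next;
};

struct model_port {
	struct model_pkt *head, *tail;  /* per-port FIFO, like bc_queue */
};

static void model_hold(struct model_dev *d) { d->refcnt++; }

static void model_put(struct model_dev *d)
{
	assert(d->refcnt > 0);  /* dropping a reference nobody holds is a bug */
	d->refcnt--;
}

/* Defer a packet: pin its source device until the packet is consumed. */
static void model_enqueue(struct model_port *p, struct model_dev *src)
{
	struct model_pkt *k = malloc(sizeof(*k));
	if (!k)
		abort();
	k->src = src;
	k->next = NULL;
	model_hold(src);
	if (p->tail)
		p->tail->next = k;
	else
		p->head = k;
	p->tail = k;
}

static struct model_pkt *model_dequeue(struct model_port *p)
{
	struct model_pkt *k = p->head;
	if (k) {
		p->head = k->next;
		if (!p->head)
			p->tail = NULL;
	}
	return k;
}

/* Drain the backlog. The deferred worker always drops the reference;
 * a purge must do the same (drop_refs = 1) or the references leak. */
static void model_drain(struct model_port *p, int drop_refs)
{
	struct model_pkt *k;
	while ((k = model_dequeue(p))) {
		if (drop_refs)
			model_put(k->src);
		free(k);
	}
}

/* Queue n packets, drain them, and return the device's final refcount.
 * The device starts at 1 (its owner's reference), so 1 means no leak. */
static int model_refcnt_after(int n, int drop_refs)
{
	struct model_dev dev = { 1 };
	struct model_port port = { NULL, NULL };
	for (int i = 0; i < n; i++)
		model_enqueue(&port, &dev);
	model_drain(&port, drop_refs);
	return dev.refcnt;
}
```

For example, `model_refcnt_after(3, 1)` returns 1 (no leak), while `model_refcnt_after(3, 0)` returns 4 because three references stay held. In this model the count would stay just as high if the backlog were never drained at all, which corresponds to the situation described above, where process_broadcast() is never seen to run and macvlan_port_destroy() is never called.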

Here is a stack trace I forced in order to show the removal call path:
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810d312c>] __netdev_adjacent_dev_remove+0x3c/0x1a0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81bb6e87>] __netdev_adjacent_dev_unlink_lists+0x67/0x69
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810d32a0>] __netdev_adjacent_dev_unlink+0x82/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff811d31e0>] netdev_upper_dev_unlink+0x10/0x20
Apr 20 06:23:40 OS10 kernel:  [<ffffffff8180e770>] macvlan_dellink+0x50/0x130
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2ca27>] rtnl_dellink+0xb7/0x120
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a609ab>] ? __netlink_ns_capable+0x3b/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a6c5>] rtnetlink_rcv_msg+0x95/0x250
Apr 20 06:23:40 OS10 kernel:  [<ffffffff811c1499>] ? zone_statistics+0x89/0xa0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a0a9de>] ? __alloc_skb+0x7e/0x2a0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a630>] ? rtnetlink_rcv+0x30/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64f59>] netlink_rcv_skb+0xa9/0xc0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a628>] rtnetlink_rcv+0x28/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64603>] netlink_unicast+0xf3/0x200
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64a1e>] netlink_sendmsg+0x30e/0x680
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a014fb>] sock_sendmsg+0x8b/0xc0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a011ee>] ? move_addr_to_kernel.part.18+0x1e/0x60
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a01ff1>] ? move_addr_to_kernel+0x21/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a018f6>] ___sys_sendmsg+0x376/0x390
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a0019f>] ? sock_destroy_inode+0x2f/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810a161c>] ? __do_page_fault+0x20c/0x560
Apr 20 06:23:40 OS10 kernel:  [<ffffffff812279ad>] ? dput+0xad/0x180
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81230a74>] ? mntput+0x24/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81212a50>] ? __fput+0x190/0x220
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a026b2>] __sys_sendmsg+0x42/0x80
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a02702>] SyS_sendmsg+0x12/0x20
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81bc86cd>] system_call_fast_compare_end+0x10/0x15
