lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 21 Nov 2014 10:54:11 -0800
From:	Jay Vosburgh <jay.vosburgh@...onical.com>
To:	Cong Wang <cwang@...pensource.com>
cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Wengang <wen.gang.wang@...cle.com>,
	netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] bonding: clear header_ops when last slave detached (v2)

Cong Wang <cwang@...pensource.com> wrote:

>On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
><jay.vosburgh@...onical.com> wrote:
>> Cong Wang <cwang@...pensource.com> wrote:
>>
>>>Also, no one seems to care about my previous question:
>>>why only bonding has the problem?
>>
>>         Bonding has the problem because it stashes a pointer to a data
>> structure (the header_ops) from another module, and when that module is
>> unloaded the dangling pointer may be dereferenced if it's not either
>> cleared or made to never go away.
>
>I knew, please re-read my question, I was asking why ONLY bonding
>has the problem, i.e. why not neigh or whatever else calling
>header_ops->foo()? :)
>
>As I said, I may miss some try_get_module() somewhere of course.
>Needs more digging.

	My explanation is why only bonding has the problem; it's keeping
a pointer (in bond_dev->header_ops) that is copied from the slave
device's ->header_ops, and clearing that stashed pointer is (a) not
correctly synchronized with the removal of the slave device, and (b)
trying to simply clear the pointer has a check then use race in
dev_hard_header.

	8021q, for example, uses a "passthru" header_ops to call the
underlying device's header_ops, but 8021q is only for ethernet, and the
eth_header_ops are static in vmlinux, so it won't see these problems.

	Actually, now that I think about it, when the last ipoib slave
is released, the bonding master device is theoretically supposed to be
removed to avoid the sort of problem we're discussing here.

	That apparently isn't happening, unless Wengang is running
pktgen and simultaneously removing the ipoib module (racing the transmit
against the removal), or maybe something else is going on (perhaps
pktgen holds a reference to the bonding master, preventing its removal).

	Also, curiously, looking at pkgten, pktgen_setup_dev appears to
only accept devices of type ARPHRD_ETHER, but bonding with an ipoib
slave would be ARPHRD_INFINIBAND.  I'm therefore not sure how Wengang
configured pktgen over an ipoib bond.

	Wengang, what kernel are you using, and is your kernel modified
to change pktgen_setup_dev?

	-J

---
	-Jay Vosburgh, jay.vosburgh@...onical.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists