Message-ID: <5472A083.9020801@oracle.com>
Date: Mon, 24 Nov 2014 11:05:39 +0800
From: Wengang <wen.gang.wang@...cle.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>,
Cong Wang <cwang@...pensource.com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] bonding: clear header_ops when last slave detached (v2)
On 2014-11-22 02:54, Jay Vosburgh wrote:
> Cong Wang <cwang@...pensource.com> wrote:
>
>> On Thu, Nov 20, 2014 at 2:53 PM, Jay Vosburgh
>> <jay.vosburgh@...onical.com> wrote:
>>> Cong Wang <cwang@...pensource.com> wrote:
>>>
>>>> Also, no one seems to care about my previous question:
>>>> why only bonding has the problem?
>>> Bonding has the problem because it stashes a pointer to a data
>>> structure (the header_ops) from another module, and when that module is
>>> unloaded the dangling pointer may be dereferenced if it's not either
>>> cleared or made to never go away.
>> I knew, please re-read my question, I was asking why ONLY bonding
>> has the problem, i.e. why not neigh or whatever else calling
>> header_ops->foo()? :)
>>
>> As I said, I may miss some try_module_get() somewhere of course.
>> Needs more digging.
> My explanation is why only bonding has the problem; it's keeping
> a pointer (in bond_dev->header_ops) that is copied from the slave
> device's ->header_ops, and clearing that stashed pointer is (a) not
> correctly synchronized with the removal of the slave device, and (b)
> trying to simply clear the pointer has a check then use race in
> dev_hard_header.
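
For illustration, the check-then-use race mentioned above can be modeled in userspace C roughly as follows. This is only a sketch: every name here (model_dev, model_header_ops, model_create and the two model_hard_header_* functions) is hypothetical and stands in for the kernel structures; it is not the kernel code. The first function tests the pointer and then loads it again for the call, which is the shape of the check in dev_hard_header; the second reads the pointer once into a local.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct header_ops / net_device. */
struct model_header_ops {
	int (*create)(int len);
};

struct model_dev {
	const struct model_header_ops *header_ops;
};

/* Example header builder; it just reports the length it was given. */
int model_create(int len)
{
	return len;
}

const struct model_header_ops model_ops = { .create = model_create };

/*
 * Check then use: the pointer is tested and then loaded again for the
 * call. If another thread clears dev->header_ops between the test and
 * the call, the second load sees NULL and the call dereferences it.
 */
int model_hard_header_racy(struct model_dev *dev, int len)
{
	if (!dev->header_ops || !dev->header_ops->create)
		return 0;
	/* race window: dev->header_ops may be cleared here */
	return dev->header_ops->create(len);
}

/*
 * Single read: the pointer is loaded once into a local, so a concurrent
 * clear can no longer produce a NULL dereference. It does NOT help if
 * the ops structure itself lives in a module that has been unloaded;
 * the local copy then points at memory that is gone.
 */
int model_hard_header_snapshot(struct model_dev *dev, int len)
{
	const struct model_header_ops *ops = dev->header_ops;

	if (!ops || !ops->create)
		return 0;
	return ops->create(len);
}
```

In real concurrent code the single read would also have to be a load the compiler cannot split or repeat. The sketch only shows why clearing the stashed pointer is not sufficient on its own: the lifetime of the ops structure still has to be synchronized with the removal of the slave.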
>
> 8021q, for example, uses a "passthru" header_ops to call the
> underlying device's header_ops, but 8021q is only for ethernet, and the
> eth_header_ops are static in vmlinux, so it won't see these problems.
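
A rough userspace sketch of the "passthru" idea, for contrast with copying the lower device's pointer. All names (pt_dev, pt_header_ops, pt_lower_create, pt_passthru_create) are made up for this illustration and are not the 8021q code. The upper device's header_ops always points at a structure owned by the upper layer, and that structure's create callback looks up the lower device at call time and forwards to it.

```c
#include <stddef.h>

struct pt_dev;

/* Hypothetical stand-in for struct header_ops. */
struct pt_header_ops {
	int (*create)(struct pt_dev *dev, int len);
};

/* Hypothetical stand-in for struct net_device. */
struct pt_dev {
	const struct pt_header_ops *header_ops;
	struct pt_dev *lower;		/* underlying device, NULL if none */
	int hard_header_len;
};

/* Lower device's builder: link header length plus payload length. */
int pt_lower_create(struct pt_dev *dev, int len)
{
	return dev->hard_header_len + len;
}

const struct pt_header_ops pt_lower_ops = { .create = pt_lower_create };

/*
 * Passthru builder: owned by the upper layer. It never stores the lower
 * device's ops pointer; it resolves it through dev->lower on each call.
 */
int pt_passthru_create(struct pt_dev *dev, int len)
{
	struct pt_dev *lower = dev->lower;

	if (!lower || !lower->header_ops || !lower->header_ops->create)
		return 0;
	return lower->header_ops->create(lower, len);
}

const struct pt_header_ops pt_passthru_ops = { .create = pt_passthru_create };
```

The indirection alone does not make unloading the lower module safe. As noted above, the reason 8021q does not see the problem is that the ops it ends up calling are static in vmlinux.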
>
> Actually, now that I think about it, when the last ipoib slave
> is released, the bonding master device is theoretically supposed to be
> removed to avoid the sort of problem we're discussing here.
>
> That apparently isn't happening, unless Wengang is running
> pktgen and simultaneously removing the ipoib module (racing the transmit
> against the removal), or maybe something else is going on (perhaps
> pktgen holds a reference to the bonding master, preventing its removal).
>
> Also, curiously, looking at pktgen, pktgen_setup_dev appears to
> only accept devices of type ARPHRD_ETHER, but bonding with an ipoib
> slave would be ARPHRD_INFINIBAND. I'm therefore not sure how Wengang
> configured pktgen over an ipoib bond.
>
> Wengang, what kernel are you using, and is your kernel modified
> to change pktgen_setup_dev?
>
> -J
It's a 2.6.39 kernel.
The code is like this:

static int pktgen_setup_dev(struct pktgen_dev *pkt_dev, const char *ifname)
{
	struct net_device *odev;
	int err;

	/* Clean old setups */
	if (pkt_dev->odev) {
		dev_put(pkt_dev->odev);
		pkt_dev->odev = NULL;
	}

	odev = pktgen_dev_get_by_name(pkt_dev, ifname);
	if (!odev) {
		pr_err("no such netdevice: \"%s\"\n", ifname);
		return -ENODEV;
	}

	if (odev->type != ARPHRD_ETHER) {
		pr_err("not an ethernet device: \"%s\"\n", ifname);
		err = -EINVAL;
	} else if (!netif_running(odev)) {
		pr_err("device is down: \"%s\"\n", ifname);
		err = -ENETDOWN;
	} else {
		pkt_dev->odev = odev;
		return 0;
	}

	dev_put(odev);
	return err;
}

I have not changed it.
I hit this problem as a side effect while working on another area. So
far I am not very clear about the setup (and I have no environment to
check it on right now).

thanks,
wengang
> ---
> -Jay Vosburgh, jay.vosburgh@...onical.com