[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54A21596.7000706@oracle.com>
Date: Tue, 30 Dec 2014 11:01:42 +0800
From: Wengang <wen.gang.wang@...cle.com>
To: Cong Wang <cwang@...pensource.com>
CC: netdev <netdev@...r.kernel.org>, linux-rdma@...r.kernel.org
Subject: Re: [PATCH] bonding: move ipoib_header_ops to vmlinux
于 2014年12月30日 05:32, Cong Wang 写道:
> On Mon, Nov 24, 2014 at 9:36 PM, Wengang Wang <wen.gang.wang@...cle.com> wrote:
>> When last slave of a bonding master is removed, the bonding then does not work.
>> At the time if packet_snd is called against with a master net_device, it calls
>> then header_ops->create which points to slave's header_ops. In case the slave
>> is ipoib and the module is unloaded, header_ops would point to invalid address.
>> Accessing it will cause problem.
>> This patch tries to fix this issue by moving ipoib_header_ops to vmlinux to keep
>> it valid even when ipoib module is unloaded.
>>
> I still don't think this is the right way to fix the bug, because
> there are other modules which have the same bug, for example,
> net/bluetooth/6lowpan.c (unless it can't be enslaved into a bond
> of course, I haven't verified). We don't want to move them all
> to builtin kernel, do we?
I don't think bluetooth needs bonding(and you didn't verify at all).
Do you have strong reason(examples)?
> On the other hand, as Jay explained, bonding is probably
> the only one which has the problem, why not solve it in bonding?
> I mean why do we have to adjust other modules just for bonding?
There are more than one way we do things. For this case, considering
needs, complexity and stability I think moving ipoib_header_ops is the
right way to go.
> When its slave device gets removed, some netdev event is sent
> to the master, so why not reset the header_ops to ether header ops
> up on this event? Something like below (not even compile tested):
>
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 184c434..aedccae 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1726,6 +1726,7 @@ static int __bond_release_one(struct net_device *bond_dev,
> if (!bond_has_slaves(bond)) {
> bond_set_carrier(bond);
> eth_hw_addr_random(bond_dev);
> + ether_setup(bond_dev);
> }
>
> unblock_netpoll_tx();
>
> It should solve more than just your ipoib case.
Do you think it's safe in a race condition? Before you change
header_ops, you have no
way to prevent another thread from accessing the old one.
> Last but not least, since you said you saw the crash in packet_snd(),
> it would also be interesting to know why packet_snd() still picks this
> bond dev after all its slaves are gone, after all it is not a functional
> device without any slave (its carrier is off).
It's quite normal in a race situation.
thanks,
wengang
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists