netdev - Re: [PATCH] bonding: Do not try to send packets over dead link in TLB mode.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAF2d9jhJuisK1TmePtz-X79-oDmQHGHhvipAVAE=5Y2cu7tSEg@mail.gmail.com>
Date:	Wed, 9 Jul 2014 10:24:29 -0700
From:	Mahesh Bandewar <maheshb@...gle.com>
To:	Veaceslav Falico <vfalico@...hat.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Nikolay Aleksandrov <nikolay@...hat.com>,
	Jay Vosburgh <fubar@...ibm.com>,
	Andy Gospodarek <andy@...yhouse.net>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	Maciej Zenczykowski <maze@...gle.com>
Subject: Re: [PATCH] bonding: Do not try to send packets over dead link in TLB mode.

On Wed, Jul 9, 2014 at 10:15 AM, Mahesh Bandewar <maheshb@...gle.com> wrote:
> On Wed, Jul 9, 2014 at 6:27 AM, Veaceslav Falico <vfalico@...hat.com> wrote:
>> On Wed, Jul 09, 2014 at 03:17:29PM +0200, Eric Dumazet wrote:
>>>
>>> On Wed, 2014-07-09 at 14:04 +0200, Veaceslav Falico wrote:
>>>>
>>>> On Wed, Jul 09, 2014 at 12:25:43PM +0200, Nikolay Aleksandrov wrote:
>>>> >On 07/09/2014 12:24 PM, Veaceslav Falico wrote:
>>>> >> On Tue, Jul 08, 2014 at 06:09:58PM -0700, Mahesh Bandewar wrote:
>>>> ...snip...
>>>> >>> +    spin_lock(&bond_info->slave_arr_lock);
>>>> >>
>>>> >> I don't think you can re-enter bond_alb_handle_link_change(), as it's
>>>> >> protected either by rtnl or write-lock curr_active_slave.
>>>> >>
>>>> >Actually a very good catch :-)
>>>> >Maybe the allocation above should be done with GFP_ATOMIC.
>>>>
>>>> For the record - it's indeed always under rtnl, so ASSERT_RTNL() (from
>>>> your
>>>> other email) is a good idea.
>>>
>>>
>>> Strange. I basically suggested the ASSERT_RTNL() to Mahesh few days ago
>>> and he tried this. But the assert triggered with miimon, so Mahesh added
>>> back the spinlock.
>>
>>
>> That's indeed strange... From the code:
>>
>> 2103                 if (!rtnl_trylock()) {
>> 2104                         delay = 1;
>> 2105                         should_notify_peers = false;
>> 2106                         goto re_arm;
>> 2107                 }
>> 2108
>> 2109                 bond_miimon_commit(bond);
>> 2110
>> 2111                 rtnl_unlock();  /* might sleep, hold no other locks */
>>
>> And we can get there only through bond_miimon_commit(), as part of the
>> miimon.
>>
>> Maybe you've hit the kmalloc(GFP_KERNEL) warning?
>
> Hmm, actually as Eric mentioned, I did try removing the spinlock to
> just use RTNL but assert failed and it wasn't sleepable fn called in
> atomic context message (I assume that is what you are suggesting from
> kmalloc warning).
>
> I managed to find the stack trace for that -
>
> RTNL: assertion failed at drivers/net/bonding/bond_alb.c (1371)
> CPU: 2 PID: 3571 Comm: kworker/u16:7 Not tainted 3.11.10-smp-DEV #91
> Workqueue: bond0 bond_mii_monitor [bonding]
>  ffff8801175e4800 ffff880113b53d38 ffffffff97f97cab 000000000000004d
>  ffff8801175c6800 ffff880113b53d58 ffffffffc046a17c ffff8801175c6800
>  ffff8801175e4800 ffff880113b53d88 ffffffffc045ef27 0000000000000000
> Call Trace:
>  [<ffffffff97f97cab>] dump_stack+0x46/0x58
>  [<ffffffffc046a17c>] bond_alb_handle_link_change+0x16c/0x180 [bonding]
>  [<ffffffffc045ef27>] bond_handle_link_change+0x57/0x80 [bonding]
>  [<ffffffffc0462d79>] bond_mii_monitor+0x679/0x6e0 [bonding]
>  [<ffffffff97a6a7c0>] process_one_work+0x140/0x3f0
>  [<ffffffff97a6aef1>] worker_thread+0x121/0x370
>  [<ffffffff97a6add0>] ? rescuer_thread+0x320/0x320
>  [<ffffffff97a720a0>] kthread+0xc0/0xd0
>  [<ffffffff97a71fe0>] ? flush_kthread_worker+0x80/0x80
>  [<ffffffff97fa291c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff97a71fe0>] ? flush_kthread_worker+0x80/0x80

Please note that I got this assert-fail on 3.10 kernel and haven't
verified if things have changed in mii_monitor context since then.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html