lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4b1a0cb90912171331t3297f01aw3bf014e27cd5fb0e@mail.gmail.com>
Date:	Thu, 17 Dec 2009 13:31:36 -0800
From:	Mikhail Markine <markine@...gle.com>
To:	Jay Vosburgh <fubar@...ibm.com>
Cc:	Johannes Berg <johannes@...solutions.net>,
	Jarek Poplawski <jarkao2@...il.com>, netdev@...r.kernel.org,
	bonding-devel@...ts.sourceforge.net,
	Petri Gynther <pgynther@...gle.com>,
	David Miller <davem@...emloft.net>
Subject: Re: [Bonding-devel] [PATCH] bonding: cancel_delayed_work() -> 
	cancel_delayed_work_sync()

Jay,

We originally applied this patch against bond_main.c under 2.6.25
(bear with me) in response to a BUG() observed in testing:

------------[ cut here ]------------
Kernel BUG at c0039aa4 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
[SNIP]
NIP [c0039aa4] queue_delayed_work_on+0x60/0x120
LR [e1578a7c] bond_mii_monitor+0x6c/0x98 [bonding]
Call Trace:
[db401f30] [c029e0e8] 0xc029e0e8 (unreliable)
[db401f50] [e1578a7c] bond_mii_monitor+0x6c/0x98 [bonding]
[db401f60] [c003914c] run_workqueue+0xb8/0x148
[db401f90] [c0039700] worker_thread+0x70/0xd0
[db401fd0] [c003d13c] kthread+0x48/0x84
[db401ff0] [c000dab4] kernel_thread+0x44/0x60
Instruction dump:
40a2fff4 71200001 38600000 41820018 80010024 bb810010 38210020 7c0803a6
4e800020 80050010 3160ffff 7d2b0110 <0f090000> 80050004 39250004 7c004a78
---[ end trace ef4eb74d43ff218e ]---

This was a BUG_ON(timer_pending(timer)) call in
queue_delayed_work_on() which was called from queue_delayed_work()
which was called at the bottom of bond_mii_monitor(). This implied
that mii_work was queued twice.

In both 2.6.25 and the current HEAD of Linus' tree, mii_work is queued
up in two places: bond_mii_monitor() and bond_open(). bond_open()
calls INIT_DELAYED_WORK() prior to queue_delayed_work().

Under both kernels, if bond_mii_monitor() sleeps due to RTNL while
bond_close() is called, setting kill_timers=1 and calling
cancel_delayed_work(&bond->mii_work) in bond_close() accomplishes
nothing. bond_mii_monitor() will eventually wake up and call
queue_delayed_work() again. A possible explanation for the 2.6.25
kernel trace above is that bond_close() and bond_open() are called on
the same net_device in succession while bond_mii_monitor() sleeps with
the associated bond. I believe the same is still possible on HEAD.
Thoughts?

Mikhail

On Thu, Dec 17, 2009 at 8:12 AM, Jay Vosburgh <fubar@...ibm.com> wrote:
> Johannes Berg <johannes@...solutions.net> wrote:
>
>>On Thu, 2009-12-17 at 13:36 +0000, Jarek Poplawski wrote:
>>> On Thu, Dec 17, 2009 at 02:19:46PM +0100, Johannes Berg wrote:
>>> > Jarek,
>>> >
>>> > Sorry to mail you directly, but I only saw your reply on gmane and
>>> > didn't want to break up the threading etc.
>>> >
>>> > cancel_delayed_work_sync() should be ok in this case unless the work
>>> > items themselves used the rtnl,
>>>
>>> Hmm, I'm not sure I get your point, but e.g. bond_mii_monitor() work
>>> function can get rtnl_lock().
>>
>>Ok in that case you can't cancel_sync() it under rtnl. I was thinking of
>>the case where it's just not ok because of other work. Sorry for the
>>confusion!
>
>        There's already logic in the monitors (bond_mii_monitor, et al)
> to check a sentinel (kill_timers) and do nothing (not acquire rtnl) and
> return.
>
>        What exactly is the nature of the race that doing cancel..sync
> is fixing?  The bond_close function sets kill_timers prior to calling
> the cancel functions, so the monitor function might run once, but it
> should do nothing.
>
>        -J
>
> ---
>        -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ