Message-ID: <20140716152629.GA9340@linux.vnet.ibm.com>
Date:	Wed, 16 Jul 2014 08:26:29 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	linux-kernel@...r.kernel.org, fabf@...net.be,
	bobby.prani@...il.com, davidshan@...cent.com, joe@...ches.com,
	keescook@...omium.org, Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [GIT PULL rcu/next] RCU commits for 3.17

On Wed, Jul 16, 2014 at 07:17:07AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 16, 2014 at 03:13:22PM +0200, Ingo Molnar wrote:
> > 
> > * Ingo Molnar <mingo@...nel.org> wrote:
> > 
> > > 
> > > * Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
> > > 
> > > > Hello, Ingo,
> > > > 
> > > > The changes in this series include:
> > > > 
> > > > 1.	Update RCU documentation.  These were posted to LKML at
> > > > 	https://lkml.org/lkml/2014/7/7/650.
> > > > 
> > > > 2.	Miscellaneous fixes.  These were posted to LKML at
> > > > 	https://lkml.org/lkml/2014/7/7/678.
> > > > 
> > > > 3.	Maintainership changes.  These were posted to LKML at
> > > > 	https://lkml.org/lkml/2014/7/7/713, with a couple of
> > > > 	additional at https://lkml.org/lkml/2014/7/3/812 and
> > > > 	https://lkml.org/lkml/2014/6/2/585.
> > > > 
> > > > 4.	Torture-test updates.  These were posted to LKML at
> > > > 	https://lkml.org/lkml/2014/7/7/816.
> > > > 
> > > > 5.	Callback-offloading changes.  These were posted to LKML at
> > > > 	https://lkml.org/lkml/2014/7/7/1007.
> > > > 
> > > > All of these have been exposed to -next testing.
> > 
> > JFYI, the attached x86 (rand!-) config crashes on early bootup:
> > 
> > [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> > [    0.000000] ------------[ cut here ]------------
> > [    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/common.c:1439 warn_pre_alternatives+0x1e/0x20()
> > [    0.000000] You're using static_cpu_has before alternatives have run!
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc5+ #236363
> > [    0.000000] Hardware name: System manufacturer System Product Name/A8N-E, BIOS ASUS A8N-E ACPI BIOS Revision 1008 08/22/2005
> > [    0.000000]  0000000000000000 ffffffff82787c60 ffffffff81ff4ec0 ffffffff82787ca8
> > [    0.000000]  ffffffff82787c98 ffffffff810651ac ffffffff81011531 ffffffff82787e48
> > [    0.000000]  0000000000000000 0000000000000000 ffffffff827cf880 ffffffff82787cf8
> > [    0.000000] Call Trace:
> > [    0.000000]  [<ffffffff81ff4ec0>] dump_stack+0x4d/0x66
> > [    0.000000]  [<ffffffff810651ac>] warn_slowpath_common+0x7a/0x93
> > [    0.000000]  [<ffffffff81011531>] ? warn_pre_alternatives+0x1e/0x20
> > [    0.000000]  [<ffffffff81065239>] warn_slowpath_fmt+0x4c/0x4e
> > [    0.000000]  [<ffffffff82005817>] ? irq_return+0x7/0x7
> > [    0.000000]  [<ffffffff81011531>] warn_pre_alternatives+0x1e/0x20
> > [    0.000000]  [<ffffffff81033d31>] __do_page_fault+0xc3/0x43f
> > [    0.000000]  [<ffffffff81005347>] ? print_context_stack+0x6a/0xb6
> > [    0.000000]  [<ffffffff810045f1>] ? dump_trace+0x27d/0x294
> > [    0.000000]  [<ffffffff815b185b>] ? number.isra.1+0x127/0x22c
> > [    0.000000]  [<ffffffff8109abe2>] ? print_time.part.5+0x58/0x5c
> > [    0.000000]  [<ffffffff81086a9a>] ? sched_clock_cpu+0x11/0xb9
> > [    0.000000]  [<ffffffff810340ef>] do_page_fault+0x1e/0x54
> > [    0.000000]  [<ffffffff82005817>] ? irq_return+0x7/0x7
> > [    0.000000]  [<ffffffff82006872>] page_fault+0x22/0x30
> > [    0.000000]  [<ffffffff815b65fa>] ? __bitmap_or+0x15/0x28
> > [    0.000000]  [<ffffffff82c876f0>] rcu_init_one+0x4c0/0x55d
> > [    0.000000]  [<ffffffff82c87b00>] rcu_init+0x270/0x2da
> > [    0.000000]  [<ffffffff82c6ec82>] start_kernel+0x24f/0x4d2
> > [    0.000000]  [<ffffffff82c6e841>] ? set_init_arg+0x53/0x53
> > [    0.000000]  [<ffffffff82c6e453>] x86_64_start_reservations+0x2a/0x2c
> > [    0.000000]  [<ffffffff82c6e546>] x86_64_start_kernel+0xf1/0xf4
> > [    0.000000] ---[ end trace 4650963e41188009 ]---
> > [    0.000000] BUG: unable to handle kernel NULL pointer dereference at           (null)
> > [    0.000000] IP: [<ffffffff815b65fa>] __bitmap_or+0x15/0x28
> > [    0.000000] PGD 0 
> > [    0.000000] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W     3.16.0-rc5+ #236363
> > [    0.000000] Hardware name: System manufacturer System Product Name/A8N-E, BIOS ASUS A8N-E ACPI BIOS Revision 1008 08/22/2005
> > [    0.000000] task: ffffffff827a3480 ti: ffffffff82784000 task.ti: ffffffff82784000
> > [    0.000000] RIP: 0010:[<ffffffff815b65fa>]  [<ffffffff815b65fa>] __bitmap_or+0x15/0x28
> > [    0.000000] RSP: 0000:ffffffff82787ef8  EFLAGS: 00010002
> > [    0.000000] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000001
> > [    0.000000] RDX: 0000000000000000 RSI: ffff880000019800 RDI: ffff880000019800
> > [    0.000000] RBP: ffffffff82787ef8 R08: 0000000000000000 R09: 0000000000000000
> > [    0.000000] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff827cf880
> > [    0.000000] R13: 0000000000000002 R14: ffffffff827cf880 R15: 00000000001cd000
> > [    0.000000] FS:  0000000000000000(0000) GS:ffff88003f800000(0000) knlGS:0000000000000000
> > [    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [    0.000000] CR2: 0000000000000000 CR3: 000000000279e000 CR4: 00000000000006b0
> > [    0.000000] Stack:
> > [    0.000000]  ffffffff82787f50 ffffffff82c876f0 ffffffff83b7e7a0 0000000000000001
> > [    0.000000]  0000000000000082 0000000200000000 00000000ffffffff ffffffff82d37920
> > [    0.000000]  ffff88003ffbba40 ffffffff82d3e890 0000000000000000 ffffffff82787f80
> > [    0.000000] Call Trace:
> > [    0.000000]  [<ffffffff82c876f0>] rcu_init_one+0x4c0/0x55d
> > [    0.000000]  [<ffffffff82c87b00>] rcu_init+0x270/0x2da
> > [    0.000000]  [<ffffffff82c6ec82>] start_kernel+0x24f/0x4d2
> > [    0.000000]  [<ffffffff82c6e841>] ? set_init_arg+0x53/0x53
> > [    0.000000]  [<ffffffff82c6e453>] x86_64_start_reservations+0x2a/0x2c
> > [    0.000000]  [<ffffffff82c6e546>] x86_64_start_kernel+0xf1/0xf4
> > [    0.000000] Code: 4c 89 04 c7 4d 09 c1 48 ff c0 eb e8 31 c0 4d 85 c9 0f 95 c0 5d c3 55 48 63 c9 31 c0 48 83 c1 3f 48 89 e5 48 c1 e9 06 39 c1 7e 11 <4c> 8b 04 c2 4c 0b 04 c6 4c 89 04 c7 48 ff c0 eb eb 5d c3 55 48 
> > [    0.000000] RIP  [<ffffffff815b65fa>] __bitmap_or+0x15/0x28
> > [    0.000000]  RSP <ffffffff82787ef8>
> > [    0.000000] CR2: 0000000000000000
> > [    0.000000] ---[ end trace 4650963e4118800a ]---
> > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [    0.000000] Rebooting in 1 seconds..Press any key to enter the menu
> > 
> > Excluding the new RCU bits from tip:master makes it boot.
> > 
> > Any idea what's wrong?
> 
> Looks like you have a setup that has NO_HZ_FULL=y, but that somehow
> avoids having a non-NULL tick_nohz_full_mask at rcu_init() time.  But you
> probably knew that already.  And of course when I test locally with the
> same RCU-related and NO_HZ_FULL-related configs, it all works just fine.
> Perhaps there is some interaction with some other code in -tip.
> 
> So let's see...
> 
> Now your .config has CONFIG_NO_HZ_FULL_ALL=y and therefore also has
> CONFIG_RCU_NOCB_CPU_ALL=y.  In that case, there is no point in doing the
> cpumask_or() because all the bits are already set.  So the only time
> that this cpumask_or() matters is when CONFIG_RCU_NOCB_CPU_ALL==n and
> an explicit nohz_full= mask was specified at boot time.  So one reasonable
> change is to replace the #ifdef guarding the cpumask_or() with:
> 
> 	#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_RCU_NOCB_CPU_ALL)
> 
> I am now looking to see how tick_nohz_full_mask might be NULL in this
> situation.  Depending on what I find, I might insert a check for that.

And of course if you don't actually specify a nohz_full= mask, then
tick_nohz_full_mask can be NULL at RCU initialization time, and if it
is also true that CONFIG_NO_HZ_FULL_ALL=n, this condition can persist
forever.
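
To make the failure mode concrete, here is a minimal userspace sketch,
not kernel code.  It assumes a simplified one-word mask, and the names
struct demo_mask, demo_mask_or() and demo_apply_nohz_mask() are
hypothetical stand-ins rather than kernel APIs.  It only illustrates
that an unchecked OR dereferences its source mask, so an optional mask
that may never have been allocated needs a NULL check first.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cpumask: a single word of CPU bits. */
struct demo_mask {
	unsigned long bits;
};

/*
 * Unchecked OR, analogous in spirit to cpumask_or(): both source
 * pointers are dereferenced, so a NULL source would fault here.
 */
static void demo_mask_or(struct demo_mask *dst, const struct demo_mask *a,
			 const struct demo_mask *b)
{
	dst->bits = a->bits | b->bits;
}

/*
 * Guarded merge: skip the OR when the optional mask was never set up.
 * Returns 1 if the mask was merged, 0 if it was skipped.
 */
static int demo_apply_nohz_mask(struct demo_mask *nocb,
				const struct demo_mask *nohz_full)
{
	if (!nohz_full)
		return 0;
	demo_mask_or(nocb, nocb, nohz_full);
	return 1;
}
```

With a NULL second argument, demo_apply_nohz_mask() returns 0 and
leaves the destination mask untouched.  With a non-NULL mask, its bits
are ORed into the destination.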

Does the following patch on top of 1823172ab582 (Merge branches
'doc.2014.07.08a', 'fixes.2014.07.09a', 'maintainers.2014.07.08b',
'nocbs.2014.07.07a' and 'torture.2014.07.07a' into HEAD) fix the
problem?

This may also be pulled from rcu/urgent in the -rcu git tree:

	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

							Thanx, Paul

------------------------------------------------------------------------

 b/kernel/rcu/tree_plugin.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing

If there isn't a nohz_full= kernel parameter specified, then
tick_nohz_full_mask can legitimately be NULL.  This can cause
problems when RCU's boot code tries to cpumask_or() this value into
rcu_nocb_mask.  In addition, if NO_HZ_FULL_ALL=y, there is no point
in doing the cpumask_or() in the first place because this will cause
RCU_NOCB_CPU_ALL=y, which in turn will have all bits already set in
rcu_nocb_mask.

This commit therefore avoids the cpumask_or() if NO_HZ_FULL_ALL=y
and checks for NULL tick_nohz_full_mask otherwise.

Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index f62b7f2f6abd..0f9ca12eabb9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2479,9 +2479,10 @@ static void __init rcu_spawn_nocb_kthreads(struct rcu_state *rsp)
 
 	if (rcu_nocb_mask == NULL)
 		return;
-#ifdef CONFIG_NO_HZ_FULL
-	cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
-#endif /* #ifdef CONFIG_NO_HZ_FULL */
+#if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
+	if (tick_nohz_full_mask)
+		cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
+#endif /* #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL) */
 	if (ls == -1) {
 		ls = int_sqrt(nr_cpu_ids);
 		rcu_nocb_leader_stride = ls;
