lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20170308234152.GA14819@linux.vnet.ibm.com>
Date:   Wed, 8 Mar 2017 15:41:52 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     linux-kernel@...r.kernel.org
Cc:     mingo@...hat.com, peterz@...radead.org, fweisbec@...il.com,
        tglx@...utronix.de
Subject: Re: RCU used on incoming CPU before rcu_cpu_starting() called

On Wed, Mar 08, 2017 at 02:16:56PM -0800, Paul E. McKenney wrote:
> Hello!
> 
> I am seeing the following splat in rcutorture testing of v4.11-rc1:
> 
> [   30.694013] =============================
> [   30.694013] WARNING: suspicious RCU usage
> [   30.694013] 4.11.0-rc1+ #1 Not tainted
> [   30.694013] -----------------------------
> [   30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held!
> [   30.694013] 
> [   30.694013] other info that might help us debug this:
> [   30.694013] 
> [   30.694013] 
> [   30.694013] RCU used illegally from offline CPU!
> [   30.694013] rcu_scheduler_active = 2, debug_locks = 0
> [   30.694013] no locks held by swapper/1/0.
> [   30.694013] 
> [   30.694013] stack backtrace:
> [   30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1
> [   30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [   30.694013] Call Trace:
> [   30.694013]  dump_stack+0x67/0x99
> [   30.694013]  lockdep_rcu_suspicious+0xe7/0x120
> [   30.694013]  get_work_pool+0x82/0x90
> [   30.694013]  __queue_work+0x70/0x5f0
> [   30.694013]  queue_work_on+0x33/0x70
> [   30.694013]  clear_sched_clock_stable+0x33/0x40
> [   30.694013]  early_init_intel+0xe7/0x2f0
> [   30.694013]  init_intel+0x11/0x350
> [   30.694013]  identify_cpu+0x344/0x5a0
> [   30.694013]  identify_secondary_cpu+0x18/0x80
> [   30.694013]  smp_store_cpu_info+0x39/0x40
> [   30.694013]  start_secondary+0x4e/0x100
> [   30.694013]  start_cpu+0x14/0x14
> 
> Here is the relevant code from x86's smp_callin():
> 
> 	/*
> 	 * Save our processor parameters. Note: this information
> 	 * is needed for clock calibration.
> 	 */
> 	smp_store_cpu_info(cpuid);
> 
> 	/*
> 	 * Get our bogomips.
> 	 * Update loops_per_jiffy in cpu_data. Previous call to
> 	 * smp_store_cpu_info() stored a value that is close but not as
> 	 * accurate as the value just calculated.
> 	 */
> 	calibrate_delay();
> 	cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy;
> 	pr_debug("Stack at about %p\n", &cpuid);
> 
> 	/*
> 	 * This must be done before setting cpu_online_mask
> 	 * or calling notify_cpu_starting.
> 	 */
> 	set_cpu_sibling_map(raw_smp_processor_id());
> 	wmb();
> 
> 	notify_cpu_starting(cpuid);
> 
> The problem is that smp_store_cpu_info() indirectly invokes
> schedule_work(), which wants to use RCU.  But RCU isn't informed
> of the incoming CPU until the call to notify_cpu_starting(), which
> causes lockdep to complain bitterly about the use of RCU by the
> premature call to schedule_work().
> 
> I considered just moving the notify_cpu_starting() earlier in the
> sequence, but the comments make it seem like this would not be
> a wise choice.
> 
> Any suggestions?

And I should hasten to mention my default approach, which would be
to provide a Kconfig option that causes notify_cpu_starting() to
skip calling rcu_cpu_starting(), and to add a call to this function
in x86's smp_callin() just before the call to smp_store_cpu_info().
But this seemed a bit crude, so I figured I should check with you
guys before taking that approach.

But what is the fun in that?  Please see below for the corresponding
lightly tested patch.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

commit aa805dad9ae66cc4f5106d004b6bb5102fd84434
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date:   Wed Mar 8 15:32:29 2017 -0800

    cpu: Move RCU earlier in x86 CPU-online procedure
    
    Running rcutorture on v4.11-rc1 results in the following splat on x86
    on kernels built with both CPU hotplug and lockdep:
    
    [   30.694013] =============================
    [   30.694013] WARNING: suspicious RCU usage
    [   30.694013] 4.11.0-rc1+ #1 Not tainted
    [   30.694013] -----------------------------
    [   30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held!
    [   30.694013]
    [   30.694013] other info that might help us debug this:
    [   30.694013]
    [   30.694013]
    [   30.694013] RCU used illegally from offline CPU!
    [   30.694013] rcu_scheduler_active = 2, debug_locks = 0
    [   30.694013] no locks held by swapper/1/0.
    [   30.694013]
    [   30.694013] stack backtrace:
    [   30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1
    [   30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [   30.694013] Call Trace:
    [   30.694013]  dump_stack+0x67/0x99
    [   30.694013]  lockdep_rcu_suspicious+0xe7/0x120
    [   30.694013]  get_work_pool+0x82/0x90
    [   30.694013]  __queue_work+0x70/0x5f0
    [   30.694013]  queue_work_on+0x33/0x70
    [   30.694013]  clear_sched_clock_stable+0x33/0x40
    [   30.694013]  early_init_intel+0xe7/0x2f0
    [   30.694013]  init_intel+0x11/0x350
    [   30.694013]  identify_cpu+0x344/0x5a0
    [   30.694013]  identify_secondary_cpu+0x18/0x80
    [   30.694013]  smp_store_cpu_info+0x39/0x40
    [   30.694013]  start_secondary+0x4e/0x100
    [   30.694013]  start_cpu+0x14/0x14
    
    This is caused by the fact that smp_store_cpu_info() indirectly invokes
    schedule_work(), which wants to use RCU.  But RCU isn't informed of the
    incoming CPU until the later call to notify_cpu_starting(), which causes
    lockdep to complain bitterly about the use of RCU by the premature call
    to schedule_work().  It is said to be unwise to move the call to
    notify_cpu_starting() to precede that to smp_store_cpu_info(), so this
    commit adds a ARCH_RCU_ONLINE_EARLY Kconfig option that is selected by
    the x86 architecture.  This option suppresses the call to rcu_cpu_starting()
    from notify_cpu_starting(), and a call to rcu_cpu_starting() is added
    preceding the call to smp_store_cpu_info().
    
    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
    Cc: Ingo Molnar <mingo@...hat.com>
    Cc: Peter Zijlstra <peterz@...radead.org>
    Cc: Frederic Weisbecker <fweisbec@...il.com>
    Cc: Thomas Gleixner <tglx@...utronix.de>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a294ee..bb42b76ec2dd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -63,6 +63,7 @@ config X86
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
+	select ARCH_RCU_ONLINE_EARLY
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad35284..ca7386d04194 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -188,6 +188,12 @@ static void smp_callin(void)
 	apic_ap_setup();
 
 	/*
+	 * Enable RCU readers on this CPU, needed for workqueues,
+	 * which are used in smp_store_cpu_info().
+	 */
+	rcu_cpu_starting(cpuid);
+
+	/*
 	 * Save our processor parameters. Note: this information
 	 * is needed for clock calibration.
 	 */
diff --git a/init/Kconfig b/init/Kconfig
index c859c993c26f..31c2d5e08f16 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -543,6 +543,9 @@ config RCU_STALL_COMMON
 	  the tiny variants to disable RCU CPU stall warnings, while
 	  making these warnings mandatory for the tree variants.
 
+config ARCH_RCU_ONLINE_EARLY
+	bool
+
 config CONTEXT_TRACKING
        bool
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f7c063239fa5..9d98f2785fa0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -851,7 +851,8 @@ void notify_cpu_starting(unsigned int cpu)
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
 
-	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
+	if (!IS_ENABLED(CONFIG_ARCH_RCU_ONLINE_EARLY))
+		rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	while (st->state < target) {
 		st->state++;
 		cpuhp_invoke_callback(cpu, st->state, true, NULL);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ