linux-kernel - RCU used on incoming CPU before rcu_cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 8 Mar 2017 14:16:56 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     linux-kernel@...r.kernel.org
Cc:     mingo@...hat.com, peterz@...radead.org, fweisbec@...il.com,
        tglx@...utronix.de
Subject: RCU used on incoming CPU before rcu_cpu_starting() called

Hello!

I am seeing the following splat in rcutorture testing of v4.11-rc1:

[   30.694013] =============================
[   30.694013] WARNING: suspicious RCU usage
[   30.694013] 4.11.0-rc1+ #1 Not tainted
[   30.694013] -----------------------------
[   30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held!
[   30.694013] 
[   30.694013] other info that might help us debug this:
[   30.694013] 
[   30.694013] 
[   30.694013] RCU used illegally from offline CPU!
[   30.694013] rcu_scheduler_active = 2, debug_locks = 0
[   30.694013] no locks held by swapper/1/0.
[   30.694013] 
[   30.694013] stack backtrace:
[   30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1
[   30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[   30.694013] Call Trace:
[   30.694013]  dump_stack+0x67/0x99
[   30.694013]  lockdep_rcu_suspicious+0xe7/0x120
[   30.694013]  get_work_pool+0x82/0x90
[   30.694013]  __queue_work+0x70/0x5f0
[   30.694013]  queue_work_on+0x33/0x70
[   30.694013]  clear_sched_clock_stable+0x33/0x40
[   30.694013]  early_init_intel+0xe7/0x2f0
[   30.694013]  init_intel+0x11/0x350
[   30.694013]  identify_cpu+0x344/0x5a0
[   30.694013]  identify_secondary_cpu+0x18/0x80
[   30.694013]  smp_store_cpu_info+0x39/0x40
[   30.694013]  start_secondary+0x4e/0x100
[   30.694013]  start_cpu+0x14/0x14

Here is the relevant code from x86's smp_callin():

	/*
	 * Save our processor parameters. Note: this information
	 * is needed for clock calibration.
	 */
	smp_store_cpu_info(cpuid);

	/*
	 * Get our bogomips.
	 * Update loops_per_jiffy in cpu_data. Previous call to
	 * smp_store_cpu_info() stored a value that is close but not as
	 * accurate as the value just calculated.
	 */
	calibrate_delay();
	cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy;
	pr_debug("Stack at about %p\n", &cpuid);

	/*
	 * This must be done before setting cpu_online_mask
	 * or calling notify_cpu_starting.
	 */
	set_cpu_sibling_map(raw_smp_processor_id());
	wmb();

	notify_cpu_starting(cpuid);

The problem is that smp_store_cpu_info() indirectly invokes
schedule_work(), which wants to use RCU.  But RCU isn't informed
of the incoming CPU until the call to notify_cpu_starting(), which
causes lockdep to complain bitterly about the use of RCU by the
premature call to schedule_work().

I considered just moving the notify_cpu_starting() earlier in the
sequence, but the comments make it seem like this would not be
a wise choice.

Any suggestions?

						Thanx, Paul