Message-ID: <20120126192653.GC2437@linux.vnet.ibm.com>
Date: Thu, 26 Jan 2012 11:26:53 -0800
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Steffen Persvold <sp@...ascale.com>
Cc: Daniel J Blueman <daniel@...ascale-asia.com>,
Dipankar Sarma <dipankar@...ibm.com>,
linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...
On Thu, Jan 26, 2012 at 04:04:37PM +0100, Steffen Persvold wrote:
> On 1/26/2012 02:58, Paul E. McKenney wrote:
> >On Wed, Jan 25, 2012 at 11:48:58PM +0100, Steffen Persvold wrote:
> []
> >
> >This looks like it will produce useful information, but I am not seeing
> >output from it below.
> >
> > Thanx, Paul
> >
> >>This run, it was CPU 24 that triggered the issue:
> >>
>
> This line is the printout for the root level:
>
> >>[ 231.572688] CPU 24, treason uncloaked, rsp @ ffffffff81a1cd80 (rcu_sched), rnp @ ffffffff81a1cd80(r) qsmask=0x1f, c=5132 g=5132 nc=5132 ng=5133 sc=5132 sg=5133 mc=5132 mg=5133
OK, so the rcu_state structure (sc and sg) believes that grace period
5133 has started but not completed, as expected. Strangely enough, so
do the root rcu_node structure (nc and ng) and the CPU's leaf rcu_node
structure (mc and mg).
The per-CPU rcu_data structure (c and g) does not yet know about the
new 5133 grace period, as expected.
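Just to make sure we are decoding the same thing, here is a rough
sketch of a debug print that would produce those counter pairs. This
is my guess, not your actual patch; the helper name and exact
arguments are assumptions on my part:

	/*
	 * Hypothetical debug helper, sketched only to record how I am
	 * reading the counter pairs.  Assumes it is invoked from within
	 * rcu_start_gp() with rsp, rnp, and rdp all in scope, as in the
	 * loop quoted below.
	 */
	static void sketch_gp_debug_print(struct rcu_state *rsp,
					  struct rcu_node *rnp,
					  struct rcu_data *rdp)
	{
		printk(KERN_INFO "CPU %d, rsp @ %p (%s), rnp @ %p qsmask=0x%lx, "
		       "c=%lu g=%lu nc=%lu ng=%lu sc=%lu sg=%lu mc=%lu mg=%lu\n",
		       smp_processor_id(), rsp, rsp->name, rnp, rnp->qsmask,
		       rdp->completed, rdp->gpnum,   /* c/g:   per-CPU rcu_data */
		       rnp->completed, rnp->gpnum,   /* nc/ng: this rcu_node    */
						     /*        (root, above)    */
		       rsp->completed, rsp->gpnum,   /* sc/sg: rcu_state        */
		       rdp->mynode->completed,       /* mc/mg: the CPU's leaf   */
		       rdp->mynode->gpnum);          /*        rcu_node         */
	}
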
So this is the code in kernel/rcutree.c:rcu_start_gp() that does the
initialization:
	rcu_for_each_node_breadth_first(rsp, rnp) {
		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
		rcu_preempt_check_blocked_tasks(rnp);
		rnp->qsmask = rnp->qsmaskinit;
		rnp->gpnum = rsp->gpnum;
		rnp->completed = rsp->completed;
		if (rnp == rdp->mynode)
			rcu_start_gp_per_cpu(rsp, rnp, rdp);
		rcu_preempt_boost_start_gp(rnp);
		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
					    rnp->level, rnp->grplo,
					    rnp->grphi, rnp->qsmask);
		raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
	}
I am assuming that your debug prints are still invoked right after
the raw_spin_lock() above. If so, I would expect nc==ng and mc==mg.
Even if your debug prints followed the assignments to rnp->gpnum and
rnp->completed, I would expect mc==mg for the root and internal rcu_node
structures. But you say below that you get the same values throughout,
and in that case, I would expect the leaf rcu_node structure to show
something different than the root and internal structures.
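In other words, the two placements I have in mind look roughly like
this (only the comments are added; the loop itself is as quoted above):

	rcu_for_each_node_breadth_first(rsp, rnp) {
		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
		/*
		 * Placement A: a print here should still show the old
		 * grace period in nc/ng (this rnp is not yet
		 * initialized) and in mc/mg (the leaf is reached later
		 * in the breadth-first walk), so nc==ng and mc==mg.
		 */
		rcu_preempt_check_blocked_tasks(rnp);
		rnp->qsmask = rnp->qsmaskinit;
		rnp->gpnum = rsp->gpnum;
		rnp->completed = rsp->completed;
		/*
		 * Placement B: a print here shows the new gpnum for
		 * this rnp, but mc/mg should still match until the
		 * walk reaches the CPU's leaf rcu_node.
		 */
		/* ... remainder of the loop as quoted above ... */
	}
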
The code really does hold the root rcu_node lock across every call to
rcu_start_gp(), so I don't see how two CPUs could be in that code at
the same time. That would be one way for the rcu_node and rcu_data
structures to get advance notice of the new grace period, but in that
case you would see more than one bit set in ->qsmask.
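For reference, the call site looks roughly like this, paraphrased from
__rcu_process_callbacks() in kernel/rcutree.c as I remember it, so
please check your tree for the exact code:

	unsigned long flags;	/* irq state for the lock acquisition */

	if (cpu_needs_another_gp(rsp, rdp)) {
		raw_spin_lock_irqsave(&rcu_get_root(rsp)->lock, flags);
		rcu_start_gp(rsp, flags); /* releases the root rcu_node lock. */
	}
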
So, any luck with the trace events for rcu_grace_period and
rcu_grace_period_init?
Thanx, Paul
> (this is the WARN_ON printout):
> >>[ 231.576167] ------------[ cut here ]------------
> >>[ 231.576167] WARNING: at kernel/rcutree_plugin.h:1011 rcu_preempt_check_blocked_tasks+0x27/0x30()
> >>[ 231.576167] Hardware name: H8QI6
> >>[ 231.576167] Modules linked in: rcutorture
> >>[ 231.576167] Pid: 4603, comm: rcu_torture_rea Not tainted 3.2.1-numaconnect10+ #77
> >>[ 231.576167] Call Trace:
> >>[ 231.576167]<IRQ> [<ffffffff810bb217>] ? rcu_preempt_check_blocked_tasks+0x27/0x30
> >>[ 231.576167] [<ffffffff8106f47b>] warn_slowpath_common+0x8b/0xc0
> >>[ 231.576167] [<ffffffff8106f4c5>] warn_slowpath_null+0x15/0x20
> >>[ 231.576167] [<ffffffff810bb217>] rcu_preempt_check_blocked_tasks+0x27/0x30
> >>[ 231.576167] [<ffffffff810bb330>] rcu_start_gp+0x110/0x1b0
> >>[ 231.576167] [<ffffffff810bbf3b>] __rcu_process_callbacks+0x8b/0xd0
> >>[ 231.576167] [<ffffffff810bc7a0>] rcu_process_callbacks+0x20/0x40
> >>[ 231.576167] [<ffffffff8107580d>] __do_softirq+0x9d/0x140
> >>[ 231.576167] [<ffffffff815d982c>] call_softirq+0x1c/0x30
> >>[ 231.576167] [<ffffffff8103451a>] do_softirq+0x4a/0x80
> >>[ 231.576167] [<ffffffff81075b83>] irq_exit+0x43/0x60
> >>[ 231.576167] [<ffffffff8104aed5>] smp_apic_timer_interrupt+0x45/0x60
> >>[ 231.576167] [<ffffffff815d834b>] apic_timer_interrupt+0x6b/0x70
> >>[ 231.576167]<EOI> [<ffffffff81067aa9>] ? finish_task_switch+0x59/0xc0
> >>[ 231.576167] [<ffffffff815d4d37>] __schedule+0x337/0x710
> >>[ 231.576167] [<ffffffff81090425>] ? sched_clock_local+0x15/0x80
> >>[ 231.576167] [<ffffffff8107b826>] ? lock_timer_base+0x36/0x70
> >>[ 231.576167] [<ffffffff8107baa2>] ? mod_timer+0xf2/0x1d0
> >>[ 231.576167] [<ffffffffa0001510>] ? rcu_torture_shuffle+0x80/0x80 [rcutorture]
> >>[ 231.576167] [<ffffffff815d53ea>] schedule+0x3a/0x60
> >>[ 231.576167] [<ffffffffa0001640>] rcu_torture_reader+0x130/0x230 [rcutorture]
> >>[ 231.576167] [<ffffffffa0001dc0>] ? rcu_torture_writer+0x160/0x160 [rcutorture]
> >>[ 231.576167] [<ffffffffa0001510>] ? rcu_torture_shuffle+0x80/0x80 [rcutorture]
> >>[ 231.576167] [<ffffffff8108a726>] kthread+0x96/0xa0
> >>[ 231.576167] [<ffffffff815d9734>] kernel_thread_helper+0x4/0x10
> >>[ 231.576167] [<ffffffff8108a690>] ? kthread_stop+0x70/0x70
> >>[ 231.576167] [<ffffffff815d9730>] ? gs_change+0xb/0xb
> >>[ 231.576167] ---[ end trace 828c8d7afbd02d1b ]---
> >>
>
> I didn't include the leaf node printout, but the counters were
> identical to the root printout (with the exception of the rnp
> address and qsmask, of course).
>
> Cheers,
> --
> Steffen Persvold, Chief Architect NumaChip
> Numascale AS - www.numascale.com
> Tel: +47 92 49 25 54 Skype: spersvold
>