linux-kernel - Re: RCU qsmask !=0 warnings on large-SMP...

Message-ID: <4F200F4D.5000201@numascale.com>
Date:	Wed, 25 Jan 2012 15:18:53 +0100
From:	Steffen Persvold <sp@...ascale.com>
To:	paulmck@...ux.vnet.ibm.com
CC:	Daniel J Blueman <daniel@...ascale-asia.com>,
	Dipankar Sarma <dipankar@...ibm.com>,
	linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...

On 1/25/2012 15:00, Paul E. McKenney wrote:
> On Wed, Jan 25, 2012 at 05:44:34PM +0800, Daniel J Blueman wrote:
[]
> 
> I do test three-level trees regularly, but on relatively small systems
> by limiting CONFIG_RCU_FANOUT.  If you are running 64-bit systems
> with default values of CONFIG_RCU_FANOUT, a 1024-CPU system fits into a
> two-level tree, so I would guess that you either have more than 1024 CPUs
> (at least as far as NR_CPUS is concerned) or gaps in your CPU-numbering
> scheme.

NR_CPUS is set to 4096 (i.e. CONFIG_MAXSMP=y) and CONFIG_RCU_FANOUT is 64. We see 3 levels of RCU nodes with this configuration.
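
The level count can be checked with a little arithmetic. The sketch below is only an illustration, not kernel code: rcu_tree_levels() is a hypothetical helper, and the leaf fanout of 16 is an assumption inferred from Paul's remark that 1024 CPUs fit into a two-level tree with a fanout of 64 (1024 / 64 = 16).

```c
#include <assert.h>

/*
 * Hypothetical helper: number of rcu_node levels needed to cover
 * nr_cpus CPU slots.
 *   leaf_fanout: CPUs per leaf rcu_node (assumed to be 16 here)
 *   fanout:      children per interior rcu_node (CONFIG_RCU_FANOUT)
 */
static int rcu_tree_levels(unsigned long nr_cpus,
			   unsigned long leaf_fanout,
			   unsigned long fanout)
{
	unsigned long capacity = leaf_fanout;	/* CPUs covered so far */
	int levels = 1;

	assert(nr_cpus > 0 && leaf_fanout > 0 && fanout > 1);
	while (capacity < nr_cpus) {
		capacity *= fanout;	/* one more level multiplies coverage */
		levels++;
	}
	return levels;
}
```

Under these assumptions, rcu_tree_levels(1024, 16, 64) gives 2 and rcu_tree_levels(4096, 16, 64) gives 3, which agrees with the 3 levels we observe for NR_CPUS=4096.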

> 
>> Has this been encountered previously?
> 
> I haven't seen that warning since I was originally developing
> TREE_PREEMPT_RCU some years back.
> 
> Numascale NumaConnect is cache-coherent, right?

Correct.

[]
>>
>> WARNING: at kernel/rcutree_plugin.h:990
> 
> OK, if I have the right version, RCU-sched is trying to do a new grace
> period, but finds that one of the rcu_node structure's ->qsmask fields
> indicates that a CPU (or group of CPUs) failed to check into the previous
> grace period.
> 
[]

>>
>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d000,
>> qsmask is 0x1
>>    0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>>   12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10
>> ql=2 qs=.R.. b=10 ci=561 co=0 ca=0
>>   24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>>   36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>>   48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=14
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
> 
> And it looks like CPU 48 is the one that did not check in.  What is
> CONFIG_NO_HZ set to?  If CONFIG_NO_HZ=y, this can legitimately
> happen, and printing the values of rdp->dynticks->dynticks,
> rdp->dynticks->dynticks_nesting, rdp->dynticks->dynticks_nmi_nesting,
> and rdp->dynticks_fqs will determine whether or not the situation is
> legitimate.
> 
> If CONFIG_NO_HZ=n, then this should not happen.  In this case, I should
> be able to provide you with a more focused debug print.

CONFIG_NO_HZ is not set, so this should not happen. We see the same behavior with CONFIG_NO_HZ=y, though it usually takes longer to reproduce.
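
To pick out the lagging CPU from a long dump mechanically, something like the following user-space sketch could be used. It is an assumption-laden illustration: parse_rdp_line() and rdp_lags() are hypothetical names, the field meanings (c= as rdp->completed, g= as rdp->gpnum, and so on) are assumed to mirror the kernel's rcudata trace output, and the input is assumed to be the raw console line without the ">> " quoting and line wrap seen in this mail.

```c
#include <assert.h>
#include <stdio.h>

/* Leading fields of one per-CPU line from the debug printout. */
struct rdp_snapshot {
	int cpu;
	unsigned long long rdp_addr;	/* address of the rcu_data */
	unsigned long completed;	/* c=   (assumed rdp->completed) */
	unsigned long gpnum;		/* g=   (assumed rdp->gpnum) */
	int passed_quiesce;		/* pq=  */
	unsigned long pgp;		/* pgp= */
	int qs_pending;			/* qp=  */
};

/* Returns 1 if all seven leading fields were parsed, 0 otherwise. */
static int parse_rdp_line(const char *line, struct rdp_snapshot *s)
{
	int n;

	assert(line != NULL && s != NULL);
	n = sscanf(line, "%d %llx c=%lu g=%lu pq=%d pgp=%lu qp=%d",
		   &s->cpu, &s->rdp_addr, &s->completed, &s->gpnum,
		   &s->passed_quiesce, &s->pgp, &s->qs_pending);
	return n == 7;
}

/* A CPU "lags" if it has not yet seen the grace period in rsp->gpnum. */
static int rdp_lags(const struct rdp_snapshot *s, unsigned long rsp_gpnum)
{
	return s->gpnum != rsp_gpnum;
}
```

Fed the CPU 48 line with rsp->gpnum 4058, rdp_lags() would flag it (g=4057), while the lines for CPUs 0, 12, 24 and 36 would not be flagged (g=4058).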

> 
>>   60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>>   72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0
[]

>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d100,
>> qsmask is 0x8
>>    0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>>   12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10
>> ql=3 qs=NR.. b=10 ci=561 co=0 ca=0
>>   24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>>   36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>>   48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=19
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
> 
> Hmmmm...  Same CPU and grace-period number.  This is a repeat of the
> above printout, correct?

Because the RCU tree has 3 levels, the printout function we added in the patch gets called 3 times (in rcu_start_gp()), each time with the same RDP but a different RNP.
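
The three qsmask values in the printouts (0x1, 0x8, 0x1) look consistent with CPU 48's path through such a tree. The sketch below is illustrative only: rcu_path_for_cpu() is a hypothetical helper, and it assumes a leaf fanout of 16, contiguous CPU numbering, and a breadth-first layout of the rcu_node array.

```c
#include <assert.h>

#define LEAF_FANOUT	16	/* assumed CPUs per leaf rcu_node */
#define FANOUT		64	/* CONFIG_RCU_FANOUT in our config */
#define NR_CPUS_CFG	4096	/* NR_CPUS in our config */

struct rnp_step {
	unsigned int node_index;	/* breadth-first index of the rcu_node */
	unsigned long long child_bit;	/* bit of the child leading to the CPU */
};

/* Fill path[0..2], root first, for a three-level tree. */
static void rcu_path_for_cpu(unsigned int cpu, struct rnp_step path[3])
{
	unsigned int nr_leaves = (NR_CPUS_CFG + LEAF_FANOUT - 1) / LEAF_FANOUT;
	unsigned int nr_mid = (nr_leaves + FANOUT - 1) / FANOUT;
	unsigned int leaf = cpu / LEAF_FANOUT;	/* leaf node holding this CPU */
	unsigned int mid = leaf / FANOUT;	/* middle node holding that leaf */

	assert(cpu < NR_CPUS_CFG);
	path[0].node_index = 0;			/* root */
	path[0].child_bit = 1ULL << mid;
	path[1].node_index = 1 + mid;		/* middle level */
	path[1].child_bit = 1ULL << (leaf % FANOUT);
	path[2].node_index = 1 + nr_mid + leaf;	/* leaf level */
	path[2].child_bit = 1ULL << (cpu % LEAF_FANOUT);
}
```

For CPU 48 this yields node indices 0, 1 and 8 with bits 0x1, 0x8 and 0x1. The bits match the three qsmask values printed, and the indices would match the rnp offsets 0x000, 0x100 and 0x800 if each rcu_node occupies 0x100 bytes (that size is an inference from the addresses, not something we have verified).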

> 
>>   60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>>   72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0
>> CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched),
>> rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d800,
>> qsmask is 0x1
>>    0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=280
>> ql=1 qs=..W. b=10 ci=145949 co=0 ca=0
>>   12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=11
>> ql=3 qs=NR.. b=10 ci=561 co=0 ca=0
>>   24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=15
>> ql=0 qs=.... b=10 ci=406 co=0 ca=0
>>   36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=23
>> ql=0 qs=.... b=10 ci=215 co=0 ca=0
>>   48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=21
>> ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0
> 
> Same here, but most of the ql= values are larger.  Later printout?

The loop in rcu_start_gp() releases the node lock each time it moves on to the next node in the RCU tree (it has to):

	rcu_for_each_node_breadth_first(rsp, rnp) {
		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
		rcu_debug_print(rsp, rnp);

so I guess this allows the ql= values to increase between printouts, no?

> 
> 							Thanx, Paul
> 
>>   60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16
>> ql=0 qs=.... b=10 ci=50 co=0 ca=0
>>   72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26
>> ql=0 qs=.... b=10 ci=22 co=0 ca=0
>>

Thanks for looking into this, Paul. We'd be more than happy to test out theories and patches.


Kind regards,
-- 
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold