Date:	Tue, 29 Mar 2011 04:42:31 +0200
From:	Sedat Dilek <sedat.dilek@...glemail.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	Josh Triplett <josh@...htriplett.org>,
	linux-next <linux-next@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	Randy Dunlap <randy.dunlap@...cle.com>,
	"Theodore Ts'o" <tytso@....edu>, Jens Axboe <axboe@...nel.dk>,
	Tejun Heo <tj@...nel.org>, Al Viro <viro@...iv.linux.org.uk>,
	Nick Piggin <npiggin@...nel.dk>
Subject: Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4
 related?)

On Tue, Mar 29, 2011 at 2:10 AM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Mon, Mar 28, 2011 at 06:46:48PM +0200, Sedat Dilek wrote:
>> On Mon, Mar 28, 2011 at 6:38 PM, Sedat Dilek <sedat.dilek@...glemail.com> wrote:
>> > On Mon, Mar 28, 2011 at 5:11 PM, Paul E. McKenney
>> > <paulmck@...ux.vnet.ibm.com> wrote:
>> >> On Mon, Mar 28, 2011 at 06:24:36AM -0700, Paul E. McKenney wrote:
>> >>> On Mon, Mar 28, 2011 at 02:33:36PM +0200, Sedat Dilek wrote:
>
> [ . . . ]
>
>> >>> > Ah, before I forget...
>> >>> >
>> >>> > I used TREE_RCU (was the default before noticing RCU issue) for
>> >>> > finding the culprit commit.
>> >>> > If it is from your POV more helpful to switch to PREEMPT + PREEMPT_RCU
>> >>> > + RCU_BOOST, please let me *now* know.
>> >>> > ( Both RCU setups freaks up the system. )
>> >>>
>> >>> If TREE_RCU hits problems faster, it is probably best to stay with
>> >>> TREE_RCU.
>> >>
>> >> And of course, one exception to this advice is if TREE_RCU hangs so hard
>> >> and fast that you don't have time to get any diagnostics.  If this is the
>> >> case, then TREE_PREEMPT_RCU might be more productive.
>> >>
>> >
>> > OK, that would somehow explain why I could not really get any debug
>> > info when doing "my stress-test" and checking via:
>> >
>> > $ LC_ALL=C tail -f /sys/kernel/debug/rcu/rcudata
>> >
>> > Then I remembered I saw a snippet for an RCU torture script mentioned
>> > in the kernel-docs (see Documentation/RCU/torture.txt).
>> >
>> > The following script may be used to torture RCU:
>> >
>> >         #!/bin/sh
>> >
>> >         modprobe rcutorture
>> >         sleep 100
>> >         rmmod rcutorture
>> >         dmesg | grep torture:
>> >
>> > So, I recompiled a new TREE_RCU-based kernel, built with
>> > CONFIG_RCU_TORTURE_TEST=m.
>> >
>> > Unfortunately, the rmmod (I prefer modprobe -r -v) hangs... the
>> > messages in the logs look promising.
>> >
>> > - Sedat -
>> >
>>
>> Wrong attachment; the correct one is attached.
>
> And one stupid problem located thus far.  I can make a (tortured) case
> for it resulting in the symptoms you see, but it does seem unlikely to
> happen repeatedly, as it would require a burst of CPU just at the wrong
> time.  But who knows?
>
> In any case, I am still looking.
>
>                                                        Thanx, Paul
>
> ------------------------------------------------------------------------
>
> Fix stupid typo.
>
> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 5477764..f311228 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1618,7 +1618,7 @@ static int rcu_node_kthread(void *arg)
>                rnp->wakemask = 0;
>                raw_spin_unlock_irqrestore(&rnp->lock, flags);
>                rcu_initiate_boost(rnp);
> -               for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> +               for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
>                        if ((mask & 0x1) == 0)
>                                continue;
>                        preempt_disable();
>
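
For anyone following along, here is a minimal user-space sketch of why the
shift direction in that loop matters. It is not the kernel code: the function
names cpus_shift_right() and cpus_shift_left() and the out[] array are made up
for illustration, and it only assumes what the hunk itself shows, namely that
the loop tests bit 0 of the mask once per CPU from grplo to grphi.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space illustration only; not the kernel's rcu_node_kthread().
 * Bit 0 of mask stands for CPU grplo, bit 1 for grplo + 1, and so on.
 * Stores the selected CPU numbers in out[] and returns how many were found.
 */
static size_t cpus_shift_right(unsigned long mask, int grplo, int grphi,
			       int *out)
{
	size_t n = 0;
	int cpu;

	assert(grplo <= grphi);
	/* Shifting right moves the next CPU's bit into position 0. */
	for (cpu = grplo; cpu <= grphi; cpu++, mask >>= 1) {
		if ((mask & 0x1) == 0)
			continue;
		out[n++] = cpu;
	}
	return n;
}

/* Same loop with the pre-patch shift direction, for comparison. */
static size_t cpus_shift_left(unsigned long mask, int grplo, int grphi,
			      int *out)
{
	size_t n = 0;
	int cpu;

	assert(grplo <= grphi);
	/* After the first left shift, bit 0 is always clear. */
	for (cpu = grplo; cpu <= grphi; cpu++, mask <<= 1) {
		if ((mask & 0x1) == 0)
			continue;
		out[n++] = cpu;
	}
	return n;
}
```

With mask 0x5 and CPUs 4 to 7, the right-shifting loop picks CPUs 4 and 6,
while the left-shifting loop only ever picks CPU 4, because every later
iteration sees a zero in bit 0.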

I have tested this patch and the previous one you sent:

  (+) OK   rcu-fix/rcu-further-lower-priority-in-rcu_yield.patch
  (+) OK   rcu-fix/Fix-stupid-typo.patch

As you suggested, I switched to PREEMPT and TREE_PREEMPT_RCU with RCU_BOOST:

# egrep 'RCU|PREEMPT|_HZ' /boot/config-2.6.38-next20110328-5-686-iniza
# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_NO_HZ=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_DEBUG_PREEMPT=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_PREEMPT_TRACER=y

Unfortunately, the RCU torture script hangs again while unloading the
rcutorture module.
Attached are the RCU-related messages from my logs.
( I tailed for rcudata changes - no logs. )

- Sedat -

View attachment "msg_rcu-torture_20110329.txt" of type "text/plain" (26844 bytes)
