Message-ID: <AANLkTiktOrExWPkk1zG=Y7wk9-RAAFC61eRMjtv6vgLB@mail.gmail.com>
Date: Tue, 29 Mar 2011 04:42:31 +0200
From: Sedat Dilek <sedat.dilek@...glemail.com>
To: paulmck@...ux.vnet.ibm.com
Cc: Josh Triplett <josh@...htriplett.org>,
linux-next <linux-next@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Randy Dunlap <randy.dunlap@...cle.com>,
"Theodore Ts'o" <tytso@....edu>, Jens Axboe <axboe@...nel.dk>,
Tejun Heo <tj@...nel.org>, Al Viro <viro@...iv.linux.org.uk>,
Nick Piggin <npiggin@...nel.dk>
Subject: Re: linux-next: Tree for March 25 (Call trace: RCU|workqueues|block|VFS|ext4
related?)
On Tue, Mar 29, 2011 at 2:10 AM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Mon, Mar 28, 2011 at 06:46:48PM +0200, Sedat Dilek wrote:
>> On Mon, Mar 28, 2011 at 6:38 PM, Sedat Dilek <sedat.dilek@...glemail.com> wrote:
>> > On Mon, Mar 28, 2011 at 5:11 PM, Paul E. McKenney
>> > <paulmck@...ux.vnet.ibm.com> wrote:
>> >> On Mon, Mar 28, 2011 at 06:24:36AM -0700, Paul E. McKenney wrote:
>> >>> On Mon, Mar 28, 2011 at 02:33:36PM +0200, Sedat Dilek wrote:
>
> [ . . . ]
>
>> >>> > Ah, before I forget...
>> >>> >
>> >>> > I used TREE_RCU (was the default before noticing RCU issue) for
>> >>> > finding the culprit commit.
>> >>> > If it is from your POV more helpful to switch to PREEMPT + PREEMPT_RCU
>> >>> > + RCU_BOOST, please let me *now* know.
>> >>> > ( Both RCU setups freak out the system. )
>> >>>
>> >>> If TREE_RCU hits problems faster, it is probably best to stay with
>> >>> TREE_RCU.
>> >>
>> >> And of course, one exception to this advice is if TREE_RCU hangs so hard
>> >> and fast that you don't have time to get any diagnostics. If this is the
>> >> case, then TREE_PREEMPT_RCU might be more productive.
>> >>
>> >
>> > OK, that would explain why I could not really get any debug
>> > info when doing "my stress-test" and checking via:
>> >
>> > $ LC_ALL=C tail -f /sys/kernel/debug/rcu/rcudata
>> >
>> > Then I remembered I saw a snippet for an RCU torture script mentioned
>> > in the kernel-docs (see Documentation/RCU/torture.txt).
>> >
>> > The following script may be used to torture RCU:
>> >
>> > #!/bin/sh
>> >
>> > modprobe rcutorture
>> > sleep 100
>> > rmmod rcutorture
>> > dmesg | grep torture:
>> >
>> > So, I recompiled a new TREE_RCU-based kernel and built it with
>> > CONFIG_RCU_TORTURE_TEST=m.
>> >
>> > Unfortunately, the rmmod (I prefer modprobe -r -v) hangs... the
>> > messages in the logs look promising.
>> >
>> > - Sedat -
>> >
>>
>> Wrong attachment, correct attached.
>
> And one stupid problem located thus far. I can make a (tortured) case
> for it resulting in the symptoms you see, but it does seem unlikely to
> happen repeatedly, as it would require a burst of CPU just at the wrong
> time. But who knows?
>
> In any case, I am still looking.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> Fix stupid typo.
>
> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 5477764..f311228 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1618,7 +1618,7 @@ static int rcu_node_kthread(void *arg)
> rnp->wakemask = 0;
> raw_spin_unlock_irqrestore(&rnp->lock, flags);
> rcu_initiate_boost(rnp);
> - for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
> + for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
> if ((mask & 0x1) == 0)
> continue;
> preempt_disable();
>
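For readers following the one-character change above, here is a minimal user-space sketch of why the shift direction matters when the loop tests bit 0 on every pass. It is not kernel code: walk_right() and walk_left() are hypothetical helpers, and the returned "seen" bitmask is an assumption standing in for whatever the real loop body does for each selected CPU. Only the loop header and the (mask & 0x1) test are taken from the patch.

```c
#include <stdio.h>

/*
 * Patched form: shift the mask right so that bit 0 always holds the
 * bit belonging to the CPU currently being examined.
 * Returns a bitmask of the CPU offsets (cpu - grplo) that were handled.
 */
static unsigned long walk_right(unsigned long mask, int grplo, int grphi)
{
	unsigned long seen = 0;
	int cpu;

	for (cpu = grplo; cpu <= grphi; cpu++, mask >>= 1) {
		if ((mask & 0x1) == 0)
			continue;	/* the shift in the loop header still runs */
		seen |= 1UL << (cpu - grplo);
	}
	return seen;
}

/*
 * Pre-patch form: shifting left moves every set bit away from bit 0,
 * so at most the first CPU in the range can ever be handled.
 */
static unsigned long walk_left(unsigned long mask, int grplo, int grphi)
{
	unsigned long seen = 0;
	int cpu;

	for (cpu = grplo; cpu <= grphi; cpu++, mask <<= 1) {
		if ((mask & 0x1) == 0)
			continue;
		seen |= 1UL << (cpu - grplo);
	}
	return seen;
}
```

With a mask of 0x5 over CPUs 0 to 3, walk_right() reports 0x5 (CPUs 0 and 2), while walk_left() reports only 0x1. With a mask of 0x6, walk_left() handles no CPU at all.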
I have tested this patch and the previous one you sent:
(+) OK rcu-fix/rcu-further-lower-priority-in-rcu_yield.patch
(+) OK rcu-fix/Fix-stupid-typo.patch
As you suggested, I switched to PREEMPT and preemptible RCU with RCU_BOOST:
# egrep 'RCU|PREEMPT|_HZ' /boot/config-2.6.38-next20110328-5-686-iniza
# RCU Subsystem
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=32
# CONFIG_RCU_FANOUT_EXACT is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_NO_HZ=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_DEBUG_PREEMPT=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_RCU_TORTURE_TEST=m
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_PREEMPT_TRACER=y
Unfortunately, the rcu-torture script hangs again when unloading the
rcutorture module.
Attached are RCU-related messages in my logs.
( I tailed rcudata for changes - nothing was logged. )
- Sedat -
View attachment "msg_rcu-torture_20110329.txt" of type "text/plain" (26844 bytes)