Date:	Thu, 23 Oct 2014 13:44:18 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>, htejun@...il.com,
	oleg@...hat.com
Subject: Re: rcu_preempt detected stalls.

On Thu, Oct 23, 2014 at 04:28:16PM -0400, Dave Jones wrote:
> On Thu, Oct 23, 2014 at 12:52:21PM -0700, Paul E. McKenney wrote:
>  > On Thu, Oct 23, 2014 at 03:37:59PM -0400, Dave Jones wrote:
>  > > On Thu, Oct 23, 2014 at 12:28:07PM -0700, Paul E. McKenney wrote:
>  > > 
>  > >  > >  > This one will require more looking.  But did you do something like
>  > >  > >  > create a pair of mutually recursive symlinks or something?  ;-)
>  > >  > > 
>  > >  > > I'm not 100% sure, but this may have been on a box on which I was
>  > >  > > running NFS tests. So maybe the server had disappeared with the
>  > >  > > mount still active...
>  > >  > > 
>  > >  > > Just a guess tbh.
>  > >  > 
>  > >  > Another possibility might be that the box was so overloaded that tasks
>  > >  > were getting preempted for 21 seconds as a matter of course, and sometimes
>  > >  > within RCU read-side critical sections.  Or did the box have ample idle
>  > >  > time?
>  > > 
>  > > I fairly recently upped the number of child processes I typically run
>  > > with, so it being overloaded does sound highly likely.
>  > 
>  > Ah, that could do it!  One way to test extreme loads and not trigger
>  > RCU CPU stall warnings might be to make all of your child processes
>  > sleep for a few hundred milliseconds during each ten-second interval.
>  > Would that work for you?
> 
> This feels like hiding from the problem rather than fixing it.
> I'm not sure it even makes sense to add sleeps to the fuzzer, other than
> to slow things down, and if I were to do that, I may as well just run
> it with fewer threads instead.

I was thinking of the RCU CPU stall warnings that were strictly due to
overload as being false positives.  If trinity caused a kthread to loop
within an RCU read-side critical section, you would still get the RCU
CPU stall warning even with the sleeps.
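
To make the distinction concrete, here is a minimal sketch of the sort
of kthread that would produce a genuine stall warning even with the
sleeps in place.  The names are invented for illustration; this is not
anything that trinity actually runs:

#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/*
 * Spins forever inside a single RCU read-side critical section, so the
 * grace period can never complete: a real stall, not a false positive.
 */
static int stall_demo_fn(void *unused)
{
	rcu_read_lock();
	while (!kthread_should_stop())
		cpu_relax();	/* never exits the read-side critical section */
	rcu_read_unlock();
	return 0;
}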

But this is just a suggestion, no strong feelings.  That might change if
there is an excess of false-positive RCU CPU stall warnings, of course.  ;-)
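
In case it helps, here is a rough userspace sketch of what I had in
mind.  The function name and the 200-millisecond/ten-second numbers are
made up for illustration; the real interval and placement in the
fuzzing loop would of course be up to you:

#include <time.h>
#include <unistd.h>

/*
 * Sleep for ~200ms once per ten-second interval, called from each
 * child's main loop.  The numbers are arbitrary; the point is just to
 * let preempted tasks run well before the 21-second stall timeout.
 */
static void maybe_take_breather(void)
{
	static time_t last_breather;
	time_t now = time(NULL);

	if (now - last_breather >= 10) {
		usleep(200 * 1000);
		last_breather = now;
	}
}

(The 21-second threshold itself is the RCU CPU stall timeout, which can
also be raised on deliberately overloaded boxes via the
rcupdate.rcu_cpu_stall_timeout boot parameter, if that turns out to be
more convenient.)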

> While the fuzzer is doing pretty crazy stuff, what's different about it
> from any other application that overcommits the CPU with too many threads?

The (presumably) much higher probability of being preempted in the kernel,
and thus within an RCU read-side critical section.

> We impose rlimits to stop people from forkbombing and the like, but this
> doesn't even need that many processes to trigger, and with some effort
> could probably be done with even fewer if I found ways to keep other cores
> busy in the kernel for long enough.
> 
> That all said, I don't have easy reproducers for this right now, due
> to other bugs manifesting long before this gets to be a problem.

Fair enough!  ;-)

							Thanx, Paul

