Message-ID: <20160115231410.GA16973@linux.vnet.ibm.com>
Date:	Fri, 15 Jan 2016 15:14:10 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Sasha Levin <sasha.levin@...cle.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: timers: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected

On Fri, Jan 15, 2016 at 02:10:45PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 01:11:25PM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 11:03:24AM +0100, Thomas Gleixner wrote:
> > > On Thu, 14 Jan 2016, Paul E. McKenney wrote:
> > > > > Untested patch below.
> > > > 
> > > > One small fix to make it build below.  Started rcutorture, somewhat
> > > > pointlessly given that the splat doesn't appear on my setup.
> > > 
> > > Well, at least it tells us whether the change explodes by itself.
> > 
> > Hmmm...
> > 
> > So this is a strange one.  I have been seeing increasing instability
> > in mainline over the past couple of releases, with the main symptom
> > being that the kernel decides that awakening RCU's grace-period kthreads
> > is an optional activity.  The usual situation is that the kthread is
> > blocked for tens of seconds in a wait_event_interruptible_timeout(),
> > despite having a three-jiffy timeout.  Doing periodic wakeups from
> > the scheduling-clock interrupt seems to clear things up, but such hacks
> > should not be necessary.
> > 
> > Normally, I have to run for some hours to have a good chance of seeing
> > this happen.  This change triggered in a 30-minute run.  Not only that,
> > but in a .config scenario that is normally very hard to trigger.  This
> > scenario does involve CPU hotplug, and I am re-running with CPU hotplug
> > disabled.
> > 
> > That said, I am starting to hear reports of people hitting this without
> > CPU hotplug operations...
> 
> And without hotplug operations, instead of dying repeatedly in 30 minutes,
> it goes four hours with no complaints.  Next trying wakeups.

And if I make the scheduling-clock interrupt send extra wakeups to the RCU
grace-period kthread when needed, things work even with CPU hotplug going.

The "when needed" means any time that the RCU grace-period kthread has
been sleeping three times as long as the timeout interval.  If the first
wakeup does nothing, it does another wakeup once per second.

So it looks like this change makes an existing problem much worse, as
opposed to introducing a new problem.

							Thanx, Paul
