Date:	Sun, 30 Sep 2012 19:23:08 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Avi Kivity <avi@...hat.com>
Cc:	paulmck@...ux.vnet.ibm.com, Josh Boyer <jwboyer@...hat.com>,
	Christian Hoffmann <email@...istianhoffmann.info>,
	LKML <linux-kernel@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, johnstul@...ibm.com,
	tglx@...utronix.de
Subject: Re: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected
 by 0, t=10002 jiffies)

On Sun, Sep 30, 2012 at 01:10:55PM +0200, Avi Kivity wrote:
> On 09/28/2012 05:35 AM, Paul E. McKenney wrote:
> > On Thu, Sep 27, 2012 at 12:40:44PM +0800, Fengguang Wu wrote:
> >> On Wed, Sep 26, 2012 at 09:28:50PM -0700, Paul E. McKenney wrote:
> >> > On Thu, Sep 27, 2012 at 10:54:00AM +0800, Fengguang Wu wrote:
> >> > > On Wed, Sep 26, 2012 at 09:45:43AM -0700, Paul E. McKenney wrote:
> >> > > > On Wed, Sep 26, 2012 at 04:15:01PM +0800, Fengguang Wu wrote:
> > 
> > [ . . . ]
> > 
> >> > > > But could you also please send your .config file and a description of
> >> > > 
> >> > > .config attached.
> >> > > 
> >> > > > the workload you are running?
> >> > > 
> >> > > It's basically the below commands. The exact initrd is not relevant in
> >> > > this case because it's a boot time warning before user space is
> >> > > started. The stalls happen roughly once in every 10 boots.
> >> > 
> >> > Yow!!!
> >> > 
> >> > You have severe cross-CPU time-synchronization problems.  See for
> >> > example the first dmesg, with the relevant part extracted right here.
> >> > One CPU believes that it is about 37 seconds past boot, and the other
> >> > CPU believes that it is about 137 seconds past boot.  Given such a
> >> > large time difference, an RCU CPU stall warning is expected behavior.
> >> 
> >> Good spot! Yeah, I noticed that huge timestamp gap, but didn't take
> >> it seriously enough.
> >> 
> >> > Get your two CPUs in agreement about what time it is, and I bet that
> >> > the CPU stall warnings will go away.
> >> 
> >> Possibly KVM related? The warnings show up on many test boxes
> >> running KVM, so it is not likely to be a hardware-specific issue.
> > 
> > I vaguely recall seeing something recently.  But let's ask the KVM and
> > timekeeping guys.
> 
> From the logs it looks like hpet (why not kvmclock?) is used for the
> clock, it should not generate such drifts since it is a global clock.
> Can you verify current_clocksource on a boot that actually failed (in
> case the clocksource is switched during runtime)?

I've checked the dmesg that Paul cited (attached). Yes, it
contains the lines

[    4.970051] Switching to clocksource hpet

and then

[    7.250353] Switching to clocksource tsc

And there are no kvm-clock lines. Oh well, for this particular kernel:

# CONFIG_KVM_CLOCK is not set

I'm not sure how this happened; maybe some kconfig option that
CONFIG_KVM_CLOCK depends on was randconfig'ed to off.
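
For anyone who wants to repeat the check on other saved boot logs, here is a
minimal sketch in Python. It is not what was actually run here. It assumes the
dmesg and .config have been saved as plain text, and the helper names
(clocksource_switches, kconfig_state) are made up for illustration. On a live
guest, the active clocksource can instead be read from
/sys/devices/system/clocksource/clocksource0/current_clocksource.

```python
import re

# Matches printk lines such as "[    4.970051] Switching to clocksource hpet".
SWITCH_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+Switching to clocksource (\S+)", re.M
)


def clocksource_switches(dmesg_text):
    """Return [(seconds_since_boot, clocksource_name), ...] in log order."""
    return [(float(ts), name) for ts, name in SWITCH_RE.findall(dmesg_text)]


def kconfig_state(config_text, symbol):
    """Return the symbol's value ('y', 'm', ...), 'n' if it is explicitly
    marked "is not set", or None if the symbol does not appear at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Sample input copied from the lines quoted in this mail.
    sample_dmesg = (
        "[    4.970051] Switching to clocksource hpet\n"
        "[    7.250353] Switching to clocksource tsc\n"
    )
    sample_config = "# CONFIG_KVM_CLOCK is not set\n"

    print(clocksource_switches(sample_dmesg))
    print(kconfig_state(sample_config, "CONFIG_KVM_CLOCK"))
```

The last entry returned by clocksource_switches() is the clocksource in use at
the end of the captured log, which is the thing to compare across failing and
non-failing boots.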

Thanks,
Fengguang

View attachment "dmesg-kvm_bisect2-inn-42527-2012-09-27-10-38-38-3.6.0-rc7-bisect2-00078-g593d100-21" of type "text/plain" (307119 bytes)

View attachment ".config" of type "text/plain" (69161 bytes)
