Date:	Sun, 30 Sep 2012 19:18:12 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Avi Kivity <avi@...hat.com>
Cc:	paulmck@...ux.vnet.ibm.com, Josh Boyer <jwboyer@...hat.com>,
	Christian Hoffmann <email@...istianhoffmann.info>,
	LKML <linux-kernel@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, johnstul@...ibm.com,
	tglx@...utronix.de
Subject: Re: INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected
 by 0, t=10002 jiffies)

On Sun, Sep 30, 2012 at 01:10:55PM +0200, Avi Kivity wrote:
> On 09/28/2012 05:35 AM, Paul E. McKenney wrote:
> > On Thu, Sep 27, 2012 at 12:40:44PM +0800, Fengguang Wu wrote:
> >> On Wed, Sep 26, 2012 at 09:28:50PM -0700, Paul E. McKenney wrote:
> >> > On Thu, Sep 27, 2012 at 10:54:00AM +0800, Fengguang Wu wrote:
> >> > > On Wed, Sep 26, 2012 at 09:45:43AM -0700, Paul E. McKenney wrote:
> >> > > > On Wed, Sep 26, 2012 at 04:15:01PM +0800, Fengguang Wu wrote:
> > 
> > [ . . . ]
> > 
> >> > > > But could you also please send your .config file and a description of
> >> > > 
> >> > > .config attached.
> >> > > 
> >> > > > the workload you are running?
> >> > > 
> >> > > It's basically the below commands. The exact initrd is not relevant in
> >> > > this case because it's a boot time warning before user space is
> >> > > started. The stalls happen roughly once in every 10 boots.
> >> > 
> >> > Yow!!!
> >> > 
> >> > You have severe cross-CPU time-synchronization problems.  See for
> >> > example the first dmesg, with the relevant part extracted right here.
> >> > One CPU believes that it is about 37 seconds past boot, and the other
> >> > CPU believes that it is about 137 seconds past boot.  Given that
> >> > large a time difference, an RCU CPU stall warning is expected behavior.
> >> 
> >> Good spot! Yeah, I noticed that huge timestamp gap but didn't take
> >> it seriously enough.
> >> 
> >> > Get your two CPUs in agreement about what time it is, and I bet that
> >> > the CPU stall warnings will go away.
> >> 
> >> Possibly KVM related? The warnings show up in many test boxes
> >> running KVM, so it is not likely a hardware-specific issue.
> > 
> > I vaguely recall seeing something recently.  But let's ask the KVM and
> > timekeeping guys.
> 
> From the logs it looks like hpet (why not kvmclock?) is used for the

Hi Avi! Thanks for looking into this. It seems you have the full logs
attached in my previous email?

FYI, I've enabled CONFIG_KVM_CLOCK/CONFIG_KVM_GUEST for all bootable
kernels and here is the related boot message:

[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 0:1b7ec81, boot clock

> clock, it should not generate such drifts since it is a global clock.
> Can you verify current_clocksource on a boot that actually failed (in
> case the clocksource is switched during runtime)?

I see a line

[    2.011710] Switching to clocksource kvm-clock

without any indication of errors.
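
For anyone who wants to repeat this check on a failed boot, here is a
minimal Python sketch. It makes some assumptions: the function names are
made up for illustration, the regex accepts the "Switching to clocksource"
wording seen in the log above and also the "Switched to clocksource"
wording that other kernel versions may print, and the sysfs path is the
usual location of the active clocksource on Linux. It only reports what
the log and sysfs say; it does not diagnose the stall itself.

```python
import re
from pathlib import Path

# Usual sysfs location of the active clocksource on Linux.
SYSFS_CURRENT = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)

# Matches e.g. "[    2.011710] Switching to clocksource kvm-clock".
# The "Switched" alternative is an assumption about other kernel versions.
SWITCH_RE = re.compile(
    r"\[\s*(\d+\.\d+)\].*Switch(?:ing|ed) to clocksource (\S+)"
)


def clocksource_switches(dmesg_lines):
    """Return (timestamp_seconds, name) for each clocksource switch logged."""
    switches = []
    for line in dmesg_lines:
        match = SWITCH_RE.search(line)
        if match:
            switches.append((float(match.group(1)), match.group(2)))
    return switches


def current_clocksource(path=SYSFS_CURRENT):
    """Read the active clocksource from sysfs; None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None
```

Feeding the saved dmesg of a failed boot to clocksource_switches() lists
every switch with its timestamp, so more than one entry means the
clocksource changed during runtime. current_clocksource() has to be run
inside the guest once user space is up.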

Thanks,
Fengguang
