Message-ID: <20141219205435.GA24499@redhat.com>
Date: Fri, 19 Dec 2014 15:54:35 -0500
From: Dave Jones <davej@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Chris Mason <clm@...com>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Dâniel Fraga <fragabr@...il.com>,
Sasha Levin <sasha.levin@...cle.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Suresh Siddha <sbsiddha@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4

On Fri, Dec 19, 2014 at 12:46:16PM -0800, Linus Torvalds wrote:
> On Fri, Dec 19, 2014 at 11:51 AM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > I do note that we depend on the "new mwait" semantics where we do
> > mwait with interrupts disabled and a non-zero RCX value. Are there
> > possibly even any known CPU errata in that area? Not that it sounds
> > likely, but still..
>
> Remind me what CPU you have in that machine again? The %rax value for
> the mwait cases in question seems to be 0x32, which is either C7s-HSW
> or C7s-BDW, and in both cases has the "TLB flushed" flag set.
>
> I'm pretty sure you have a Haswell, I'm just checking. Which model?
> I'm assuming it's family 6, model 60, stepping 3? I found you
> mentioning i5-4670T in a perf thread.. That the one?

Yep.

vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
stepping    : 3
microcode   : 0x1a

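For reference, the 0x32 in %rax mentioned in the quoted text can be split into its two fields. This is only a minimal sketch: it assumes the usual MWAIT hint layout (bits 7:4 select the C-state field, bits 3:0 the sub-state), and the constant and function names are illustrative rather than taken from this thread. It does not map the fields to a state name; the C7s-HSW label above comes from Linus's reading of the idle tables.

```python
# Assumed MWAIT hint layout: bits 7:4 = C-state field, bits 3:0 = sub-state.
# Names below are illustrative, chosen for this sketch.
MWAIT_SUBSTATE_MASK = 0xF
MWAIT_CSTATE_MASK = 0xF
MWAIT_SUBSTATE_SIZE = 4


def decode_mwait_hint(eax):
    """Split an MWAIT hint (the value passed in EAX/RAX) into its two fields."""
    cstate_field = (eax >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK
    substate = eax & MWAIT_SUBSTATE_MASK
    return cstate_field, substate


if __name__ == "__main__":
    cstate_field, substate = decode_mwait_hint(0x32)
    print(f"hint 0x32: C-state field {cstate_field}, sub-state {substate}")
```

Under that assumed layout, 0x32 decodes to C-state field 3 with sub-state 2.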
> Anyway, I don't actually believe in any CPU bugs, but you could try
> "intel_idle.max_cstate=0" and see if that makes any difference, for
> example.
>
> Or perhaps just "intel_idle.max_cstate=1", which leaves intel_idle
> active, but gets rid of the deeper sleep states (that incidentally
> also play games with leave_mm() etc)
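
When running an experiment like the one suggested above, it may help to confirm that the parameter actually reached the booted kernel. The following is a rough sketch, not part of the original exchange: it scans a kernel command line string (for example, the contents of /proc/cmdline) for "intel_idle.max_cstate". The helper name is hypothetical, and taking the last occurrence is an assumption about how repeated parameters are resolved.

```python
from pathlib import Path

PARAM = "intel_idle.max_cstate"


def parse_max_cstate(cmdline):
    """Return the intel_idle.max_cstate value found on a command line, or None.

    Assumption: if the parameter is repeated, the last occurrence wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == PARAM:
            try:
                value = int(val)
            except ValueError:
                # Ignore a malformed value and keep any earlier valid one.
                continue
    return value


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        setting = parse_max_cstate(proc_cmdline.read_text())
        if setting is None:
            print(f"{PARAM} not set on the kernel command line")
        else:
            print(f"{PARAM}={setting}")
```

A result of None means the option was not passed at boot, so the driver is using its default limit.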

So I'm leaving Red Hat on Tuesday, and can realistically only do one
more experiment over the weekend before I give them this box back.

Right now I'm doing Chris' idea of "turn debugging back on,
and try without serial console". Shall I try your suggestion
on top of that?

I *hate* for this to be "the one that got away", but we've
at least gotten some good mileage out of this bug in the last
two months. Who knows, maybe I'll find some new hardware that
will exhibit the same behaviour in the new year.

	Dave