Message-ID: <alpine.DEB.2.20.1709150907300.1890@nanos>
Date: Fri, 15 Sep 2017 09:09:27 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Andy Lutomirski <luto@...nel.org>
cc: LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
X86 ML <x86@...nel.org>
Subject: Re: BUG: Sporadic crashes with current Linus tree
On Thu, 14 Sep 2017, Andy Lutomirski wrote:
> On Thu, Sep 14, 2017 at 9:00 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> > On Thu, 14 Sep 2017, Andy Lutomirski wrote:
> >> On Thu, Sep 14, 2017 at 12:38 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >> > Hi!
> >> >
> >> > I've seen the following crash sporadically with commit 46c1e79fee:
> >> >
> >> > Have not seen that with 3882a734c19b, though I saw the PCID warnings on
> >> > that machine.
> >> >
> >> > I have no idea how to reproduce it, so bisecting is pretty much pointless.
> >> > Any idea what to do?
> >>
> >> Does tools/testing/selftests/x86/sigreturn_64 reproduce it?
> >
> > Will try tomorrow once I've figured out how to compile that stuff. Invoking
> > a simple make in that directory fails.
>
> What's the error? It works for me.
gcc -m64 -o /home/tglx/work/kernel/linus/linux/tools/testing/selftests/x86/sysret_ss_attrs_64 -O2 -g -std=gnu99 -pthread -Wall sysret_ss_attrs.c thunks.S -lrt -ldl
/usr/bin/ld: /tmp/cco4vSkU.o: relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
> >
> > Built it manually, and when I run it, it says: stack16 is too high
> >
> >> Ugh, weird. It kind of looks like current->thread.sp0 == NULL. I
> >> have a patch series that changes a bunch of that code in my git tree,
> >> but that's definitely not in Linus' tree.
> >
> > Right. The stupid thing is that the machine did not throw up all day,
> > neither idle nor loaded. Still the same kernel, which barfed several
> > times tonight.
>
> This is weird. The crashing process is rsyslogd, which should have
> been running for a long time and shouldn't have any strange state. I
> wonder if this is some kind of memory corruption. There would have to
> be corruption of thread_struct *and* some kind of issue causing IRET
> to fail, though.
>
> The attached patch could plausibly give some useful hint.
I'll put it on that machine and hope it will reproduce. It hasn't died since
yesterday morning...
Thanks,
tglx