linux-kernel - Re: frequent lockups in 3.18rc4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20141119150333.GB2953@redhat.com>
Date:	Wed, 19 Nov 2014 10:03:33 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Dave Jones <davej@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	the arch/x86 maintainers <x86@...nel.org>
Subject: Re: frequent lockups in 3.18rc4

On Wed, Nov 19, 2014 at 09:41:05AM -0500, Don Zickus wrote:
> On Tue, Nov 18, 2014 at 05:02:54PM -0500, Dave Jones wrote:
> > On Tue, Nov 18, 2014 at 04:55:40PM -0500, Don Zickus wrote:
> > 
> >  > > So here we mangle CPU3 in and lose the backtrace for cpu0, which might
> >  > > be the real interesting one ....
> >  > 
> >  > Can you provide another dump?  The hope is we get something not mangled?
> > 
> > Working on it..
> > 
> >  > The other option we have done in RHEL is panic the system and let kdump
> >  > capture the memory.  Then we can analyze the vmcore for the stack trace
> >  > cpu0 stored in memory to get a rough idea where it might be if the cpu
> >  > isn't responding very well.
> > 
> > I don't know if it's because of the debug options I typically run with,
> > or that I'm perpetually cursed, but I've never managed to get kdump to
> > do anything useful. (The last time I tried it was actively harmful in
> > that not only did it fail to dump anything, it wedged the machine so
> > it didn't reboot after panic).

Hi Dave Jones,

Not being able to capture the dump I can understand but having wedged
the machine so that it does not reboot after dump failure sounds bad.
So you could not get machine to boot even after a power cycle? Would
you remember what was failing. I am curious to know what did kdump do
to make machine unbootable.

> > 
> > Unless there's some magic step missing from the documentation at
> > http://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
> > then I'm not optimistic it'll be useful.

I had a quick look at it and it basically looks fine. In fedora ideally
it is just two steps process.

- Reserve memory using crashkernel. Say crashkernel=160M
- systemctl start kdump
- Crash the system or wait for it to crash.

So despite your bad experience in the past, I would encourage you to
give it a try.

> 
> Well, I don't know when the last time you ran it, but I know the RH kexec
> folks have started pursuing a Fedora-first package patch rule a couple of
> years ago to ensure Fedora had a working kexec/kdump solution.

Yep, now we are putting everything in fedora first so it should be much
better. Hard to say the same thing about driver authors. Sometimes they
might have a driver working in rhel and not necessarily upstream. I am
not sure if you ran into one of those issues.

Also recently I have seen issues with graphics drivers too.

> 
> As for the wedging part, it was a common problem to have the kernel hang
> while trying to boot the second kernel (and before console output
> happened).  So the problem makes sense and is unfortunate.  I would
> encourage you to try again.  :-)
> 
> Though, it is transitioning to have the app built into the kernel to deal
> with the whole secure boot thing, so that might be another can of worms.

I doubt that secureboot bits will contribute to the failure.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/