Message-ID: <CA+ekxPVubVwkGaJczxyTwSEk4Y3TbU6hTf0Rut8cJggf9-R9+w@mail.gmail.com>
Date: Tue, 2 Feb 2016 21:17:22 -0700
From: Jeffrey Merkey <jeffmerkey@...il.com>
To: Don Zickus <dzickus@...hat.com>
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
atomlin@...hat.com, cmetcalf@...hip.com, fweisbec@...il.com,
hidehiro.kawai.ez@...achi.com, mhocko@...e.cz, tj@...nel.org,
uobergfe@...hat.com
Subject: Re: [PATCH v5 3/3] Add BUG_XX() debugging hard/soft lockup detection
> Because when you catch a bug in the hard lockup detector the system
> just sits there hard hung and you are not able to get into a debugger
> console since the system has crashed and the watchdog code has already
> killed off the other processors and locked up all the NMI interrupt
> handlers, thereby preventing any debugger at all from functioning
> other than a hardware ice, so it's a hell of a lot easier just to
> trigger a break when you detect the first instance of a hard lockup
> before the system is completely hosed.
>
So this is why Ingo and tglx's suggestion doesn't work. Unless you
can set a breakpoint in the detector code, once the lockup occurs,
about 50% of the time (when the IF flag is not set and interrupts are
disabled) you can't get into a debugger because the system is hosed.
The way the current hard lockup detector works is a lot like the Death
Star self-destruct system for Linux -- it detects a lockup, tries to IPI
the other processors to dump their stacks, then somewhere down in the
OS all of it locks up -- once in a while I can get it to panic. A
great bug to test your detector with is the one in timekeeper.c tglx
and I worked on. Good luck getting into any debugger when it fires
off. I like the fact this code does not call panic and is somewhat
dynamic allowing recovery of the system, but it takes a healthy system
with a single bug, burns it to the ground, locks up all the
processors, and prevents the debugger from being entered unless a
breakpoint has been set.
Perhaps this helps you understand.
Jeff