linux-kernel - Re: [PATCH v5 3/3] Add BUG_XX() debugging hard/soft lockup detection

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160203201453.GV26637@redhat.com>
Date:	Wed, 3 Feb 2016 15:14:53 -0500
From:	Don Zickus <dzickus@...hat.com>
To:	Jeffrey Merkey <jeffmerkey@...il.com>
Cc:	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	atomlin@...hat.com, cmetcalf@...hip.com, fweisbec@...il.com,
	hidehiro.kawai.ez@...achi.com, mhocko@...e.cz, tj@...nel.org,
	uobergfe@...hat.com
Subject: Re: [PATCH v5 3/3] Add BUG_XX() debugging hard/soft lockup detection

On Wed, Feb 03, 2016 at 10:23:42AM -0700, Jeffrey Merkey wrote:
> > Hmm, I am confused here.  So you are saying because we are in the nmi
> > handler you can not break into the system?  The nmi handler prints some
> > stuff to the screen, pokes the other cpus to print stuff to the screen and
> > then returns to a normal operation.  Unless you are saying the act of
> > sending NMI IPIs never completes (because a cpu is blocking IPI
> > interrupts),
> > so the cpu hangs in nmi context and the debugger never has a chance to
> > 'break' in and see what is going on?
> >
> > Cheers,
> > Don
> >
> 
> Yes.  the nmi handlers never complete for the bug I worked on with
> tglx, probably because an nmi handler is calling timekeeper.c
> somewhere.  Some of these lockup bugs may be calling code from the nmi
> handlers that cause the lockup condition in the first place in some
> cases, so it will never reach a call to panic.  Looking over this code
> it's damn hard to find a good way to do this that works across all the
> arches without adding another macro to bug.h (BREAK_ON maybe), so I
> just used one that's already there.  I'll go back and rethink this
> some more.  It could just be as simple as calling panic from the first
> detection -- that works.

So, if you disable 'sysctl_hardlockup_all_cpu_backtrace' and enable
'hardlockup_panic', you should be able to achieve what you want, no?

But you mentioned you wanted to recover?  Hence avoiding the panic?

Cheers,
Don