Date:	Mon, 14 Dec 2015 20:18:56 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	Don Zickus <dzickus@...hat.com>
Cc:	LKML <linux-kernel@...r.kernel.org>, akpm@...ux-foundation.org,
	uobergfe@...hat.com, atomlin@...hat.com, cmetcalf@...hip.com,
	fweisbec@...il.com
Subject: Re: [PATCH 1/1] Fix HARD Lockup Firing off while in debugger

On 12/14/15, Jeff Merkey <linux.mdb@...il.com> wrote:
> On 12/14/15, Don Zickus <dzickus@...hat.com> wrote:
>> On Sat, Dec 12, 2015 at 02:08:13PM -0700, Jeff Merkey wrote:
>>> The current touch_nmi_watchdog() function in /kernel/watchdog.c does
>>> not always catch all cases when a processor is spinning in the nmi
>>> handler inside either KGDB, KDB, or MDB.  The hrtimer_interrupts_saved
>>> count can still end up matching the previous value in some cases,
>>> resulting in the hard lockup detector tagging processors inside a
>>
>> Hi Jeff,
>>
>> I am confused here, the 'touch_nmi_watchdog()' was supposed to block the
>> check for hrtimer_interrupts from happening.  So if the check is still
>> being executed _after_ you executed touch_nmi_watchdog(), it would imply
>> there was about 10 seconds or so of time elapse from the touch command
>> to the hrtimer check.
>>
>> So I am not sure how the below patch would fix this, other than just add
>> another 10 second delay (for a total of 20 seconds) to your timeout?
>>
>>
>>> debugger and executing a panic.  The patch below corrects this
>>> problem.  I did not add this to the touch_nmi_function directly
>>> because of possible effects on timing issues.
>>>
>>> I have tested this patch and it fixes the problem for kernel debuggers
>>> stopping errant hard lockup events when processors are spinning inside
>>> the debugger.
>>
>> The kernel doesn't normally take patches like this without a corresponding
>> user, which I didn't see attached in this patch or a patch series.
>>
>> Cheers,
>> Don
>>
>
> I'll resend the patch series properly formatted and clean.   There is
> a hole in there somewhere that causes this bug.  You can reproduce it
> by downloading the mdb debugger, patching linux, building it, then
> removing the call to this function while spinning in the debugger with
> a  breakpoint on schedule() set from the debugger console.  It does
> fire off in about 20 seconds without this function I have suggested.
>
> You can download the debugger here.
>
> https://github.com/jeffmerkey/linux-stable/compare/v4.3.2...jeffmerkey:mdb-v4.3.2.diff
>
> Apply this patch to kernel v4.3.2 if you want to reproduce it easily.
> Before you build it, remove the function call to
> touch_hardlockup_watchdog() at mdb_watchdogs() in
> arch/x86/kernel/debug/mdb/mdb-main.c.
>
> I'll format another patch, this time a clean one.  I apologize.
>
> Jeff
>

Oh, and don't forget to type "g" for go after setting the schedule()
breakpoint.  This will reload all the processors and cause them to
break into the debugger and be held by the debugger at int1 exception.
This is when the touch_nmi_watchdog() breaks.

You also need to do this on an SMP system, preferably one with 4 or
more processors.  It's an SMP bug.
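
For anyone following the timing argument above, here is a rough
userspace model of the check being discussed.  It is only a sketch and
not the code in kernel/watchdog.c: the struct layout, the nmi_touched
flag and every model_* name are made up for illustration.  It assumes
that a touch suppresses exactly one periodic check, and that a
processor held in the debugger takes no timer ticks, so its
hrtimer_interrupts count stops moving.  It does not model the SMP
case reported in this thread.

```c
#include <stdbool.h>

/* Simplified per-CPU state.  The two counter names come from the
 * thread; the struct and the nmi_touched flag are assumptions. */
struct cpu_state {
	unsigned long hrtimer_interrupts;       /* bumped by each timer tick */
	unsigned long hrtimer_interrupts_saved; /* value seen at the last check */
	bool nmi_touched;                       /* set by a touch, cleared by a check */
};

/* Hypothetical stand-in for the timer interrupt. */
static void model_timer_tick(struct cpu_state *cpu)
{
	cpu->hrtimer_interrupts++;
}

/* Hypothetical stand-in for a touch call: ask for one check to be skipped. */
static void model_touch(struct cpu_state *cpu)
{
	cpu->nmi_touched = true;
}

/* Hypothetical stand-in for the periodic NMI check.
 * Returns true when a hard lockup would be reported. */
static bool model_nmi_check(struct cpu_state *cpu)
{
	if (cpu->nmi_touched) {
		/* A touch is pending: consume it and skip this check. */
		cpu->nmi_touched = false;
		return false;
	}
	if (cpu->hrtimer_interrupts_saved == cpu->hrtimer_interrupts)
		return true; /* no timer tick since the last check */

	cpu->hrtimer_interrupts_saved = cpu->hrtimer_interrupts;
	return false;
}

/* Hold one CPU "in the debugger" (no timer ticks) for up to max_checks
 * check periods.  Returns the number of the check that reports a
 * lockup, or 0 if none does. */
static int model_checks_until_lockup(bool touch_each_period, int max_checks)
{
	struct cpu_state cpu = {0};

	/* Healthy period first: one tick, one check that records it. */
	model_timer_tick(&cpu);
	(void)model_nmi_check(&cpu);

	/* Debugger entry: touch once, then the ticks stop. */
	model_touch(&cpu);
	for (int i = 1; i <= max_checks; i++) {
		if (model_nmi_check(&cpu))
			return i;
		if (touch_each_period)
			model_touch(&cpu);
	}
	return 0;
}
```

In this model, model_checks_until_lockup(false, 10) reports the lockup
on the second check after entry, which lines up with the roughly 20
seconds mentioned above if a check period is about 10 seconds.  With
touch_each_period set it never reports one.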

Jeff
