Message-ID: <CAO6TR8WKEsfd7AtFD0WxRwsR6R0VL9riTeP3zHYjJw8ZD85x0w@mail.gmail.com>
Date:	Wed, 16 Dec 2015 09:57:18 -0700
From:	Jeff Merkey <linux.mdb@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	akpm@...ux-foundation.org, uobergfe@...hat.com, dzickus@...hat.com,
	atomlin@...hat.com, cmetcalf@...hip.com, fweisbec@...il.com
Subject: Re: [PATCH] Fix spurious hard lockup events while in debugger

On 12/14/15, Jeff Merkey <linux.mdb@...il.com> wrote:
> The current touch_nmi_watchdog() function in kernel/watchdog.c does
> not catch every case in which a processor is spinning in the NMI
> handler inside KGDB, KDB, or MDB, in particular the case where a
> processor is being held by a debugger inside an int1 handler.
>
> The hrtimer_interrupts_saved count can still end up matching the
> hrtimer_interrupts value in some cases, so the hard lockup detector
> flags processors held inside a debugger and triggers a panic.
>
> The patch below corrects this problem.  I did not add this to
> touch_nmi_watchdog() directly because of possible effects on
> timing, since the function is widely used by drivers and
> modules.
>
> I have tested this patch, and it fixes the problem for kernel
> debuggers: it stops spurious hard lockup events while processors are
> spinning inside the debugger.
>
> Signed-off-by: Jeff Merkey <linux.mdb@...il.com>
> ---
>  kernel/watchdog.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 18f34cf..b682aab 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -283,6 +283,13 @@ static bool is_hardlockup(void)
>  	__this_cpu_write(hrtimer_interrupts_saved, hrint);
>  	return false;
>  }
> +
> +void touch_hardlockup_watchdog(void)
> +{
> +	__this_cpu_write(hrtimer_interrupts_saved, 0);
> +}
> +EXPORT_SYMBOL_GPL(touch_hardlockup_watchdog);
> +
>  #endif
>
>  static int is_softlockup(unsigned long touch_ts)
> --
> 1.8.3.1
>
>


I have stared at the function that detects hard lockups until my eyes
turned red, and the code looks OK.  I am still trying to figure out why
this is happening.  When I call touch_nmi_watchdog() without the new
function from the patch, the hard lockup detector still fires.

I'll debug it and determine the cause.  The touch flag operating
independently of the counter may be the clue.  I'll look into it a
little more today.

Jeff
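
The interaction described above, a touch flag that is consumed
separately from the saved interrupt counter, can be illustrated with a
small user-space model.  This is only a sketch under stated
assumptions, not the kernel code and not a diagnosis of the bug: struct
cpu_state, nmi_check(), touch_nmi(), touch_hardlockup() and
first_lockup() are hypothetical stand-ins, the per-CPU variables are
reduced to plain struct fields, and the hrtimer is assumed to be
completely stalled while the CPU sits in the debugger.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU watchdog state. */
struct cpu_state {
	unsigned long hrtimer_interrupts;       /* bumped by the hrtimer callback */
	unsigned long hrtimer_interrupts_saved; /* value seen at the previous check */
	bool watchdog_nmi_touch;                /* models the "touch" flag */
};

/* Same comparison as is_hardlockup() in the quoted diff context. */
static bool is_hardlockup(struct cpu_state *c)
{
	unsigned long hrint = c->hrtimer_interrupts;

	if (c->hrtimer_interrupts_saved == hrint)
		return true;

	c->hrtimer_interrupts_saved = hrint;
	return false;
}

/* Assumed model of one periodic NMI check: a pending touch skips it. */
static bool nmi_check(struct cpu_state *c)
{
	if (c->watchdog_nmi_touch) {
		c->watchdog_nmi_touch = false; /* flag is consumed here */
		return false;                  /* saved counter is NOT updated */
	}
	return is_hardlockup(c);
}

/* Model of the flag-based touch. */
static void touch_nmi(struct cpu_state *c)
{
	c->watchdog_nmi_touch = true;
}

/* Model of the patch: reset the saved counter instead of setting a flag. */
static void touch_hardlockup(struct cpu_state *c)
{
	c->hrtimer_interrupts_saved = 0;
}

/*
 * Run `checks` NMI periods on a CPU whose hrtimer never fires.  The
 * debugger touches the watchdog before each of the first `touches`
 * checks, using the saved-counter reset if `use_reset` is true and the
 * flag otherwise.  Returns the 1-based index of the first check that
 * reports a hard lockup, or 0 if none does.
 */
static int first_lockup(struct cpu_state *c, int checks, int touches,
			bool use_reset)
{
	for (int i = 1; i <= checks; i++) {
		if (i <= touches) {
			if (use_reset)
				touch_hardlockup(c);
			else
				touch_nmi(c);
		}
		if (nmi_check(c))
			return i;
	}
	return 0;
}
```

In this model a single touch of either kind only covers one check
period: once the flag has been consumed, or the saved counter has
caught up with the stalled hrtimer count again, the next check reports
a lockup unless the debugger touches the watchdog again first.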