Date:	Thu, 7 Jul 2016 16:06:33 -0700
From:	Joel Fernandes <agnel.joel@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Corey Minyard <cminyard@...sta.com>,
	Frédéric Weisbecker <fweisbec@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: High rate of touch_softlockup makes Soft Lockup detector useless

Hi Thomas,

Thanks a lot for your reply.

On Thu, Jul 7, 2016 at 8:17 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Wed, 6 Jul 2016, Joel Fernandes wrote:
>> In a system running a recent kernel, I am trying to use soft lockup
>> detector to detect soft lockups in the system.
>> During this exercise, I see that even with real soft lockups, the
>> kernel is unable to detect them.
>
> What is your definition of a real soft lockup?

In my test, I am doing the following in process context:

preempt_disable();	/* the scheduler can no longer switch this CPU away */
while (1)
	;		/* spin forever */

>> Digging in further, I found that the softlockup watchdog is touched
>> 1000s of times per second by the NOHZ code.
>> prints revealed the following 2 functions calling touch_softlockup_watchdog:
>> [  165.960292] CPU0 touch: tick_nohz_restart_sched_tick
>> [  165.960309] CPU1 touch: tick_nohz_update_jiffies
>>
>> I am wondering, do we really need to touch the softlockup watchdog
>> from the tick_nohz code?
>> From the code comments it looks like the watchdog is touched because
>> the tick was off and is being turned back on, so the watchdog may
>> not have been touched for a long time.
>> BUT, wouldn't the hrtimer interrupt for the watchdog timer cause the
>> watchdog thread to be scheduled even though the tick was off for a
>> long time? Then in that case do we really need to touch the softlockup
>> watchdog from the tick_nohz code?
>
> Yes, it will be scheduled, but it might be too late. Assume the following:
>
> t1           hrtimer fires
>              watchdog thread runs
>              watchdog timer is rearmed to t2 = t1 + period
>
>              idle sleep
>
> t2 - 1ms     long running thread gets scheduled
>
> t2           hrtimer fires
>
>              long running thread stops
>
>              watchdog thread runs and detects soft lockup
>
> The soft lockup detector checks whether the CPU is hogged by some random
> task. It does so by monitoring whether the watchdog task which is periodically
> scheduled by a hrtimer becomes running before the watchdog period elapses.
>
> If the cpu goes idle then nothing hogs the cpu and the check period can be
> canceled.

That makes sense, thanks for explaining. It turns out my problem was
caused by occasional serial console prints resetting the watchdogs:
touch_nmi_watchdog() is called from
drivers/tty/serial/8250/8250_port.c. Disabling the serial console makes
the softlockup and hardlockup detectors work again for me.
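
To illustrate why periodic touches can hide a real lockup, here is a
toy user-space model, not kernel code. It assumes the detector simply
compares the current time against a timestamp that every touch
refreshes. All names (toy_watchdog, toy_touch, toy_is_lockup,
toy_simulate) and all numbers are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names do not exist in the kernel. */
struct toy_watchdog {
	unsigned long touch_ts_ms;	/* last time the timestamp was refreshed */
	unsigned long thresh_ms;	/* report a lockup beyond this gap */
};

/* Refresh the timestamp, as a "touch" would. */
void toy_touch(struct toy_watchdog *wd, unsigned long now_ms)
{
	wd->touch_ts_ms = now_ms;
}

/* A lockup is reported when the timestamp is older than the threshold. */
bool toy_is_lockup(const struct toy_watchdog *wd, unsigned long now_ms)
{
	return now_ms - wd->touch_ts_ms > wd->thresh_ms;
}

/*
 * Simulate a CPU that is hogged for hog_ms milliseconds, so the watchdog
 * task itself never gets to refresh the timestamp. The periodic check
 * runs every check_ms. If touch_every_ms is nonzero, some other code
 * path (for example console output) touches the watchdog at that
 * interval. Returns true if the lockup is reported.
 */
bool toy_simulate(unsigned long hog_ms, unsigned long thresh_ms,
		  unsigned long check_ms, unsigned long touch_every_ms)
{
	struct toy_watchdog wd = { .touch_ts_ms = 0, .thresh_ms = thresh_ms };
	unsigned long now;

	assert(check_ms > 0);

	for (now = 1; now <= hog_ms; now++) {
		if (touch_every_ms && now % touch_every_ms == 0)
			toy_touch(&wd, now);
		if (now % check_ms == 0 && toy_is_lockup(&wd, now))
			return true;
	}
	return false;
}
```

With a 10s threshold and a check every 4s, toy_simulate(60000, 10000,
4000, 0) reports the lockup, while toy_simulate(60000, 10000, 4000,
5000) does not: a touch every 5s is enough to keep the timestamp fresh
for the whole 60s hog.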

I am wondering whether we can avoid touching the watchdog from the 8250
port driver (8250_port.c). The NMI and soft watchdogs are touched in
this file in 2 functions:

(1) serial8250_console_write:
This calls touch_nmi_watchdog, but I haven't seen this function take
more than 6ms on my system to write a string, so calling
touch_nmi_watchdog here seems like overkill to me (?)

(2) wait_for_xmitr:
This calls touch_nmi_watchdog while busy-looping for up to 1s unless
UART_MSR_CTS is set. The lockup detection timeout is 10 seconds for me,
so resetting the NMI watchdog just because we may wait for up to 1s
(likely to be much less) seems unnecessary.
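
The arithmetic behind (2) can be sketched with a small user-space
model. This is not the 8250 driver code: the function names, the
struct, and the 1us polling step are assumptions made for illustration.
It shows that a wait capped at 1s cannot by itself exceed a 10s
threshold, even though touching on every poll step refreshes the
watchdog up to a million times.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a bounded busy-wait; not taken from the driver. */
struct wait_result {
	unsigned long waited_us;	/* simulated time spent polling */
	unsigned long touches;		/* how often the watchdog was touched */
};

/*
 * Poll in 1us steps until the device would be ready (ready_after_us)
 * or the cap (max_wait_us) is reached. When touch is true, count one
 * watchdog touch per poll step.
 */
struct wait_result model_busy_wait(unsigned long ready_after_us,
				   unsigned long max_wait_us, bool touch)
{
	struct wait_result r = { 0, 0 };

	while (r.waited_us < ready_after_us && r.waited_us < max_wait_us) {
		r.waited_us++;		/* stands in for a 1us delay */
		if (touch)
			r.touches++;
	}
	return r;
}

/* Would an untouched watchdog fire because of this wait alone? */
bool wait_alone_trips_watchdog(unsigned long waited_us,
			       unsigned long thresh_us)
{
	assert(thresh_us > 0);
	return waited_us > thresh_us;
}
```

For a device that never becomes ready within the cap,
model_busy_wait(5000000, 1000000, true) waits the full 1000000us and
touches the watchdog 1000000 times, yet
wait_alone_trips_watchdog(1000000, 10000000) is false.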

In either case, does it make sense to avoid calling touch_nmi_watchdog
here? The trouble is that it also touches the softlockup watchdog, so
it breaks softlockup detection for me as well.
I am adding Greg and Corey to the thread to see if they have any
thoughts on whether we can avoid touching the NMI watchdog from the
8250 tty driver.

Thanks,
Joel
