Message-ID: <20170717144637.34umykrccvjma3fl@redhat.com>
Date:   Mon, 17 Jul 2017 10:46:37 -0400
From:   Don Zickus <dzickus@...hat.com>
To:     "Liang, Kan" <kan.liang@...el.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "babu.moger@...cle.com" <babu.moger@...cle.com>,
        "atomlin@...hat.com" <atomlin@...hat.com>,
        "prarit@...hat.com" <prarit@...hat.com>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "eranian@...gle.com" <eranian@...gle.com>,
        "acme@...hat.com" <acme@...hat.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

On Mon, Jul 17, 2017 at 01:24:23AM +0000, Liang, Kan wrote:
> Hi Don & Thomas,
> 
> Sorry for the late response. We just finished the tests for all proposed patches.
> 
> There are three proposed patches so far.
> Patch 1: The patch as above which speed up the hrtimer.
> Patch 2: Thomas's first proposal.
> https://patchwork.kernel.org/patch/9803033/
> https://patchwork.kernel.org/patch/9805903/
> Patch 3: my original proposal which increase the NMI watchdog timeout by 3X
> https://patchwork.kernel.org/patch/9802053/
> 
> According to our test, only patch 3 works well.
> The other two patches will hang the system eventually.
> For patch 1, the system hung after running our test case for ~1 hour.
> For patch 2, the system hung during the overnight test.
> No error message was shown when the system hung, so I don't know the
> root cause yet.

Hi Kan,

Thanks for the feedback.  Odd that the different patches had different
results.  What is more odd to me is the hang.  I thought these were all
false lockups that prematurely panic'd and rebooted the box.

Is the machine configured to panic on hardlockup and reboot?  Perhaps kdump
is enabled to store the console log for review upon reboot?
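
For reference, a rough sketch of one way to check those settings on the test
box.  It assumes the usual sysctl files under /proc/sys/kernel
(hardlockup_panic, panic, watchdog_thresh, nmi_watchdog); the helper name
read_watchdog_sysctls is made up for illustration, and a file missing on a
given kernel is reported as None.

```python
from pathlib import Path

# Sysctl files assumed to exist under /proc/sys/kernel on a recent kernel.
SYSCTLS = ("hardlockup_panic", "panic", "watchdog_thresh", "nmi_watchdog")


def read_watchdog_sysctls(root="/proc/sys/kernel"):
    """Return {name: int or None} for the watchdog-related sysctls.

    hardlockup_panic: 1 means panic when a hard lockup is detected.
    panic:            > 0 means reboot that many seconds after a panic,
                      0 means stay hung after a panic.
    """
    values = {}
    for name in SYSCTLS:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            # File absent, unreadable, or not a plain integer.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_watchdog_sysctls().items():
        print(f"kernel.{name} = {value}")
```

If hardlockup_panic reads 0, a detected hard lockup only logs a warning, and
if panic reads 0 the box will sit at the panic instead of rebooting.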

It almost implies that a hardlockup did happen but isn't being detected
until later??

> 
> BTW: We set watchdog_thresh to 1 when we did the test.
> It's believed that this can speed up the failure.

Sure, you/they look for 1 second hangs instead of 10 second ones.  But with
patch 3 it is more like 3-ish seconds vs 30-ish seconds.
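
To spell out the arithmetic, here is a minimal sketch.  It assumes the hard
lockup window is simply watchdog_thresh seconds, that patch 3 multiplies it
by 3, and (from my reading of kernel/watchdog.c, so treat it as an
assumption) that the hrtimer fires every 2 * watchdog_thresh / 5 seconds.
The function names are illustrative only.

```python
def hardlockup_window(watchdog_thresh, multiplier=1):
    """Approximate seconds before the NMI watchdog reports a hard lockup."""
    return watchdog_thresh * multiplier


def hrtimer_period(watchdog_thresh):
    """Assumed hrtimer sample period in seconds: (2 * thresh) / 5."""
    return 2 * watchdog_thresh / 5


for thresh in (1, 10):
    print(
        f"watchdog_thresh={thresh}: "
        f"stock window {hardlockup_window(thresh)}s, "
        f"patch 3 window {hardlockup_window(thresh, 3)}s, "
        f"hrtimer every {hrtimer_period(thresh)}s"
    )
```

With watchdog_thresh=1 the margins are ten times tighter than the default,
which is presumably why the failure shows up sooner.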

As Thomas asked, I would also be interested in the way the test works.  The
hang doesn't make sense.

Cheers,
Don
