Message-ID: <alpine.DEB.2.20.1707171655200.2185@nanos>
Date:   Mon, 17 Jul 2017 17:00:40 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Liang, Kan" <kan.liang@...el.com>
cc:     Don Zickus <dzickus@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "babu.moger@...cle.com" <babu.moger@...cle.com>,
        "atomlin@...hat.com" <atomlin@...hat.com>,
        "prarit@...hat.com" <prarit@...hat.com>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "eranian@...gle.com" <eranian@...gle.com>,
        "acme@...hat.com" <acme@...hat.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

On Mon, 17 Jul 2017, Liang, Kan wrote:
> > > > > According to our test, only patch 3 works well.
> > > > > The other two patches will hang the system eventually.
> > 
> > Hang the system eventually? Does that mean that the system stops working
> > and the watchdog does not catch the problem?
> 
> Right, the system stops working and the watchdog does not catch the problem.

What exactly does "stops working" mean? Just that you observe that the system
does not make progress, or that it is not reacting to keystrokes, or what?

And what is the lockup that is detected in the other case? Which code
path causes the lockup?

> I personally didn't compare the difference between 1 and default 10 for this
> test case.
> Before we had the test case from customer, we developed other micro
> which can reproduce the similar issue.
> For that micro, 1 can speed up the failure.
> (BTW: all the three patches can fix the issue which was reproduced by that micro.)
> 
> If you think it's meaningful to verify 10 as well, I can do the compare.

It might be worth a try, but unless we can either get our hands on the test
scenario or at least get a proper explanation of what it is doing,
including the expected outcome, i.e. what the 'system is locked up'
failure is that the watchdog should detect, I can't tell anything.

Thanks,

	tglx
