linux-kernel - RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

Message-ID: <alpine.DEB.2.20.1707171655200.2185@nanos>
Date:   Mon, 17 Jul 2017 17:00:40 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Liang, Kan" <kan.liang@...el.com>
cc:     Don Zickus <dzickus@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "babu.moger@...cle.com" <babu.moger@...cle.com>,
        "atomlin@...hat.com" <atomlin@...hat.com>,
        "prarit@...hat.com" <prarit@...hat.com>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "eranian@...gle.com" <eranian@...gle.com>,
        "acme@...hat.com" <acme@...hat.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

On Mon, 17 Jul 2017, Liang, Kan wrote:
> > > > > According to our test, only patch 3 works well.
> > > > > The other two patches will hang the system eventually.
> > 
> > Hang the system eventually? Does that mean that the system stops working
> > and the watchdog does not catch the problem?
> 
> Right, the system stops working and the watchdog does not catch the problem.

What exactly does "stops working" mean? Do you just observe that the system
does not make progress, or that it is not reacting to keystrokes, or what?

And what is the lockup that is detected in the other case? Which code
path causes the lockup?

> I personally didn't compare the difference between 1 and default 10 for this
> test case.
> Before we had the test case from customer, we developed other micro
> which can reproduce the similar issue.
> For that micro, 1 can speed up the failure.
> (BTW: all the three patches can fix the issue which was reproduced by that micro.)
> 
> If you think it's meaningful to verify 10 as well, I can do the compare.

It might be worth a try. But unless we can either get our hands on the test
scenario or at least get a proper explanation of what it is doing,
including the expected outcome, i.e. which 'system is locked up' failure
the watchdog should detect, I can't tell anything.

Thanks,

	tglx