linux-kernel - RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups

Date:   Mon, 17 Jul 2017 14:46:53 +0000
From:   "Liang, Kan" <kan.liang@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     Don Zickus <dzickus@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "babu.moger@...cle.com" <babu.moger@...cle.com>,
        "atomlin@...hat.com" <atomlin@...hat.com>,
        "prarit@...hat.com" <prarit@...hat.com>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "eranian@...gle.com" <eranian@...gle.com>,
        "acme@...hat.com" <acme@...hat.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups



> On Mon, 17 Jul 2017, Liang, Kan wrote:
> > > That doesn't make sense. What's the exact test procedure?
> >
> > I don't know the exact test procedure. The test case is from our customer.
> > I only know that the test case makes calls into the X11 libs.
> 
> Sigh. This starts to be silly. You test something and have no idea what it does?

As I said, the test case is from our customer, and they only share binaries with us.
Strictly speaking, it is a test suite that includes dozens of small tests.
I reproduced the issue, verified all three patches in our lab, and then
reported the results here immediately, as requested.
So for now I know little about what the test case actually does.
I will share more as I learn more.
Sorry about that.

> 
> > > > According to our test, only patch 3 works well.
> > > > The other two patches will hang the system eventually.
> 
> Hang the system eventually? Does that mean that the system stops working
> and the watchdog does not catch the problem?


Right, the system stops working and the watchdog does not catch the problem.

> 
> > > > BTW: We set watchdog_thresh to 1 when we ran the test.
> > > > It's believed that this can speed up the failure.
> > >
> > > Belief is not really a technical measure....
> > >
> >
> > 1 is a valid value for watchdog_thresh.
> > It was set through the standard proc interface
> > (/proc/sys/kernel/watchdog_thresh).
> > It should not impact the final test result.
> 
> I know that 1 is a valid value and I know how that can be set. Still, it does not
> help if you believe that setting the threshold to 1 can speed up the failure.
> Either you know it for sure or not. You can believe in god or whatever, but
> here we talk about facts.

I personally didn't compare 1 against the default of 10 for this
test case.
Before we received the test case from the customer, we had developed a
separate micro test that reproduces a similar issue.
With that micro test, setting watchdog_thresh to 1 did speed up the failure.
(BTW: all three patches fix the issue reproduced by that micro test.)

If you think it is worthwhile to test with 10 as well, I can do the comparison.
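To make the 1 vs. 10 comparison concrete, here is a rough sketch of how watchdog_thresh maps to the two watchdog periods. It assumes the arithmetic in kernel/watchdog.c and arch/x86/kernel/apic/hw_nmi.c of this era: the hrtimer sample period is 2 * watchdog_thresh / 5 seconds, and the perf NMI period is cpu_khz * 1000 * watchdog_thresh cycles. The function name and the turbo_ratio parameter (actual core clock divided by nominal cpu_khz) are hypothetical illustrations, not kernel interfaces. Nothing here is a measurement from the customer test.

```python
def watchdog_periods(watchdog_thresh: int, turbo_ratio: float = 1.0) -> dict:
    """Approximate watchdog periods, in seconds, for a given watchdog_thresh.

    Simplified model of the kernel arithmetic; turbo_ratio is a hypothetical
    input (actual core clock / nominal cpu_khz), not a kernel parameter.
    """
    if watchdog_thresh < 1:
        raise ValueError("watchdog_thresh must be >= 1 (0 disables the watchdog)")
    softlockup_thresh = watchdog_thresh * 2       # get_softlockup_thresh()
    hrtimer_period = softlockup_thresh / 5        # set_sample_period(), in seconds
    nmi_nominal = float(watchdog_thresh)          # cpu_khz * 1000 * thresh cycles at nominal clock
    nmi_period = nmi_nominal / turbo_ratio        # cycles elapse sooner above the nominal clock
    return {
        "hrtimer_period": hrtimer_period,
        "nmi_period": nmi_period,
        # hrtimer ticks expected inside one NMI window
        "hrtimer_ticks_per_nmi": nmi_period / hrtimer_period,
        # how often the hard lockup check runs
        "nmi_checks_per_minute": 60 / nmi_period,
    }


for thresh in (1, 10):
    p = watchdog_periods(thresh)
    print(thresh, p["hrtimer_period"], p["nmi_period"], p["nmi_checks_per_minute"])
```

At the nominal clock, the hrtimer fires about 2.5 times per NMI window at any threshold, so a threshold of 1 leaves that margin unchanged. It does, however, run the hard lockup check ten times as often (60 vs. 6 checks per minute). If each check has some small chance of a spurious hit, that alone could plausibly make failures show up sooner. This is a hypothesis for the 1 vs. 10 comparison to confirm, not a result. A turbo_ratio above 1 shrinks the NMI window and, with it, the number of hrtimer ticks that window covers.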

Thanks,
Kan