Message-ID: <37D7C6CF3E00A74B8858931C1DB2F0775371D9AE@SHSMSX103.ccr.corp.intel.com>
Date: Mon, 17 Jul 2017 14:46:53 +0000
From: "Liang, Kan" <kan.liang@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Don Zickus <dzickus@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mingo@...nel.org" <mingo@...nel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"babu.moger@...cle.com" <babu.moger@...cle.com>,
"atomlin@...hat.com" <atomlin@...hat.com>,
"prarit@...hat.com" <prarit@...hat.com>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"eranian@...gle.com" <eranian@...gle.com>,
"acme@...hat.com" <acme@...hat.com>,
"ak@...ux.intel.com" <ak@...ux.intel.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups
> On Mon, 17 Jul 2017, Liang, Kan wrote:
> > > That doesn't make sense. What's the exact test procedure?
> >
> > I don't know the exact test procedure. The test case is from our customer.
> > I only know that the test case makes calls into the x11 libs.
>
> Sigh. This starts to be silly. You test something and have no idea what it does?
As I said, the test case comes from our customer, who only shares binaries with us.
It is actually more accurate to call it a test suite; it includes dozens of small tests.
I reproduced the issue and verified all three patches in our lab, then reported
the results here immediately, as requested.
So I know little about the test case for now, but I will share more details as I
learn them.
Sorry about that.
>
> > > > According to our test, only patch 3 works well.
> > > > The other two patches will hang the system eventually.
>
> Hang the system eventually? Does that mean that the system stops working
> and the watchdog does not catch the problem?
Right, the system stops working and the watchdog does not catch the problem.
>
> > > > BTW: We set 1 to watchdog_thresh when we did the test.
> > > > It's believed that can speed up the failure.
> > >
> > > Believe is not really a technical measure....
> > >
> >
> > 1 is a valid value for watchdog_thresh.
> > It was set through the standard proc interface.
> > /proc/sys/kernel/watchdog_thresh
> > It should not impacts the final test result.
>
> I know that 1 is a valid value and I know how that can be set. Still, it does not
> help if you believe that setting the threshold to 1 can speed up the failure.
> Either you know it for sure or not. You can believe in god or whatever, but
> here we talk about facts.
I personally did not compare watchdog_thresh=1 against the default of 10 for
this test case.
Before we received the customer's test case, we developed a micro-benchmark
that reproduces a similar issue.
For that micro-benchmark, setting the threshold to 1 does speed up the failure.
(BTW: all three patches fix the issue reproduced by that micro-benchmark.)
If you think it is worthwhile to verify with 10 as well, I can run the comparison.
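For reference, the comparison only involves toggling the threshold through the
standard proc interface between runs; a minimal sketch (the privileged writes
need root, so they are shown commented out):

```shell
THRESH=/proc/sys/kernel/watchdog_thresh

# Show the current hard-lockup detection threshold (kernel default: 10 seconds)
[ -r "$THRESH" ] && cat "$THRESH"

# Lower it to 1 second for the fast-failure run (requires root):
#   echo 1 > /proc/sys/kernel/watchdog_thresh
# Restore the default for the control run:
#   echo 10 > /proc/sys/kernel/watchdog_thresh
```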
Thanks,
Kan