linux-kernel - Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c040d4e7-f33b-9c93-6f5a-6bf960943e36@infradead.org>
Date:   Tue, 19 Jun 2018 17:25:09 -0700
From:   Randy Dunlap <rdunlap@...radead.org>
To:     Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Ingo Molnar <mingo@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Andi Kleen <andi.kleen@...el.com>,
        Ashok Raj <ashok.raj@...el.com>, Borislav Petkov <bp@...e.de>,
        Tony Luck <tony.luck@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>, x86@...nel.org,
        sparclinux@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, Jacob Pan <jacob.jun.pan@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Don Zickus <dzickus@...hat.com>,
        Nicholas Piggin <npiggin@...il.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Frederic Weisbecker <frederic@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Babu Moger <babu.moger@...cle.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Philippe Ombredanne <pombredanne@...b.com>,
        Colin Ian King <colin.king@...onical.com>,
        Byungchul Park <byungchul.park@....com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        "Luis R. Rodriguez" <mcgrof@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Christoffer Dall <cdall@...aro.org>,
        Marc Zyngier <marc.zyngier@....com>,
        Kai-Heng Feng <kai.heng.feng@...onical.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        David Rientjes <rientjes@...gle.com>,
        iommu@...ts.linux-foundation.org
Subject: Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's
 interrupt to NMI

On 06/19/2018 05:15 PM, Ricardo Neri wrote:
> On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
>> On Fri, 15 Jun 2018, Ricardo Neri wrote:
>>> On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
>>>> On Thu, 14 Jun 2018, Ricardo Neri wrote:
>>>>> Alternatively, there could be a counter that skips reading the HPET status
>>>>> register (and the detection of hardlockups) for every X NMIs. This would
>>>>> reduce the overall frequency of HPET register reads.
>>>>
>>>> Great plan. So if the watchdog is the only NMI (because perf is off) then
>>>> you delay the watchdog detection by that count.
>>>
>>> OK. This was a bad idea. Then, is it acceptable to have an read to an HPET
>>> register per NMI just to check in the status register if the HPET timer
>>> caused the NMI?
>>
>> The status register is useless in case of MSI. MSI is edge triggered ....
>>
>> The only register which gives you proper information is the counter
>> register itself. That adds an massive overhead to each NMI, because the
>> counter register access is synchronized to the HPET clock with hardware
>> magic. Plus on larger systems, the HPET access is cross node and even
>> slower.
> 
> It starts to sound that the HPET is too slow to drive the hardlockup detector.
> 
> Would it be possible to envision a variant of this implementation? In this
> variant, the HPET only targets a single CPU. The actual hardlockup detector
> is implemented by this single CPU sending interprocessor interrupts to the
> rest of the CPUs.
> 
> In this manner only one CPU has to deal with the slowness of the HPET; the
> rest of the CPUs don't have to read or write any HPET registers. A sysfs
> entry could be added to configure which CPU will have to deal with the HPET
> timer. However, profiling could not be done accurately on such CPU.

Please forgive my simple question:

What happens when this one CPU is the one that locks up?

thnx,
-- 
~Randy