Message-ID: <alpine.DEB.2.21.1806131140560.2280@nanos.tec.linutronix.de>
Date: Wed, 13 Jun 2018 11:48:09 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
cc: Ingo Molnar <mingo@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
Andi Kleen <andi.kleen@...el.com>,
Ashok Raj <ashok.raj@...el.com>, Borislav Petkov <bp@...e.de>,
Tony Luck <tony.luck@...el.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>, x86@...nel.org,
sparclinux@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, Jacob Pan <jacob.jun.pan@...el.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Don Zickus <dzickus@...hat.com>,
Nicholas Piggin <npiggin@...il.com>,
Michael Ellerman <mpe@...erman.id.au>,
Frederic Weisbecker <frederic@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Babu Moger <babu.moger@...cle.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Philippe Ombredanne <pombredanne@...b.com>,
Colin Ian King <colin.king@...onical.com>,
Byungchul Park <byungchul.park@....com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
"Luis R. Rodriguez" <mcgrof@...nel.org>,
Waiman Long <longman@...hat.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Randy Dunlap <rdunlap@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>,
Christoffer Dall <cdall@...aro.org>,
Marc Zyngier <marc.zyngier@....com>,
Kai-Heng Feng <kai.heng.feng@...onical.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
David Rientjes <rientjes@...gle.com>,
iommu@...ts.linux-foundation.org
Subject: Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt
among all monitored CPUs
On Tue, 12 Jun 2018, Ricardo Neri wrote:
> + /* There are no CPUs to monitor. */
> + if (!cpumask_weight(&hdata->monitored_mask))
> + return NMI_HANDLED;
> +
> inspect_for_hardlockups(regs);
>
> + /*
> + * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> + * are added to and removed from this mask at cpu_up() and cpu_down(),
> + * respectively. Thus, the interrupt should be able to be moved to
> + * the next monitored CPU.
> + */
> + spin_lock(&hld_data->lock);
Yuck. Taking a spinlock from NMI ...
> + for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> + if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> + break;
... and then calling into generic interrupt code, which will take even more
locks, is completely broken.
Guess what happens when the NMI hits a section where one of those locks is
held? Then you need another watchdog to decode the lockup you just ran into.
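One way out (rough sketch only, not even compile tested): don't touch the
irq code from NMI at all and defer the rotation to irq_work, which is safe
to queue from NMI and runs its callback in regular interrupt context where
taking the irq descriptor lock is fine. Struct and field names below follow
your patch; the rotate helper and the irq_work instance are made up for
illustration.

  #include <linux/cpumask.h>
  #include <linux/interrupt.h>
  #include <linux/irq_work.h>
  #include <linux/nmi.h>

  static struct irq_work hld_rotate_work;

  static void hld_hpet_rotate(struct irq_work *work)
  {
	unsigned int cpu;

	/* Hard interrupt context: taking the irq descriptor lock is fine. */
	for_each_cpu_wrap(cpu, &hld_data->monitored_mask,
			  smp_processor_id() + 1) {
		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
			break;
	}
  }

  static int hardlockup_detector_nmi_handler(unsigned int type,
					     struct pt_regs *regs)
  {
	if (cpumask_empty(&hld_data->monitored_mask))
		return NMI_HANDLED;

	inspect_for_hardlockups(regs);

	/*
	 * NMI safe. The affinity change happens from the irq_work
	 * interrupt shortly after, with no lock taken in NMI context.
	 */
	irq_work_queue(&hld_rotate_work);

	return NMI_HANDLED;
  }

  /* Somewhere in the detector setup code: */
	init_irq_work(&hld_rotate_work, hld_hpet_rotate);

irq_work_queue() only does lockless list/cmpxchg operations plus a self IPI,
so there is nothing for the NMI to deadlock on.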
Thanks,
tglx