Message-ID: <AANLkTikvx4BbVRAa-97sLJEi0bb=6xdB_04=JCYQDnBA@mail.gmail.com>
Date: Thu, 19 Aug 2010 15:01:55 +0200
From: Stephane Eranian <eranian@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>, mingo@...e.hu,
"David S. Miller" <davem@...emloft.net>,
Paul Mackerras <paulus@...ba.org>,
Frédéric Weisbecker <fweisbec@...il.com>,
eranian@...il.com, perfmon2-devel@...ts.sf.net
Subject: Re: [BUG] perf_events: NMI watchdog event cannot be throttled
On Thu, Aug 19, 2010 at 1:24 PM, Stephane Eranian <eranian@...gle.com> wrote:
> Yeah, that should probably fix it. Let me try it out.
>
Works for me.
Thanks.
>
> On Thu, Aug 19, 2010 at 1:05 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> On Wed, 2010-08-18 at 22:26 +0200, Stephane Eranian wrote:
>>> Hi,
>>>
>>> I ran into an issue with the NMI watchdog not firing in a deadlock
>>> situation. After some debugging, I found the source of the problem.
>>>
>>> The NMI watchdog is currently subject, like any other event, to interrupt
>>> throttling. The heart of the problem is that if you are deadlocked on a CPU
>>> with interrupts masked, the timer interrupt won't fire, and therefore the
>>> hwc->interrupts field won't be reset. Then, depending on the max sampling
>>> rate, you could eventually fail the max interrupt rate test in
>>> __pfm_overflow_handler(), and perf_events would throttle, i.e., stop, the
>>> NMI watchdog event before the 5s delay to panic. Thus, you would never get
>>> the panic. I ran into this problem myself.
>>>
>>> This is a serious issue because perf_events must ensure the watchdog can
>>> always fire, regardless of the interrupt masking situation.
>>>
>>> It looks like one way of solving the problem would be to mark the NMI
>>> watchdog event as immune to throttling. Since the event is internal to the
>>> kernel, we could trust the event setup from
>>> perf_event_create_kernel_counter().
>>
>> Something like so?
>>
>> ---
>> kernel/watchdog.c | 3 +++
>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 613bc1f..e0fe6e4 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
>>  		 struct perf_sample_data *data,
>>  		 struct pt_regs *regs)
>>  {
>> +	/* Ensure the watchdog never gets throttled. */
>> +	event->hw.interrupts = 0;
>> +
>>  	if (__get_cpu_var(watchdog_nmi_touch) == true) {
>>  		__get_cpu_var(watchdog_nmi_touch) = false;
>>  		return;
>>
>>
>
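
For anyone reading the thread later, the failure mode described above and the
effect of the quoted patch can be sketched as a small user-space model. This is
not kernel code and it simplifies the real throttling path: struct sim_event,
sim_timer_tick(), sim_overflow(), sim_run_masked() and the reset_in_callback
flag are all hypothetical names made up for illustration, and the limit is
expressed as "overflows per timer tick" rather than the real sample-rate
computation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-event state; NOT the kernel's structure. */
struct sim_event {
	unsigned int interrupts;   /* overflows seen since the last timer tick */
	bool throttled;            /* true once the event has been stopped */
	bool reset_in_callback;    /* models the fix: callback clears interrupts */
	unsigned int callbacks;    /* number of times the callback actually ran */
};

/* Timer tick: clears the count and unthrottles. Never runs while the CPU
 * is stuck with interrupts masked, which is the root of the bug. */
static void sim_timer_tick(struct sim_event *ev)
{
	ev->interrupts = 0;
	ev->throttled = false;
}

/* Overflow callback, standing in for the watchdog callback. */
static void sim_overflow_callback(struct sim_event *ev)
{
	ev->callbacks++;
	if (ev->reset_in_callback)
		ev->interrupts = 0;   /* same idea as the one-line patch */
}

/* One counter overflow (NMI). Returns false if the event is throttled. */
static bool sim_overflow(struct sim_event *ev, unsigned int max_per_tick)
{
	assert(max_per_tick > 0);

	if (ev->throttled)
		return false;
	if (++ev->interrupts > max_per_tick) {
		ev->throttled = true;   /* rate test failed: stop the event */
		return false;
	}
	sim_overflow_callback(ev);
	return true;
}

/* Deliver n overflows with no timer tick in between (interrupts masked).
 * Returns how many overflows reached the callback. */
static unsigned int sim_run_masked(struct sim_event *ev, unsigned int n,
				   unsigned int max_per_tick)
{
	for (unsigned int i = 0; i < n; i++)
		sim_overflow(ev, max_per_tick);
	return ev->callbacks;
}
```

In this model, with a limit of 3 overflows per tick, ten masked overflows reach
the callback only 3 times before the event is throttled. With
reset_in_callback set, all ten reach it, because the count never accumulates
between ticks. Calling sim_timer_tick() on a throttled event re-arms it, which
is what normally happens when the timer interrupt is not masked.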