Message-ID: <AANLkTi=yvDF34e5rYqJt=Av7dhSwBLjXp=r_EfEAMLJY@mail.gmail.com>
Date: Wed, 15 Sep 2010 19:32:49 +0200
From: Stephane Eranian <eranian@...gle.com>
To: Robert Richter <robert.richter@....com>
Cc: Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
Don Zickus <dzickus@...hat.com>,
"gorcunov@...il.com" <gorcunov@...il.com>,
"fweisbec@...il.com" <fweisbec@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ying.huang@...el.com" <ying.huang@...el.com>,
"ming.m.lin@...el.com" <ming.m.lin@...el.com>,
"yinghai@...nel.org" <yinghai@...nel.org>,
"andi@...stfloor.org" <andi@...stfloor.org>
Subject: Re: [PATCH] perf, x86: catch spurious interrupts after disabling counters
On Wed, Sep 15, 2010 at 7:00 PM, Robert Richter <robert.richter@....com> wrote:
> On 15.09.10 12:36:27, Stephane Eranian wrote:
>> On Wed, Sep 15, 2010 at 6:20 PM, Robert Richter <robert.richter@....com> wrote:
>> > On 14.09.10 19:41:32, Robert Richter wrote:
>> >> I found the reason why we get the unknown nmi. For some reason
>> >> cpuc->active_mask in x86_pmu_handle_irq() is zero. Thus, no counters
>> >> are handled when we get an nmi. It seems there is somewhere a race
>> >> accessing the active_mask. So far I don't have a fix available.
>> >> Changing x86_pmu_stop() did not help:
>> >
>> > The patch below for tip/perf/urgent fixes this.
>> >
>> > -Robert
>> >
>> > From 4206a086f5b37efc1b4d94f1d90b55802b299ca0 Mon Sep 17 00:00:00 2001
>> > From: Robert Richter <robert.richter@....com>
>> > Date: Wed, 15 Sep 2010 16:12:59 +0200
>> > Subject: [PATCH] perf, x86: catch spurious interrupts after disabling counters
>> >
>> > Some cpus still deliver spurious interrupts after disabling a counter.
>>
>> Most likely the interrupt was in flight at the time you disabled it.
>
> I tried to clear the bit in the active_mask after disabling the
> counter (writing to the msr), which did not solve it. Shouldn't the
> counter be disabled immediately? Maybe clearing the INT bit would have
> worked too, but I was not sure about side effects.
>
    0 instr1
    1 instr2
    2 instr3
    3 wrmsrl(eventsel0, 0);
There is skid between the instruction that overflows the counter and
the point where the interrupt is posted. If you overflow on instr1,
suppose the interrupt is posted on instr3, which is immediately followed
by the disable. There is a chance you still get the interrupt even
though the counter was disabled. I also don't know when the INT bit is
looked at.
It may be worthwhile trying with:
static inline void x86_pmu_disable_event(struct perf_event *event)
{
	struct hw_perf_event *hwc = &event->hw;

	/* Write 0 so every eventsel bit, including INT, is cleared. */
	(void)checking_wrmsrl(hwc->config_base + hwc->idx, 0);
}
to see if it makes a difference.
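The skid scenario described above can be sketched as a toy model in plain C. This is not kernel code: toy_pmu, toy_overflow, toy_disable, toy_step and toy_run are hypothetical names made up for illustration. The model simply assumes that disabling the counter does not recall an interrupt that is already in flight, which is the hypothesis being discussed here, not documented hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one counter; none of this is kernel code. */
struct toy_pmu {
	bool enabled;    /* models the enable bit of the event select MSR */
	bool active;     /* models the counter's bit in active_mask */
	int  irq_due_at; /* instruction index where a pending interrupt lands, -1 if none */
};

/* The counter overflows on instruction 'at'; delivery lags by 'skid' instructions. */
static void toy_overflow(struct toy_pmu *p, int at, int skid)
{
	assert(skid >= 0);
	if (p->enabled)
		p->irq_due_at = at + skid;
}

/* Models wrmsrl(eventsel0, 0) plus clearing the active_mask bit. */
static void toy_disable(struct toy_pmu *p)
{
	p->enabled = false;
	p->active = false;
	/* Assumption: an interrupt already in flight is NOT cancelled. */
}

/*
 * Returns -1 if no interrupt lands on instruction 'now', 1 if the handler
 * claims it (counter still active), 0 if nobody claims it (unknown NMI).
 */
static int toy_step(struct toy_pmu *p, int now)
{
	if (p->irq_due_at != now)
		return -1;
	p->irq_due_at = -1;
	return p->active ? 1 : 0;
}

/*
 * Overflow on instruction 0, disable on instruction 'disable_at'.
 * Returns the number of interrupts that no counter claimed.
 */
static int toy_run(int skid, int disable_at)
{
	struct toy_pmu p = { .enabled = true, .active = true, .irq_due_at = -1 };
	int unclaimed = 0;

	toy_overflow(&p, 0, skid);
	for (int now = 0; now <= disable_at + skid; now++) {
		if (now == disable_at)
			toy_disable(&p);
		if (toy_step(&p, now) == 0)
			unclaimed++;
	}
	return unclaimed;
}
```

Under these assumptions, toy_run(3, 3) reports one unclaimed interrupt because the skid reaches the disable, while toy_run(1, 3) reports none because the interrupt lands while the counter is still active.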
>> Does the counter value reflect this?
>
> Yes, the disabled bit was cleared after reading the evntsel msr and
> the ctr value was about 400 cycles (it could have overflowed,
> though we actually can't say since the counter was disabled).
>
>> Were you also getting this if you were only measuring at the user level?
>
> I tried only
>
> perf record ./hackbench 10
>
> which triggered it on my system.
>
I suspect that if you do:

    perf record -e cycles:u ./hackbench 10

it does not happen.