linux-kernel - Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20100827140523.GM22783@erda.amd.com>
Date:	Fri, 27 Aug 2010 16:05:23 +0200
From:	Robert Richter <robert.richter@....com>
To:	Don Zickus <dzickus@...hat.com>
CC:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	Lin Ming <ming.m.lin@...el.com>,
	"fweisbec@...il.com" <fweisbec@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...el.com>,
	Yinghai Lu <yinghai@...nel.org>,
	Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running
 perfctrs

On 27.08.10 09:44:29, Don Zickus wrote:
> On Fri, Aug 27, 2010 at 10:10:38AM +0200, Robert Richter wrote:
> > On 26.08.10 17:14:24, Don Zickus wrote:
> > > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> > > index 4539b4b..d16ebd8 100644
> > > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> > > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> > > @@ -738,6 +738,7 @@ again:
> > >  
> > >  	inc_irq_stat(apic_perf_irqs);
> > >  	ack = status;
> > > +	intel_pmu_ack_status(ack);
> > 
> > I would slightly change the patch:
> > 
> > There is no need for the ack variable anymore, you could directly work
> > with the status.
> > 
> > I would call intel_pmu_ack_status() as close as possible after the
> > intel_pmu_get_status(), which is after 'again:'.
> 
> Yeah, I can do that.  The other patch was just a proof of concept to see
> what others thought.
> 
> What is funny is that this problem was masked by the
> perf_event_nmi_handler swallowing all the nmis.  I wonder if we were
> losing events as a result of this bug too because if you think about it,
> we processed the first event, a second event came in and we accidentally
> ack'd it, thus dropping it on the floor.

Yes, this could be the case, but only for handled counters. So it
would be interesting to see for this case the status mask of the
current and previous get_status call.

> Now I wonder how the event was
> ever reloaded, unless it was by accident because of how the scheduler
> deals with perf counters (perf_start/stop all the time).

The nmi might be queued be the cpu regardless of of the overflow
state.

I am wondering why this happens at all, because events are disabled by
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0). Hmm, maybe this is exactly the
reason because the nmi could fire again after reenabling the counters.

Is there a reason for disabling all counters?

-Robert

> 
> Cheers,
> Don
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/