Message-ID: <20100827140523.GM22783@erda.amd.com>
Date:	Fri, 27 Aug 2010 16:05:23 +0200
From:	Robert Richter <robert.richter@....com>
To:	Don Zickus <dzickus@...hat.com>
CC:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	Lin Ming <ming.m.lin@...el.com>,
	"fweisbec@...il.com" <fweisbec@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Huang, Ying" <ying.huang@...el.com>,
	Yinghai Lu <yinghai@...nel.org>,
	Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running
 perfctrs

On 27.08.10 09:44:29, Don Zickus wrote:
> On Fri, Aug 27, 2010 at 10:10:38AM +0200, Robert Richter wrote:
> > On 26.08.10 17:14:24, Don Zickus wrote:
> > > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> > > index 4539b4b..d16ebd8 100644
> > > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> > > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> > > @@ -738,6 +738,7 @@ again:
> > >  
> > >  	inc_irq_stat(apic_perf_irqs);
> > >  	ack = status;
> > > +	intel_pmu_ack_status(ack);
> > 
> > I would slightly change the patch:
> > 
> > There is no need for the ack variable anymore, you could directly work
> > with the status.
> > 
> > I would call intel_pmu_ack_status() as close as possible after the
> > intel_pmu_get_status(), which is after 'again:'.
> 
> Yeah, I can do that.  The other patch was just a proof of concept to see
> what others thought.
> 
> What is funny is that this problem was masked by the
> perf_event_nmi_handler swallowing all the nmis.  I wonder if we were
> losing events as a result of this bug too because if you think about it,
> we processed the first event, a second event came in and we accidentally
> ack'd it, thus dropping it on the floor.

Yes, this could be the case, but only for counters that were already
being handled, since only their bits are in the ack mask. So for this
case it would be interesting to see the status masks of the current
and the previous get_status call.
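
To make the suspected race concrete, here is a minimal user-space sketch.
It is not the kernel code and not the patch itself: sim_status,
sim_get_status(), sim_ack_status(), sim_overflow() and sim_handle_nmi()
are hypothetical stand-ins, and the model assumes a single status bit per
counter that a second overflow simply sets again. It compares acking after
the handling loop with acking right after reading the status.

```c
#include <stdint.h>

/* Simulated overflow status register (stand-in for the real MSR). */
static uint64_t sim_status;

static uint64_t sim_get_status(void) { return sim_status; }
static void sim_ack_status(uint64_t ack) { sim_status &= ~ack; }
static void sim_overflow(int counter) { sim_status |= 1ULL << counter; }

/*
 * Run one simulated NMI for counter 0. One overflow is pending on entry,
 * and a second overflow of the same counter arrives while the first one
 * is being handled. Returns the number of overflows that were processed.
 */
static int sim_handle_nmi(int ack_early)
{
	int handled = 0;
	int injected = 0;
	uint64_t status;

	sim_status = 0;
	sim_overflow(0);	/* first overflow, triggers the NMI */

	status = sim_get_status();
	while (status) {
		if (ack_early)
			sim_ack_status(status);

		/* "Handle" every counter whose bit is set in status. */
		for (int bit = 0; bit < 64; bit++) {
			if (!(status & (1ULL << bit)))
				continue;
			handled++;
			if (!injected) {
				/* second overflow arrives mid-handling */
				sim_overflow(0);
				injected = 1;
			}
		}

		if (!ack_early)
			sim_ack_status(status);	/* also clears the second overflow */

		/* Repeat if there is more work to be done. */
		status = sim_get_status();
	}
	return handled;
}
```

In this model sim_handle_nmi(0) (late ack) processes one overflow and
silently drops the second, while sim_handle_nmi(1) (ack right after the
status read) processes both. Whether the real hardware behaves this way is
exactly what the status masks would have to show.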

> Now I wonder how the event was
> ever reloaded, unless it was by accident because of how the scheduler
> deals with perf counters (perf_start/stop all the time).

The nmi might be queued by the cpu regardless of the overflow
state.

I am wondering why this happens at all, because events are disabled by
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0). Hmm, maybe this is exactly the
reason: the nmi could fire again after the counters are reenabled.

Is there a reason for disabling all counters?

-Robert

> 
> Cheers,
> Don
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
