Message-ID: <20170103152600.GA8826@krava>
Date: Tue, 3 Jan 2017 16:26:00 +0100
From: Jiri Olsa <jolsa@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Jiri Olsa <jolsa@...nel.org>, lkml <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Andi Kleen <andi@...stfloor.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Vince Weaver <vince@...ter.net>,
Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH 2/4] perf/x86: Fix period for non sampling events
On Tue, Jan 03, 2017 at 04:09:46PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 28, 2016 at 02:31:04PM +0100, Jiri Olsa wrote:
> > When in counting mode we set up the counter with the
> > longest possible period and read the value with the read
> > syscall.
> >
> > We also still set up the PMI to be triggered when such a
> > counter overflows, to reconfigure it.
> >
> > We also get a PEBS interrupt if such a counter has precise_ip
> > set (which makes no sense, but it's possible).
> >
> > Having such a counter with:
> > - counting mode
> > - precise_ip set
> >
> > I watched my server get stuck serving the PEBS interrupt
> > again and again because of the following (AFAICS):
> >
> > - PEBS interrupt is triggered before PMI
>
> Slightly confused, the PEBS interrupt _is_ the PMI. And how can we get
> an interrupt before the counter overflows?
>
> > - when the PEBS handling path reconfigured the counter, it
> > had a remaining value of -256
>
> You're talking about the actual counter value, right, not @left?
right
>
> > - x86_perf_event_set_period() does not consider this
> > an extreme value, so it's configured back as the
> > new counter value
>
> Right, a counter value of -256 would result in @left being 256 which is
> positive and not too large, so we 'retain' the value.
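
A simplified user-space model of that check may help here. It is a sketch only, not the kernel code: toy_set_period(), toy_counter_value() and TOY_MAX_PERIOD are hypothetical names, and the 48-bit counter width is an assumption. It shows that @left == 256 passes every clamp untouched, so the counter is programmed back to -256:

```c
#include <stdint.h>

/* Assumption: 48-bit counter, so the longest period is 2^47 - 1. */
#define TOY_MAX_PERIOD ((int64_t)((1ULL << 47) - 1))

/*
 * Toy version of the clamping done when a counter is re-armed.
 * 'left' is the number of events remaining in the current period.
 */
static int64_t toy_set_period(int64_t left, int64_t period)
{
	/* Way outside a reasonable range: skip forward a full period. */
	if (left <= -period)
		left = period;

	/* Period expired: start the next one. */
	if (left <= 0)
		left += period;

	/* Never program fewer than 2 events. */
	if (left < 2)
		left = 2;

	if (left > TOY_MAX_PERIOD)
		left = TOY_MAX_PERIOD;

	return left;
}

/* The counter counts up towards zero, so it is programmed with -left. */
static int64_t toy_counter_value(int64_t left)
{
	return -left;
}
```

With left == 256 and period == TOY_MAX_PERIOD (counting mode), none of the branches fire, so the model hands back 256 and the counter would be written with -256 again.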
>
> > - this makes the PEBS interrupt trigger right
> > away again
>
> So I'm curious how this is even possible. The normal described way of
> things is:
>
> - we program the counter with a negative value
> - each 'event' does a counter increment
> - if we 'overflow' (turn positive) we start to arm the PEBS
> assist
heh, I guess I thought this could happen earlier ;-) otherwise
I don't get how I could have seen -256 left in the counter value..
> - once the assist is armed, the next 'event' triggers a PEBS
> record.
> - if the amount of PEBS records exceeds the DS threshold, we
> set bit 62 in GLOBAL_STATUS and raise the PMI.
>
> At which point the actual counter value should be at the very least 1
> (for having counted the event that triggers the PEBS assist into
> creating the record).
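
The sequence described above can be sketched as a toy state machine. This is a model under stated assumptions, not a description of the hardware: it ignores any counter reload after a PEBS record, it assumes the DS threshold is a single record, and struct toy_pebs, toy_pebs_event() and toy_counter_at_pmi() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: no reload after a record, threshold of one record. */
struct toy_pebs {
	int64_t counter;	/* programmed negative, counts up */
	bool armed;		/* PEBS assist armed after the overflow */
	int records;		/* PEBS records written so far */
	bool pmi;		/* PMI raised (bit 62 of GLOBAL_STATUS in the real thing) */
};

/* Account one 'event' in the model. */
static void toy_pebs_event(struct toy_pebs *p)
{
	bool was_armed = p->armed;

	p->counter++;

	if (was_armed) {
		/* The event after arming writes a record and raises the PMI. */
		p->records++;
		p->armed = false;
		p->pmi = true;
	} else if (p->counter >= 0) {
		/* The counter 'overflowed' (no longer negative): arm the assist. */
		p->armed = true;
	}
}

/* Feed events until the toy PMI fires; return the counter value seen then. */
static int64_t toy_counter_at_pmi(int64_t start)
{
	struct toy_pebs p = { .counter = start };

	while (!p.pmi)
		toy_pebs_event(&p);

	return p.counter;
}
```

In this model a counter started at -256 reaches the PMI with a value of 1, never with a negative value, which is why a reading of -256 in the PEBS path looks inconsistent with the described sequence.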
what I saw was bit 62 set and pebs_drain->__intel_pmu_pebs_event
re-configuring the event back with -256 again and again..
I'll run the fuzzer again without the fix, with my debug stuff in,
and try to recreate it ;-)
> Did your kernel include commit:
>
> daa864b8f8e3 ("perf/x86/pebs: Fix handling of PEBS buffer overflows")
yep
jirka