Message-ID: <alpine.DEB.2.11.1505080018140.26907@vincent-weaver-1.umelst.maine.edu>
Date: Fri, 8 May 2015 00:22:07 -0400 (EDT)
From: Vince Weaver <vincent.weaver@...ne.edu>
To: Ingo Molnar <mingo@...nel.org>
cc: Vince Weaver <vincent.weaver@...ne.edu>,
linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Jiri Olsa <jolsa@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Paul Mackerras <paulus@...ba.org>
Subject: Re: perf: WARNING perfevents: irq loop stuck!
On Fri, 1 May 2015, Ingo Molnar wrote:
> So 0000fffffffffffe corresponds to 2 events left until overflow,
> right? And on Haswell we don't set x86_pmu.limit_period AFAICS, so we
> allow these super short periods.
>
> Maybe like on Broadwell we need a quirk on Nehalem/Haswell as well,
> one similar to bdw_limit_period()? Something like the patch below?
>
> Totally untested and such. I picked 128 because of Broadwell, but
> lower values might work as well. You could try to increase it to 3 and
> upwards and see which one stops triggering stuck NMI loops?
I spent a lot of time trying to come up with a test case that triggered
this more reliably but failed.
It definitely is an issue with PMC0 being -2, which causes the PMC0 bit in
the status register to get stuck and not clear. Often there is also a PEBS
event active at the same time, but that might be a coincidence.
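
To make the "-2" concrete, here is a small sketch of the arithmetic. It
assumes a 48-bit counter width, which is inferred from the quoted value
0000fffffffffffe. The macro and function names are made up for
illustration and are not kernel identifiers.

```c
#include <stdint.h>

/* Assumption: 48-bit counter, inferred from 0000fffffffffffe above. */
#define CNTR_BITS 48
#define CNTR_MASK ((UINT64_C(1) << CNTR_BITS) - 1)

/* Hypothetical helper: raw value to program so that the counter
 * overflows after `left` more events (the negated count, truncated
 * to the counter width). */
static uint64_t counter_start_value(uint64_t left)
{
	return (UINT64_C(0) - left) & CNTR_MASK;
}

/* Hypothetical helper: events remaining before a counter currently
 * holding `raw` wraps around. */
static uint64_t events_until_overflow(uint64_t raw)
{
	return CNTR_MASK + 1 - (raw & CNTR_MASK);
}
```

With these definitions, counter_start_value(2) is 0x0000fffffffffffe, and
events_until_overflow() of that value is 2, matching the "2 events left
until overflow" reading in the quoted text.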
With your patch applied I can't trigger the issue. I haven't tried
narrowing down the exact value yet.
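
For anyone who wants to help narrow the value down, the idea being tested
is roughly a clamp on the minimum sampling period. The sketch below is
not the patch itself (which is not reproduced in this message) and not
the kernel's bdw_limit_period(); the names are hypothetical, and 128 is
simply the starting value mentioned in the quoted text.

```c
#include <stdint.h>

/* Hypothetical lower bound; 128 is the value picked in the quoted mail.
 * Lower it step by step to find where the stuck NMI loop comes back. */
#define MIN_PERIOD_SKETCH 128

/* Hypothetical helper: raise very short periods to the minimum and
 * leave longer ones untouched. */
static uint64_t limit_period_sketch(uint64_t left)
{
	return left < MIN_PERIOD_SKETCH ? MIN_PERIOD_SKETCH : left;
}
```

With this clamp a requested period of 2 becomes 128, so the counter is
never programmed to -2, while a period of, say, 100000 passes through
unchanged.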
Vince