Message-ID: <alpine.DEB.2.11.1505011316420.2300@vincent-weaver-1.umelst.maine.edu>
Date:	Fri, 1 May 2015 13:20:17 -0400 (EDT)
From:	Vince Weaver <vincent.weaver@...ne.edu>
To:	Ingo Molnar <mingo@...nel.org>
cc:	Vince Weaver <vincent.weaver@...ne.edu>,
	linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Jiri Olsa <jolsa@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Paul Mackerras <paulus@...ba.org>
Subject: Re: perf: WARNING perfevents: irq loop stuck!

On Fri, 1 May 2015, Ingo Molnar wrote:

> 
> * Vince Weaver <vincent.weaver@...ne.edu> wrote:
> 
> > So this is just a warning, and I've reported it before, but the 
> > perf_fuzzer triggers this fairly regularly on my Haswell system.
> > 
> > It looks like fixed counter 0 (retired instructions) being set to 
> > 0000fffffffffffe occasionally causes an irq loop storm and gets 
> > stuck until the PMU state is cleared.
> 
> So 0000fffffffffffe corresponds to 2 events left until overflow, 
> right? And on Haswell we don't set x86_pmu.limit_period AFAICS, so we 
> allow these super short periods.
> 
> Maybe like on Broadwell we need a quirk on Nehalem/Haswell as well, 
> one similar to bdw_limit_period()? Something like the patch below?

I spent the morning trying to get a reproducer for this.  It turns out to 
be complex: in addition to fixed counter 0 being set to -2, it seems that 
at least one other non-fixed (general-purpose) counter must be about to 
overflow.
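
For reference, here is a rough sketch of the arithmetic behind "set to -2".
It assumes 48-bit wide counters, which is what the 0000ffff... values in the
dump suggest; the macro and function names are made up for illustration and
are not the kernel's own.

```c
#include <stdint.h>

/* Assumption: counters are 48 bits wide, as the dump values suggest. */
#define CNTVAL_BITS 48
#define CNTVAL_MASK ((UINT64_C(1) << CNTVAL_BITS) - 1)

/*
 * Hypothetical helper: raw value to write so that the counter
 * overflows after `left` more events, i.e. -left truncated to the
 * counter width.
 */
static uint64_t counter_value_for(uint64_t left)
{
    return (0 - left) & CNTVAL_MASK;
}

/*
 * Hypothetical helper: number of events remaining before a counter
 * currently holding `raw` wraps around.
 */
static uint64_t events_until_overflow(uint64_t raw)
{
    return (CNTVAL_MASK + 1) - (raw & CNTVAL_MASK);
}
```

With these definitions, counter_value_for(2) gives 0000fffffffffffe, and
events_until_overflow() on that value gives 2 again, matching the "2 events
left until overflow" reading above.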

For example, in this case gen-PMC2 is also poised to overflow at the same 
time.

CPU#0:   gen-PMC2 ctrl:  00000003ff96764b
CPU#0:   gen-PMC2 count: 0000000000000001
         gen-PMC2 left:  0000ffffffffffff
...
[ 2408.612442] CPU#0: fixed-PMC0 count: 0000fffffffffffe


It's not always PMC2 but in the warnings there's at least one other 
gen-PMC about to overflow at the exact same time as the fixed one.

Vince