Message-ID: <20160301091703.GN6356@twins.programming.kicks-ass.net>
Date:	Tue, 1 Mar 2016 10:17:03 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	"Liang, Kan" <kan.liang@...el.com>
Cc:	Jiri Olsa <jolsa@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Andi Kleen <andi@...stfloor.org>,
	Stephane Eranian <eranian@...gle.com>,
	Wang Nan <wangnan0@...wei.com>,
	"zheng.z.yan@...el.com" <zheng.z.yan@...el.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] Core2 cpu triggers hard lockup with perf test

On Mon, Feb 29, 2016 at 10:12:08PM +0000, Liang, Kan wrote:

> In SDM "18.4.4.4 Re-configuring PEBS Facilities" it mentioned that
> a quiescent period is needed between stopping the prior event counting and
> setting up a new PEBS event when software needs to reconfigure PEBS facilities.
> The quiescent period is to allow any latent residual PEBS records to complete
> its capture at their previously specified buffer address

> That requirement only can be found in Core Microarchitecture. 

But that should apply to all (PEBS) event scheduling, not just the
multi thing.

Also very convenient that quiescent period is so well defined. How long
should we wait, a day?

> I think it may imply that there is some observed delay in writing PEBS buffer.

Doesn't it explicitly state just that?

> So if perf record precise hw event with very small period, the slow PEBS writing
> may lockup the CPU. 

And I still don't see how this would explain a lockup in the MSR writes.

[ Jiri, can you disable that stupid panic on hard lockup and let it run
for a while, see if all the lockup msgs hit the same IP? Also, can you
look where exactly that IP lives in the code? ]
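
A rough sketch of that tally, in Python. The RIP line format matched here (a segment selector followed by a 16-digit hex address in "[<...>]") is an assumption about how the lockup reports look in the log and may need adjusting; tally_lockup_ips is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Assumed report format: "RIP: 0010:[<ffffffff8100abcd>] ..."
# Adjust the pattern if the lockup messages on the test box look different.
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:\[<([0-9a-f]{16})>\]")


def tally_lockup_ips(log_text):
    """Count how often each instruction pointer shows up in RIP lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RIP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the captured console log and printing counts.most_common() shows whether one address dominates; that address can then be resolved against the kernel image to see where it lives in the code.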

So I suspect it actually just did the PERF_GLOBAL_CTRL write; how else
would the hardware watchdog trigger on that same CPU?

After that, there's only BTS muck, which you're not using, so WTH is it
actually stuck on?

> If so, I think disabling the multiple pebs should be a good way.

As said, this should affect any and all PEBS event scheduling, not just
the multi stuff.
