linux-kernel - Re: [BUG] Core2 cpu triggers hard lockup with perf test


Message-ID: <20160301065520.GB622@krava.redhat.com>
Date:	Tue, 1 Mar 2016 07:55:20 +0100
From:	Jiri Olsa <jolsa@...hat.com>
To:	"Liang, Kan" <kan.liang@...el.com>
Cc:	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Andi Kleen <andi@...stfloor.org>,
	Stephane Eranian <eranian@...gle.com>,
	Wang Nan <wangnan0@...wei.com>,
	"zheng.z.yan@...el.com" <zheng.z.yan@...el.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] Core2 cpu triggers hard lockup with perf test

On Mon, Feb 29, 2016 at 10:12:08PM +0000, Liang, Kan wrote:
> 
> 
> > 
> > I can't find what's special about the Core2 CPU PEBS setup; it seems that other
> > CPUs are ok (tried on ivb/snb/hsw).
> > 
> > reverting the 156174999dd1 fixed the issue for me
> > 
> > ideas? thanks,
> 
> I think we may just disable the multiple PEBS support for Core2,
> as in the patch below.
> 
> In SDM "18.4.4.4 Re-configuring PEBS Facilities" it is mentioned that
> a quiescent period is needed between stopping the prior event counting and
> setting up a new PEBS event when software needs to reconfigure PEBS facilities.
> The quiescent period is to allow any latent residual PEBS records to complete
> their capture at their previously specified buffer address.
> That requirement can only be found for the Core microarchitecture.
> 
> I think it may imply that there is some observed delay in writing the PEBS buffer.
> So if perf records a precise hw event with a very small period, the slow PEBS writing
> may lock up the CPU. If so, I think disabling multiple PEBS should be a good
> way to go.
> 
> 

hi,
I got the same lockup with the patch applied:


[  167.486514] Kernel panic - not syncing: Hard LOCKUP
[  167.486514] CPU: 3 PID: 10656 Comm: perf Not tainted 4.5.0-rc4+ #7
[  167.486514] Hardware name: System Manufacturer To Be Filled By O.E.M. Product Name To Be Filled By O.E.M./BB Name To be filled by O.E.M., BIOS CGELIA55.86
[  167.486514]  0000000000000086 0000000084986595 ffff88007d985b28 ffffffff8133983f
[  167.486514]  ffffffff8191b723 0000000000000000 ffff88007d985ba8 ffffffff811872d1
[  167.486514]  ffff880000000008 ffff88007d985bb8 ffff88007d985b58 0000000084986595
[  167.486514] Call Trace:
[  167.486514]  <NMI>  [<ffffffff8133983f>] dump_stack+0x63/0x84
[  167.486514]  [<ffffffff811872d1>] panic+0xe2/0x229
[  167.486514]  [<ffffffff8113dc30>] watchdog_overflow_callback+0x100/0x100
[  167.486514]  [<ffffffff8117ee18>] __perf_event_overflow+0x88/0x1c0
[  167.486514]  [<ffffffff8117f994>] perf_event_overflow+0x14/0x20
[  167.486514]  [<ffffffff8100c42f>] intel_pmu_handle_irq+0x1df/0x460
[  167.486514]  [<ffffffff81052e3f>] ? native_apic_wait_icr_idle+0x1f/0x30
[  167.486514]  [<ffffffff81032cc5>] ? arch_irq_work_raise+0x35/0x40
[  167.486514]  [<ffffffff8100563d>] perf_event_nmi_handler+0x2d/0x50
[  167.486514]  [<ffffffff810313a2>] nmi_handle+0x62/0xf0
[  167.486514]  [<ffffffff81031a06>] default_do_nmi+0xf6/0x120
[  167.486514]  [<ffffffff81031b11>] do_nmi+0xe1/0x150
[  167.486514]  [<ffffffff816ad5f1>] end_repeat_nmi+0x1a/0x1e
[  167.486514]  [<ffffffff81063a16>] ? native_write_msr_safe+0x6/0x30
[  167.486514]  [<ffffffff81063a16>] ? native_write_msr_safe+0x6/0x30
[  167.486514]  [<ffffffff81063a16>] ? native_write_msr_safe+0x6/0x30
[  167.486514]  <<EOE>>  [<ffffffff8100b5cd>] ? __intel_pmu_enable_all.isra.12+0x4d/0xb0
[  167.486514]  [<ffffffff8100b640>] intel_pmu_enable_all+0x10/0x20
[  167.486514]  [<ffffffff810072c3>] x86_pmu_enable+0x263/0x2f0
[  167.486514]  [<ffffffff81179a72>] perf_pmu_enable+0x22/0x30
[  167.486514]  [<ffffffff8117a721>] ctx_resched+0x51/0x60
[  167.486514]  [<ffffffff8117b2ff>] perf_event_exec+0x10f/0x140
[  167.486514]  [<ffffffff8121949d>] setup_new_exec+0x6d/0x1a0
[  167.486514]  [<ffffffff8126b58a>] load_elf_binary+0x37a/0x10e0
[  167.486514]  [<ffffffff811b77f2>] ? get_user_pages+0x52/0x60
[  167.486514]  [<ffffffff8121779e>] search_binary_handler+0x9e/0x1e0
[  167.486514]  [<ffffffff812191f4>] do_execveat_common.isra.34+0x554/0x6e0
[  167.486514]  [<ffffffff8121960a>] SyS_execve+0x3a/0x50
[  167.486514]  [<ffffffff816ab195>] stub_execve+0x5/0x5
[  167.486514]  [<ffffffff816aaeee>] ? entry_SYSCALL_64_fastpath+0x12/0x71
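
A minimal sketch, assuming the log format shown above, for splitting such a trace into the NMI-side frames and the frames of the interrupted context (everything after the `<<EOE>>` marker). The names `FRAME`, `split_trace` and `sample` are hypothetical helpers for illustration only; frames prefixed with `?` are the ones the kernel unwinder marks as unreliable.

```python
import re

# One frame looks like: "[<ffffffff8133983f>] ? dump_stack+0x63/0x84"
# Group 1 captures the optional "?" (unreliable frame), group 2 the symbol.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def split_trace(lines):
    """Return (nmi_frames, interrupted_frames) as lists of (symbol, reliable)."""
    nmi, interrupted = [], []
    current = None
    for line in lines:
        if "<NMI>" in line:          # start of the NMI exception stack
            current = nmi
        elif "<<EOE>>" in line:      # end of exception stack: interrupted context
            current = interrupted
        m = FRAME.search(line)
        if m and current is not None:
            current.append((m.group(2), m.group(1) is None))
    return nmi, interrupted

# A few lines copied from the trace above.
sample = [
    "[  167.486514] Call Trace:",
    "[  167.486514]  <NMI>  [<ffffffff8133983f>] dump_stack+0x63/0x84",
    "[  167.486514]  [<ffffffff811872d1>] panic+0xe2/0x229",
    "[  167.486514]  [<ffffffff81063a16>] ? native_write_msr_safe+0x6/0x30",
    "[  167.486514]  <<EOE>>  [<ffffffff8100b5cd>] ? __intel_pmu_enable_all.isra.12+0x4d/0xb0",
    "[  167.486514]  [<ffffffff8100b640>] intel_pmu_enable_all+0x10/0x20",
]

nmi_frames, interrupted_frames = split_trace(sample)
for symbol, reliable in interrupted_frames:
    print(symbol, "" if reliable else "(unreliable)")
```

Run on the full trace, the second list is the path that was executing when the watchdog NMI fired: the exec path down to `intel_pmu_enable_all`.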


jirka