linux-kernel - Re: [BUG] Core2 cpu triggers hard lockup with perf test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160301171722.GA2666@krava.redhat.com>
Date:	Tue, 1 Mar 2016 18:17:22 +0100
From:	Jiri Olsa <jolsa@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Andi Kleen <andi@...stfloor.org>,
	"Liang, Kan" <kan.liang@...el.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Stephane Eranian <eranian@...gle.com>,
	Wang Nan <wangnan0@...wei.com>,
	"zheng.z.yan@...el.com" <zheng.z.yan@...el.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] Core2 cpu triggers hard lockup with perf test

On Tue, Mar 01, 2016 at 03:59:09PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 01, 2016 at 03:51:05PM +0100, Andi Kleen wrote:
> > > im on it.. also the patch that makes this happen just
> > > enlarge the buffer for PEBS:
> > > 
> > >   156174999dd1 perf/intel/x86: Enlarge the PEBS buffer
> > > 
> > > but I did not find anyaPEBS buffer lenght limitations in SDM
> > 
> > May be the easiest would be to just keep the old buffer size
> > on Merom.
> 
> I would still like to know why it breaks though. Just changing code
> about because it makes fail go away is never the right answer.

I had to go throught several config switch offs,
however now it's pretty clear where it hangs

I got one (just one occurance):

[  125.982977] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1^M
[  125.982977] Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ocrdma ib_core snd_hda_codec_idt snd_hda_codec_generic ib_addr snd_hda_intel snd_hda_codec snd_hda_core coretemp kvm_intel snd_hwdep snd_seq kvm snd_seq_device snd_pcm snd_timer snd nfsd soundcore shpchp pcspkr auth_rpcgss ppdev nfs_acl sg irqbypass parport_serial parport_pc lockd i2c_i801 parport acpi_cpufreq grace sunrpc ip_tables xfs libcrc32c sr_mod sd_mod cdrom i915 video i2c_algo_bit ahci drm_kms_helper libahci syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e drm ptp pps_core i2c_core lpfc libata scsi_transport_fc be2net vxlan ip6_udp_tunnel udp_tunnel fjes dm_mirror dm_region_hash dm_log dm_mod^M
[  125.982977] CPU: 1 PID: 5586 Comm: perf Not tainted 4.5.0-rc4fix+ #10^M
[  125.982977] Hardware name: System Manufacturer To Be Filled By O.E.M. Product Name To Be Filled By O.E.M./BB Name To be filled by O.E.M., BIOS CGELIA55.86^M
[  125.982977]  0000000000000086 00000000522d7515 ffff88007d885bb0 ffffffff8131dd11^M
[  125.982977]  0000000000000000 0000000000000000 ffff88007d885bc8 ffffffff811279cb^M
[  125.982977]  ffff88007a548000 ffff88007d885c00 ffffffff81168618 0000000000000002^M
[  125.982977] Call Trace:^M
[  125.982977]  <NMI>  [<ffffffff8131dd11>] dump_stack+0x4d/0x6c^M
[  125.982977]  [<ffffffff811279cb>] watchdog_overflow_callback+0xeb/0x100^M
[  125.982977]  [<ffffffff81168618>] __perf_event_overflow+0x88/0x1c0^M
[  125.982977]  [<ffffffff81169194>] perf_event_overflow+0x14/0x20^M
[  125.982977]  [<ffffffff8100b656>] intel_pmu_handle_irq+0x1e6/0x4a0^M
[  125.982977]  [<ffffffff810413bf>] ? native_apic_wait_icr_idle+0x1f/0x30^M
[  125.982977]  [<ffffffff81021d15>] ? arch_irq_work_raise+0x35/0x40^M
[  125.982977]  [<ffffffff8115c54e>] ? irq_work_queue+0x6e/0x80^M
[  125.982977]  [<ffffffff8100466d>] perf_event_nmi_handler+0x2d/0x50^M
[  125.982977]  [<ffffffff81020382>] nmi_handle+0x62/0xf0^M
[  125.982977]  [<ffffffff810209d6>] default_do_nmi+0xe6/0x110^M
[  125.982977]  [<ffffffff81020ad7>] do_nmi+0xd7/0x140^M
[  125.982977]  [<ffffffff8167af37>] end_repeat_nmi+0x1a/0x1e^M
[  125.982977]  [<ffffffff8100ae7b>] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0^M
[  125.982977]  [<ffffffff8100ae7b>] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0^M
[  125.982977]  [<ffffffff8100ae7b>] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0^M
[  125.982977]  <<EOE>>  [<ffffffff8100af10>] intel_pmu_enable_all+0x10/0x20^M
[  125.982977]  [<ffffffff81006283>] x86_pmu_enable+0x263/0x2f0^M
[  125.982977]  [<ffffffff811632d2>] perf_pmu_enable+0x22/0x30^M
[  125.982977]  [<ffffffff81163f51>] ctx_resched+0x51/0x60^M
[  125.982977]  [<ffffffff81164b09>] perf_event_exec+0x109/0x150^M
[  125.982977]  [<ffffffff811fff7d>] setup_new_exec+0x6d/0x1a0^M
[  125.982977]  [<ffffffff8125104a>] load_elf_binary+0x37a/0x10e0^M
[  125.982977]  [<ffffffff811a06c2>] ? get_user_pages+0x52/0x60^M
[  125.982977]  [<ffffffff811fe32e>] search_binary_handler+0x9e/0x1e0^M
[  125.982977]  [<ffffffff811ffccd>] do_execveat_common.isra.37+0x54d/0x6e0^M
[  125.982977]  [<ffffffff812000ea>] SyS_execve+0x3a/0x50^M
[  125.982977]  [<ffffffff81679065>] stub_execve+0x5/0x5^M
[  125.982977]  [<ffffffff81678dd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a^M


and several rcu stalls.. I guess that's because above path did rcu_read_lock:

[  166.667008] INFO: rcu_sched detected stalls on CPUs/tasks:^M
[  166.668003]  1-...: (16 GPs behind) idle=b13/140000000000000/0 softirq=6921/6923 fqs=19790 ^M
[  166.774020]  (detected by 2, t=60168 jiffies, g=6522, c=6521, q=0)^M
[  166.774020] Task dump for CPU 1:^M
[  166.774020] perf            R  running task        0  5586   5585 0x00000088^M
[  166.774020]  0000000000000001 ffff88007144fd70 ffffffff81164b09 ffff88007d899808^M
[  166.774020]  0000000000000286 ffff88007d899808 ffff880076f8e7c0 ffff880079734200^M
[  166.774020]  ffff880071450000 0000000000000000 ffff880078dfc500 ffff88007144fd90^M
[  166.774020] Call Trace:^M
[  166.774020]  [<ffffffff81164b09>] ? perf_event_exec+0x109/0x150^M
[  166.774020]  [<ffffffff811fff7d>] ? setup_new_exec+0x6d/0x1a0^M
[  166.774020]  [<ffffffff8125104a>] ? load_elf_binary+0x37a/0x10e0^M
[  166.774020]  [<ffffffff811a06c2>] ? get_user_pages+0x52/0x60^M
[  166.774020]  [<ffffffff811fe32e>] ? search_binary_handler+0x9e/0x1e0^M
[  166.774020]  [<ffffffff811ffccd>] ? do_execveat_common.isra.37+0x54d/0x6e0^M
[  166.774020]  [<ffffffff812000ea>] ? SyS_execve+0x3a/0x50^M
[  166.774020]  [<ffffffff81679065>] ? stub_execve+0x5/0x5^M
[  166.774020]  [<ffffffff81678dd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a^M
[  346.672008] INFO: rcu_sched detected stalls on CPUs/tasks:^M


the exception addr is on wrmsr:

ffffffff8100ae30 <__intel_pmu_enable_all.isra.11>:
ffffffff8100ae30:       e8 bb 02 67 00          callq  ffffffff8167b0f0 <__fentry__>
ffffffff8100ae35:       55                      push   %rbp
ffffffff8100ae36:       48 89 e5                mov    %rsp,%rbp
ffffffff8100ae39:       41 54                   push   %r12
ffffffff8100ae3b:       41 89 fc                mov    %edi,%r12d
ffffffff8100ae3e:       53                      push   %rbx
ffffffff8100ae3f:       48 c7 c3 80 a3 00 00    mov    $0xa380,%rbx
ffffffff8100ae46:       65 48 03 1d d2 f2 ff    add    %gs:0x7efff2d2(%rip),%rbx        # a120 <this_cpu_off>
ffffffff8100ae4d:       7e
ffffffff8100ae4e:       e8 6d 49 00 00          callq  ffffffff8100f7c0 <intel_pmu_pebs_enable_all>
ffffffff8100ae53:       41 0f b6 fc             movzbl %r12b,%edi
ffffffff8100ae57:       e8 94 58 00 00          callq  ffffffff810106f0 <intel_pmu_lbr_enable_all>
ffffffff8100ae5c:       48 8b 83 68 0c 00 00    mov    0xc68(%rbx),%rax
ffffffff8100ae63:       b9 8f 03 00 00          mov    $0x38f,%ecx
ffffffff8100ae68:       48 f7 d0                not    %rax
ffffffff8100ae6b:       48 23 05 26 80 ad 00    and    0xad8026(%rip),%rax        # ffffffff81ae2e98 <x86_pmu+0x138>
ffffffff8100ae72:       48 89 c2                mov    %rax,%rdx
ffffffff8100ae75:       48 c1 ea 20             shr    $0x20,%rdx
ffffffff8100ae79:       0f 30                   wrmsr


I tried what Andy suggested below (not sure what he meant by Merom,
I took PEBS format0 instead), works for me

jirka


---
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index c8a243d6fc82..c4a1a769bae7 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -269,7 +269,7 @@ static int alloc_pebs_buffer(int cpu)
 	if (!x86_pmu.pebs)
 		return 0;
 
-	buffer = kzalloc_node(PEBS_BUFFER_SIZE, GFP_KERNEL, node);
+	buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL, node);
 	if (unlikely(!buffer))
 		return -ENOMEM;
 
@@ -286,7 +286,7 @@ static int alloc_pebs_buffer(int cpu)
 		per_cpu(insn_buffer, cpu) = ibuffer;
 	}
 
-	max = PEBS_BUFFER_SIZE / x86_pmu.pebs_record_size;
+	max = x86_pmu.pebs_buffer_size / x86_pmu.pebs_record_size;
 
 	ds->pebs_buffer_base = (u64)(unsigned long)buffer;
 	ds->pebs_index = ds->pebs_buffer_base;
@@ -1319,6 +1319,7 @@ void __init intel_ds_init(void)
 
 	x86_pmu.bts  = boot_cpu_has(X86_FEATURE_BTS);
 	x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
+	x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;
 	if (x86_pmu.pebs) {
 		char pebs_type = x86_pmu.intel_cap.pebs_trap ?  '+' : '-';
 		int format = x86_pmu.intel_cap.pebs_format;
@@ -1327,6 +1328,7 @@ void __init intel_ds_init(void)
 		case 0:
 			pr_cont("PEBS fmt0%c, ", pebs_type);
 			x86_pmu.pebs_record_size = sizeof(struct pebs_record_core);
+			x86_pmu.pebs_buffer_size = PAGE_SIZE;
 			x86_pmu.drain_pebs = intel_pmu_drain_pebs_core;
 			break;
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7bb61e32fb29..1ab6279fed1d 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -586,6 +586,7 @@ struct x86_pmu {
 			pebs_broken	:1,
 			pebs_prec_dist	:1;
 	int		pebs_record_size;
+	int		pebs_buffer_size;
 	void		(*drain_pebs)(struct pt_regs *regs);
 	struct event_constraint *pebs_constraints;
 	void		(*pebs_aliases)(struct perf_event *event);