lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 18 Jun 2020 20:33:02 +0200 From: Peter Zijlstra <peterz@...radead.org> To: Song Liu <songliubraving@...com> Cc: linux-kernel <linux-kernel@...r.kernel.org>, Kan Liang <kan.liang@...ux.intel.com>, Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Andi Kleen <ak@...ux.intel.com>, Josh Poimboeuf <jpoimboe@...hat.com>, Tejun Heo <tj@...nel.org>, Andrii Nakryiko <andriin@...com>, Francis Ricci <fjricci@...com> Subject: Re: WARNING with LBR + precise_ip=2 + bpf_get_stackid() On Thu, Jun 18, 2020 at 05:25:24PM +0000, Song Liu wrote: > PID: 80430 TASK: ffff888d92c62a80 CPU: 24 COMMAND: "25_scheduler" > #0 [fffffe00004cfd88] machine_kexec at ffffffff8104a646 > #1 [fffffe00004cfdd8] __crash_kexec at ffffffff8114a82f > #2 [fffffe00004cfea0] panic at ffffffff810ba99c > #3 [fffffe00004cff20] df_debug at ffffffff8104e21d > #4 [fffffe00004cff30] do_double_fault at ffffffff8101c9b4 > #5 [fffffe00004cff50] double_fault at ffffffff81c00b3e > [exception RIP: vsnprintf+14] > RIP: ffffffff81a3ee6e RSP: fffffe00004d0ff8 RFLAGS: 00010082 > RAX: fffffe00004d1070 RBX: fffffe00004d1101 RCX: fffffe00004d1050 > RDX: ffffffff822c42f5 RSI: 000000007fffffff RDI: fffffe00004d111a > RBP: fffffe00004d10a0 R8: 0000000000000067 R9: ffffffff82209d05 > R10: ffffffff822a5fd0 R11: ffffffff822a6358 R12: 0000000000000019 > R13: 0000000000000001 R14: 0000000000000019 R15: ffffffff822b5fd6 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > --- <DOUBLEFAULT exception stack> --- > #6 [fffffe00004d0ff8] vsnprintf at ffffffff81a3ee6e > bt: cannot transition from exception stack to current process stack: > exception stack pointer: fffffe00004cfd88 > process stack pointer: fffffe00004d1048 > current stack base: ffffc900241a0000 > > 0xfffffe00004d1040 kallsyms_token_index > 0xfffffe00004d1048 sprintf > 0xfffffe00004d1078 kallsyms_lookup > 0xfffffe00004d1098 kallsyms_names > 0xfffffe00004d10a8 __sprint_symbol > 0xfffffe00004d10d8 textbuf.47672 > 0xfffffe00004d10e0 always_kmsg_dump > 0xfffffe00004d10f8 symbol_string > 0xfffffe00004d11c0 textbuf.47672 > 0xfffffe00004d11e8 textbuf.47672 > 0xfffffe00004d11f8 textbuf.47672 > 0xfffffe00004d1200 always_kmsg_dump > 0xfffffe00004d1210 kallsyms_token_index > 0xfffffe00004d1218 vsnprintf > 0xfffffe00004d1220 textbuf.47672 > 0xfffffe00004d1258 kallsyms_token_index > 0xfffffe00004d1270 vscnprintf > 0xfffffe00004d1280 vprintk_store > 0xfffffe00004d1290 wake_up_klogd > 0xfffffe00004d12b0 kallsyms_token_index > 0xfffffe00004d12c8 vprintk_emit > 0xfffffe00004d1300 entry_SYSCALL_64 > 0xfffffe00004d1318 vprintk_deferred > 0xfffffe00004d1328 printk_deferred > 0xfffffe00004d1360 entry_SYSCALL_64 > 0xfffffe00004d1380 __start_orc_unwind > 0xfffffe00004d1388 unwind_next_frame.cold.7 > 0xfffffe00004d13c8 perf_callchain_kernel > 0xfffffe00004d1418 entry_SYSCALL_64 > 0xfffffe00004d1450 get_perf_callchain > 0xfffffe00004d14b0 bpf_get_stack This, this is where it gets buggerd. This is a BPF problem, not a perf problem. You'll note that if you ask perf for a PERF_SAMPLE_CALLCHAIN we'll do that early in setup_pebs_fixed_sample_data(). > 0xfffffe00004d1730 bpf_overflow_handler > 0xfffffe00004d1788 __perf_event_overflow > 0xfffffe00004d17a0 x86_pmu > 0xfffffe00004d17b8 __intel_pmu_pebs_event > 0xfffffe00004d17e0 setup_pebs_fixed_sample_data > 0xfffffe00004d1890 entry_SYSCALL_64 > 0xfffffe00004d1ab8 intel_pmu_drain_pebs_nhm > 0xfffffe00004d1ac0 setup_pebs_fixed_sample_data > 0xfffffe00004d1bb8 handle_pmi_common > 0xfffffe00004d1d00 insn_get_sib.part.5 > 0xfffffe00004d1d10 insn_get_displacement.part.6 > 0xfffffe00004d1d20 insn_get_immediate.part.7 > 0xfffffe00004d1d38 __set_pte_vaddr > 0xfffffe00004d1d78 __native_set_fixmap > 0xfffffe00004d1d80 ghes_copy_tofrom_phys > 0xfffffe00004d1e08 intel_pmu_handle_irq > 0xfffffe00004d1e48 perf_event_nmi_handler > 0xfffffe00004d1e50 perf_event_nmi_handler_na.47078 > 0xfffffe00004d1e58 nmi_reason_lock > 0xfffffe00004d1e60 nmi_handle > 0xfffffe00004d1e70 nmi_desc > 0xfffffe00004d1e90 nmi_reason_lock > 0xfffffe00004d1eb8 default_do_nmi > 0xfffffe00004d1ed8 do_nmi > 0xfffffe00004d1ef0 nmi > 0xfffffe00004d1f78 entry_SYSCALL_64 > 0xfffffe00004d1fa0 entry_SYSCALL_64 > 0xfffffe00004d1fd8 entry_SYSCALL_64 > > The kernel is running the normal NMI path and triggers the above mentioned warnings, > which calls into string formatting path for the warning messages. > However, NMIs run > off of its dedicated exception stack which is 4k in size, which is too small for the > full string formatting path to run, Say what ?! Has printk() grown into such a freakish monster that it cannot run on 4k stacks anymore? I suppose that KSYM_SYMBOL_LEN on-stack buffer in symbol_string() doesn't help. > We noticed that this only happens with precise_ip >= 2. This is caused by > setup_pebs_fixed_sample_data() using pens->real_ip: > > set_linear_ip(regs, pebs->real_ip); > > For our use case, we do need precise_ip=2. So we would like suggestions to fix the > warning and/or to avoid double fault. It all works as advertised. The BPF program is broken for calling bpf_get_stack() on a PEBS event.
Powered by blists - more mailing lists