netdev - Re: [PATCH] perf: fix panic by disable ftrace on fault.c

Message-ID: <18252e42-9c30-73d4-e3bb-0e705a78af41@intel.com>
Date:   Tue, 14 Sep 2021 09:16:53 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     王贇 <yun.wang@...ux.alibaba.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        "open list:X86 MM" <linux-kernel@...r.kernel.org>,
        "open list:BPF (Safe dynamic programs and tools)" 
        <netdev@...r.kernel.org>,
        "open list:BPF (Safe dynamic programs and tools)" 
        <bpf@...r.kernel.org>
Subject: Re: [PATCH] perf: fix panic by disable ftrace on fault.c

On 9/14/21 12:23 AM, 王贇 wrote:
> 
> On 2021/9/14 11:02 AM, 王贇 wrote:
> [snip]
>> [   44.133509][    C0] traps: PANIC: double fault, error_code: 0x0
>> [   44.133519][    C0] double fault: 0000 [#1] SMP PTI
>> [   44.133526][    C0] CPU: 0 PID: 743 Comm: a.out Not tainted 5.14.0-next-20210913 #469
>> [   44.133532][    C0] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> [   44.133536][    C0] RIP: 0010:perf_swevent_get_recursion_context+0x0/0x70
>> [   44.133549][    C0] Code: 48 03 43 28 48 8b 0c 24 bb 01 00 00 00 4c 29 f0 48 39 c8 48 0f 47 c1 49 89 45 08 e9 48 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 <55> 53 e8 09 20 f2 ff 48 c7 c2 20 4d 03 00 65 48 03 15 5a 3b d2 7e
>> [   44.133556][    C0] RSP: 0018:fffffe000000b000 EFLAGS: 00010046
> Another piece of information: I printed '__this_cpu_ist_bottom_va(NMI)'
> on cpu0, and it is exactly the RSP fffffe000000b000. Does this imply
> we overflowed the NMI stack?

Yep.  I suspect your sanitizer and other debugging instrumentation are
eating the stack:

> [   44.134987][    C0]  ? __sanitizer_cov_trace_pc+0x7/0x60
> [   44.135005][    C0]  ? kcov_common_handle+0x30/0x30

Just turning off tracing for the page fault handler papers over the
problem.  It'll come back later in a slightly different form.
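The question quoted above (does RSP equal `__this_cpu_ist_bottom_va(NMI)`?) amounts to a simple bounds check, because x86-64 stacks grow downward toward their bottom address. The C sketch below is illustrative only: `struct stack_region`, `rsp_in_region()`, `stack_bytes_left()` and `push_would_overflow()` are hypothetical helpers, not kernel APIs, and the sketch assumes an unmapped guard page sits directly below the stack bottom so that any write below it faults. The faulting byte in the log is `<55>`, i.e. `push %rbp`, which stores 8 bytes at RSP - 8; with RSP already at the bottom, that store plausibly lands in the guard page, and a page fault raised while the CPU is delivering that page fault escalates to a double fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one per-CPU exception stack.
 * On x86-64 the stack grows down: each push decrements RSP
 * toward `bottom`, the lowest usable address. */
struct stack_region {
    uint64_t bottom; /* lowest usable address, e.g. what
                        __this_cpu_ist_bottom_va(NMI) printed */
    uint64_t top;    /* one past the highest usable address
                        (RSP == top means the stack is empty) */
};

/* True if RSP lies within [bottom, top]. */
static bool rsp_in_region(const struct stack_region *s, uint64_t rsp)
{
    assert(s->bottom < s->top);
    return rsp >= s->bottom && rsp <= s->top;
}

/* Bytes still available below RSP before a store would cross
 * `bottom`. Returns 0 when RSP is at or below `bottom`. */
static uint64_t stack_bytes_left(const struct stack_region *s, uint64_t rsp)
{
    assert(s->bottom < s->top);
    return rsp > s->bottom ? rsp - s->bottom : 0;
}

/* A 64-bit push (such as `push %rbp`, opcode 0x55) writes 8 bytes
 * at RSP - 8, so it overflows once fewer than 8 bytes remain.
 * Assumption: the page below `bottom` is an unmapped guard page. */
static bool push_would_overflow(const struct stack_region *s, uint64_t rsp)
{
    return stack_bytes_left(s, rsp) < 8;
}
```

With RSP and `bottom` both set to fffffe000000b000, as in the log, `stack_bytes_left()` returns 0 and `push_would_overflow()` returns true. The `top` value is not given in the thread, so any concrete stack size used with this sketch is an assumption.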