Message-ID: <2CFF9131-48E9-44D3-93CA-976C47106092@vmware.com>
Date: Tue, 29 Nov 2022 04:25:38 +0000
From: Nadav Amit <namit@...are.com>
To: Steven Rostedt <rostedt@...dmis.org>
CC: Peter Zijlstra <peterz@...radead.org>,
Arnd Bergmann <arnd@...db.de>,
Thomas Gleixner <tglx@...utronix.de>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
kernel list <linux-kernel@...r.kernel.org>,
"linux-ia64@...r.kernel.org" <linux-ia64@...r.kernel.org>,
"linux-um@...ts.infradead.org" <linux-um@...ts.infradead.org>,
Linux-Arch <linux-arch@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>, Andy Lutomirski <luto@...nel.org>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
X86 ML <x86@...nel.org>, Richard Weinberger <richard@....at>,
Anton Ivanov <anton.ivanov@...bridgegreys.com>,
Johannes Berg <johannes@...solutions.net>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 3/3] compiler: inline does not imply notrace
On Nov 28, 2022, at 8:15 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Tue, 29 Nov 2022 02:36:22 +0000
> Nadav Amit <namit@...are.com> wrote:
>
>> On Nov 22, 2022, at 12:51 PM, Nadav Amit <namit@...are.com> wrote:
>>
>>> But more importantly, the current “inline”->“notrace” solution just papers
>>> over missing “notrace” annotations. Anyone can remove the “inline” at any
>>> given moment, since there is no direct (or indirect) relationship between
>>> “inline” and “notrace”. It all seems arbitrary to me and bound to fail at
>>> some point.
>>
>> Peter, Steven, (and others),
>>
>> Beyond the issues that are addressed in this patch-set, I encountered one
>> more, which reinforces the point that the heuristic of marking “inline”
>> functions as “notrace” is not good enough.
>>
>> Before I send a patch, I would like to get your feedback. I include a splat
>> below. It appears that execution might get stuck because some functions that
>> are used for function tracing can themselves be traced.
>>
>> For example, __kernel_text_address() and unwind_get_return_address() are
>> traceable. I think we need to disallow the tracing of all functions that are
>> called directly or indirectly from function_stack_trace_call() (i.e., all
>> functions in its dynamic extent).
>
> How did this happen? It should be able to handle recursion:
>
> static void
> function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
>                           struct ftrace_ops *op, struct ftrace_regs *fregs)
> {
>         struct trace_array *tr = op->private;
>         struct trace_array_cpu *data;
>         unsigned long flags;
>         long disabled;
>         int cpu;
>         unsigned int trace_ctx;
>
>         if (unlikely(!tr->function_enabled))
>                 return;
>
>         /*
>          * Need to use raw, since this must be called before the
>          * recursive protection is performed.
>          */
>         local_irq_save(flags);
>         cpu = raw_smp_processor_id();
>         data = per_cpu_ptr(tr->array_buffer.data, cpu);
>         disabled = atomic_inc_return(&data->disabled);
>
>         if (likely(disabled == 1)) {    <<<---- This stops recursion
>
>                 trace_ctx = tracing_gen_ctx_flags(flags);
>                 trace_function(tr, ip, parent_ip, trace_ctx);
>                 __trace_stack(tr, trace_ctx, STACK_SKIP);
>         }
>
>         atomic_dec(&data->disabled);
>         local_irq_restore(flags);
> }
>
> Each of the stack trace functions may recurse back into this function, but
> it will not recurse further. How did it crash?
Ugh. Thanks. I didn’t know that - so your input is really helpful.
I will need to further debug it, but this issue does not occur every time.
The kernel didn’t crash exactly - it’s more of a deadlock. I have lockdep
enabled, so it is not a deadlock that lockdep knows about. Could it be that
somehow things just slowed down due to IPIs and mostly-disabled IRQs? I have
no idea. I would need to recreate the scenario.
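
Just to be sure I follow the logic: reduced to a trivial user-space sketch
(all names here are made up; in the real code data->disabled is per-CPU and
the increment is atomic), the guard you point at behaves roughly like this:

#include <stdio.h>

static int disabled;                    /* stands in for data->disabled */

static void traced_helper(void);

static void tracer_callback(void)
{
        if (++disabled == 1) {          /* only the outermost entry traces */
                printf("tracing...\n");
                traced_helper();        /* helper is itself "traced" */
        }
        --disabled;
}

static void traced_helper(void)
{
        /* the helper's entry hook re-enters the callback ... */
        tracer_callback();              /* ... but sees disabled == 2 and bounces */
}

int main(void)
{
        tracer_callback();
        return 0;
}

So even if __kernel_text_address() and friends are traceable, the nested
callback should fall through immediately rather than recurse further, and I
need to look elsewhere for the hang.
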
For the record, I tried to save some details in the previous email. It was
kind of hard to understand what’s going on on the other cores, since the
traces from the other cores were interleaved. I extracted the parts that I
think refer to another CPU (yes, the output is really slow, as seen in the
timestamps):
[531413.923628] Code: 00 00 31 c0 eb f1 0f 1f 80 00 00 00 00 e8 1b 2e 16 3e 55 48 89 e5 c6 07 00 0f 1f 00 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 99 da f1 fe 65 8b 05 f2 99 d7 7d 85 c0 74 02 5d
All code
========
0: 00 00 add %al,(%rax)
2: 31 c0 xor %eax,%eax
4: eb f1 jmp 0xfffffffffffffff7
6: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
d: e8 1b 2e 16 3e call 0x3e162e2d
12: 55 push %rbp
13: 48 89 e5 mov %rsp,%rbp
16: c6 07 00 movb $0x0,(%rdi)
19: 0f 1f 00 nopl (%rax)
1c: f7 c6 00 02 00 00 test $0x200,%esi
22: 74 06 je 0x2a
24: fb sti
25: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
2a:* bf 01 00 00 00 mov $0x1,%edi <-- trapping instruction
2f: e8 99 da f1 fe call 0xfffffffffef1dacd
34: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a2d
3b: 85 c0 test %eax,%eax
3d: 74 02 je 0x41
3f: 5d pop %rbp
Code starting with the faulting instruction
===========================================
0: bf 01 00 00 00 mov $0x1,%edi
5: e8 99 da f1 fe call 0xfffffffffef1daa3
a: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a03
11: 85 c0 test %eax,%eax
13: 74 02 je 0x17
15: 5d pop %rbp
[531414.066765] RSP: 0018:ffffc9000c9a77d8 EFLAGS: 00000206
[531414.077943] RIP: 0010:smp_call_function_many_cond (kernel/smp.c:443 kernel/smp.c:988)
[531416.987351] on_each_cpu_cond_mask (kernel/smp.c:1155)
[531416.205862] ? text_poke_memset (arch/x86/kernel/alternative.c:1296)
[531416.681294] ? text_poke_memset (arch/x86/kernel/alternative.c:1296)
[531417.468443] text_poke_bp_batch (arch/x86/kernel/alternative.c:1553)
[531418.939923] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:500)
[531419.882055] ? ftrace_no_pid_write (kernel/trace/ftrace.c:7864)
[531420.510376] ftrace_update_pid_func (kernel/trace/ftrace.c:374 (discriminator 1))
[531420.784703] ftrace_pid_open (kernel/trace/ftrace.c:2918 kernel/trace/ftrace.c:2932 kernel/trace/ftrace.c:7725 kernel/trace/ftrace.c:7835 kernel/trace/ftrace.c:7865)
[531421.851294] vfs_open (fs/open.c:1017)
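
As an aside, regarding the explicit annotations I mentioned above: what I
have in mind is simply marking the helpers that the tracer itself relies on
as notrace, instead of relying on “inline”. A rough illustration follows (the
helper below is hypothetical, not a proposed patch); if I read
include/linux/compiler_types.h correctly, notrace expands to
__attribute__((__no_instrument_function__)) on most configurations, so no
__fentry__/mcount call is emitted and ftrace can never attach to the
function, regardless of any inlining decision:

/* Hypothetical helper, for illustration only - not a proposed patch. */
static bool notrace my_text_addr(unsigned long addr)
{
        /* notrace => no __fentry__ call, so ftrace cannot hook this. */
        return addr >= (unsigned long)_stext && addr < (unsigned long)_etext;
}

The point is that the “do not trace this” property is then stated where it
matters, and it survives someone removing the “inline”.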