linux-kernel - Re: [PATCH 3/3] compiler: inline does not imply notrace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221129100647.4957579e@gandalf.local.home>
Date:   Tue, 29 Nov 2022 10:06:47 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Nadav Amit <namit@...are.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Arnd Bergmann <arnd@...db.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        "linux-ia64@...r.kernel.org" <linux-ia64@...r.kernel.org>,
        "linux-um@...ts.infradead.org" <linux-um@...ts.infradead.org>,
        Linux-Arch <linux-arch@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Andy Lutomirski <luto@...nel.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        X86 ML <x86@...nel.org>, Richard Weinberger <richard@....at>,
        Anton Ivanov <anton.ivanov@...bridgegreys.com>,
        Johannes Berg <johannes@...solutions.net>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 3/3] compiler: inline does not imply notrace

On Tue, 29 Nov 2022 04:25:38 +0000
Nadav Amit <namit@...are.com> wrote:


> I will need to further debug it, but this issue does not occur every time.
> 
> The kernel didn’t crash exactly - it’s more of a deadlock. I have lockdep
> enabled, so it is not a deadlock that lockdep knows. Could it be that
> somehow things just slowed down due to IPIs and mostly-disabled IRQs? I have
> no idea. I would need to recreate the scenario. 

You have lockdep enabled and you are running function tracing with stack
trace on? So you are doing a stack trace on *every* function that is traced?

I don't think you hit a deadlock, I think you hit a live lock. You could
possibly slow the system down so much that when an interrupt finishes it's
time for it to be triggered again, and you never make forward progress.

> 
> For the record, I tried to saved some details in the previous email. It was
> kind of hard to understand what’s going on on the other cores, since the
> trace of other cores was interleaved. I extract the parts from that I think
> the refer to the another CPU (yes, the output is really slow, as seen in the
> timestamps):
> 
> [531413.923628] Code: 00 00 31 c0 eb f1 0f 1f 80 00 00 00 00 e8 1b 2e 16 3e 55 48 89 e5 c6 07 00 0f 1f 00 f7 c6 00 02 00 00 74 06 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 99 da f1 fe 65 8b 05 f2 99 d7 7d 85 c0 74 02 5d
> 
> All code
> ========
> 0: 00 00 add %al,(%rax)
> 2: 31 c0 xor %eax,%eax
> 4: eb f1 jmp 0xfffffffffffffff7
> 6: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
> d: e8 1b 2e 16 3e call 0x3e162e2d
> 12: 55 push %rbp
> 13: 48 89 e5 mov %rsp,%rbp
> 16: c6 07 00 movb $0x0,(%rdi)
> 19: 0f 1f 00 nopl (%rax)
> 1c: f7 c6 00 02 00 00 test $0x200,%esi
> 22: 74 06 je 0x2a
> 24: fb sti 
> 25: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 2a:* bf 01 00 00 00 mov $0x1,%edi <-- trapping instruction
> 2f: e8 99 da f1 fe call 0xfffffffffef1dacd
> 34: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a2d
> 3b: 85 c0 test %eax,%eax
> 3d: 74 02 je 0x41
> 3f: 5d pop %rbp
> 
> Code starting with the faulting instruction
> ===========================================
> 0: bf 01 00 00 00 mov $0x1,%edi
> 5: e8 99 da f1 fe call 0xfffffffffef1daa3
> a: 65 8b 05 f2 99 d7 7d mov %gs:0x7dd799f2(%rip),%eax # 0x7dd79a03
> 11: 85 c0 test %eax,%eax
> 13: 74 02 je 0x17
> 15: 5d pop %rbp
> 
> [531414.066765] RSP: 0018:ffffc9000c9a77d8 EFLAGS: 00000206
> [531414.077943] RIP: 0010:smp_call_function_many_cond (kernel/smp.c:443 kernel/smp.c:988) 
> [531416.987351] on_each_cpu_cond_mask (kernel/smp.c:1155) 
> [531416.205862] ? text_poke_memset (arch/x86/kernel/alternative.c:1296) 
> [531416.681294] ? text_poke_memset (arch/x86/kernel/alternative.c:1296) 
> [531417.468443] text_poke_bp_batch (arch/x86/kernel/alternative.c:1553) 
> [531418.939923] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:500)
> [531419.882055] ? ftrace_no_pid_write (kernel/trace/ftrace.c:7864) 
> [531420.510376] ftrace_update_pid_func (kernel/trace/ftrace.c:374 (discriminator 1))
> [531420.784703] ftrace_pid_open (kernel/trace/ftrace.c:2918 kernel/trace/ftrace.c:2932 kernel/trace/ftrace.c:7725 kernel/trace/ftrace.c:7835 kernel/trace/ftrace.c:7865) 
> [531421.851294] vfs_open (fs/open.c:1017)
> 
> 
> 

Do you have an issue with normal function tracing, and not tracing every
function.

I should also add this, because it detects recursion faster than the
atomic_inc_return() does.

-- Steve

diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9f1bfbe105e8..93ec756dc24b 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -221,12 +221,18 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	long disabled;
+	int bit;
 	int cpu;
 	unsigned int trace_ctx;
 
 	if (unlikely(!tr->function_enabled))
 		return;
 
+	/* Faster than atomic_inc_return() */
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
+	if (bit < 0)
+		return;
+
 	/*
 	 * Need to use raw, since this must be called before the
 	 * recursive protection is performed.
@@ -244,6 +250,7 @@ function_stack_trace_call(unsigned long ip, unsigned long parent_ip,
 
 	atomic_dec(&data->disabled);
 	local_irq_restore(flags);
+	ftrace_test_recursion_unlock(bit);
 }
 
 static inline bool is_repeat_check(struct trace_array *tr,