Date: Wed, 26 Jun 2024 10:06:33 +0800
From: Song Chen <chensong_2000@....cn>
To: Andrew Halaney <ahalaney@...hat.com>
Cc: Derek Barbosa <debarbos@...hat.com>, pmladek@...e.com,
 rostedt@...dmis.org, john.ogness@...utronix.de, senozhatsky@...omium.org,
 linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
 williams@...hat.com, jlelli@...hat.com
Subject: Re: a question about how to debug this case in ftrace

Hi Andrew,

If I understood it correctly, it's similar to an RCU 
stall (rcu_cpu_stall_ftrace_dump).

So far I don't need much detail from the vmcore, so merging dmesg and 
ftrace output is more practical for my case. I will give it a try, many thanks.

BR

Song

On 2024/6/25 21:20, Andrew Halaney wrote:
> Hey Song,
> 
> I'll answer that bit as those were my hacks :P
> 
> I _thought_ that doing a ftrace_dump_on_oops + softlockup_panic on the
> command line (or similar at runtime) would dump the logs, but that
> wasn't working for me and I was in a bit of a rush (and already hacking
> the kernel up anyways).
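> 
> (For reference, the non-hack route I had in mind was booting with
> "ftrace_dump_on_oops softlockup_panic=1", or the runtime equivalent:
> 
>      sysctl -w kernel.ftrace_dump_on_oops=1
>      sysctl -w kernel.softlockup_panic=1
> 
> which in theory makes the watchdog panic on a soft lockup and dump the
> ftrace buffers on the way down -- that's exactly the part that wasn't
> firing for me.)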
> 
> So I ended up doing a sequence like:
> 
>      tracing_off();
>      ftrace_dump(DUMP_ALL);
> 
> in the softlockup code when it was detected. Ideally I wanted to look at
> the vmcore and look at the ftrace data in there (since debugging printk
> by using printk is a little confusing), but there was a makedumpfile bug
> I hit... so I went with the hacky route to prove to myself what was
> going on. I think since then that's been resolved. Hope that helps!
> 
> Thanks,
> Andrew
> 
> On Tue, Jun 25, 2024 at 09:36:54AM GMT, Song Chen wrote:
>> Hi Derek,
>>
>> I am working on a hung task case, and I saw that your trace_printk logs
>> go to the same output as the softlockup warning. It's a smart way to
>> easily build a timeline for analysis.
>>
>> As far as I know, trace_printk goes to the ftrace ring buffer and softlockup
>> info goes to dmesg. Could you please let me know how you did that? I marked
>> the part I'm interested in below. Thanks a lot.
>>
>> BR
>>
>> Song
>>
>>
>> On 2024/6/18 23:20, Derek Barbosa wrote:
>>> lgoncalv@...hat.com, jwyatt@...hat.com, aubaker@...hat.com
>>> Bcc:
>>> Subject: watchdog BUG: soft lockup - CPU#x stuck for 78s
>>> Reply-To:
>>>
>>> Hi,
>>>
>>> The realtime team at Red Hat has recently backported the latest printk changes
>>> present in 6.6-rt stable (HEAD at 20fd4439f644 printk: nbcon: move locked_port flag to
>>> struct uart_port) to CentOS Stream 9 for performance improvements and
>>> printk-related bugfixes.
>>>
>>> Since merging said code, we've hit an interesting bug during testing:
>>> specifically, on larger systems, a softlockup may be reported by the watchdog
>>> when there is a heavy amount of printing to tty devices (whether through
>>> printk, /dev/kmsg, etc.).
>>>
>>> We have reasonable suspicion that nbcon_reacquire, or some other nbcon
>>> mechanism, *may* be causing such behavior.
>>>
>>> Since we've successfully reproduced this in the Kernel-ARK/Fedora-ELN
>>> (osbuild-rt) and linux-rt-devel 6.10.rc4-rt6, we are reporting this bug
>>> upstream.
>>>
>>> Anyway, here is a more in-depth description, along with some call traces.
>>>
>>> Description:
>>>
>>> On x86 systems with a large number of logical cores (nproc ~> 60), a
>>> softlockup can be observed, with an accompanying call trace, when a large
>>> amount of "printing" activity is taking place.
>>>
>>> As shown in the call traces appended below, during some kind of NUMA
>>> balancing/migration after a task_numa_fault -- where a set of processes is
>>> being migrated/swapped between two CPUs -- there is a busy thread that is
>>> being waited on (on the order of seconds), causing a livelock. Additional
>>> investigation of collected vmcores, by toggling panic on softlockup, shows
>>> that the waiting thread may be waiting for a thread looping in
>>> nbcon_reacquire.
>>>
>>> I suspect that some logic within nbcon_context_try_acquire may be a good place
>>> to start. My understanding of the code becomes a bit fuzzy here, so apologies
>>> in advance for any erroneous statements. As I see it, there may be something
>>> happening during migration (or under heavy load) in which nbcon_reacquire() is
>>> in a non-migratable or non-preemptible state as it attempts to regain access
>>> to a lost/taken console. It could very well be a situation in which the
>>> console context was forcefully taken from the printing thread.
>>>
>>> Alternatively, Andrew Halaney <ahalaney@...hat.com> suspects the loop within
>>> nbcon_kthread_func() -- since there is nothing that would yield the task in
>>> said loop (cant_migrate()), the migrate code would essentially wait forever
>>> for the aforementioned loop to finish. I believe on PREEMPT_RT there would
>>> be a preemption point here. Furthermore, in his investigation there were
>>> signs that the loop was iterating right up until the crash, leaving reason
>>> to believe that task is the culprit.
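>>>
>>> To make that suspicion concrete, the shape of the problem would be roughly
>>> the following (pseudocode with made-up helper names, not the actual nbcon
>>> code):
>>>
>>> ```
>>> /* printer kthread body, heavily simplified */
>>> while (records_pending())       /* hypothetical backlog check */
>>>         emit_one_record();      /* hypothetical: flush one record */
>>> /* No preemption point inside the loop, so while migration is
>>>  * disabled the migrate/swap path has to wait for the backlog to
>>>  * drain -- which under heavy printing may be effectively forever. */
>>> ```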
>>>
>>> In fact, with the following diff, we noticed this output:
>>>
>>> ```
>>> ahalaney@...en2nano ~/git/linux-rt-devel (git)-[tags/v6.10-rc4-rt6-rebase] % git diff | cat
>>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>>> index bb9689f94d30..d716b72bf2f8 100644
>>> --- a/kernel/printk/nbcon.c
>>> +++ b/kernel/printk/nbcon.c
>>> @@ -1075,6 +1075,7 @@ static int nbcon_kthread_func(void *__console)
>>>         *
>>>         * This pairs with rcuwait_has_sleeper:A and nbcon_kthread_wake:A
>>
> 
> 
