linux-kernel - Re: a question about how to debug this case in ftrace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e6b0ba9e-e8f6-4c42-9f63-e3b67d7edbf0@189.cn>
Date: Wed, 26 Jun 2024 10:12:25 +0800
From: Song Chen <chensong_2000@....cn>
To: Andrew Halaney <ahalaney@...hat.com>
Cc: Derek Barbosa <debarbos@...hat.com>, pmladek@...e.com,
 rostedt@...dmis.org, john.ogness@...utronix.de, senozhatsky@...omium.org,
 linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
 williams@...hat.com, jlelli@...hat.com
Subject: Re: a question about how to debug this case in ftrace

ohh, by the way, i'm curious why softlockup/hardlockup/hungtask don't 
have ftrace dump as well as rcu stall, or is it possible to add a 
parameter in cmdline to redirect softlockup/hardlockup/hungtask/rcustall 
warning either to ftrace or printk.

Just a thought.

BR

Song

在 2024/6/26 10:06, Song Chen 写道:
> Hi Andrew,
> 
> If I understood it correctly, it's similar with rcu 
> stall(rcu_cpu_stall_ftrace_dump).
> 
> So far I don't need too much detail in vmcore, so only merging dmesg and 
> ftrace is more practical to my case. I will give it a try, many thanks.
> 
> BR
> 
> Song
> 
> 在 2024/6/25 21:20, Andrew Halaney 写道:
>> Hey Song,
>>
>> I'll answer that bit as those were my hacks :P
>>
>> I _thought_ that doing a ftrace_dump_on_oops + softlockup_panic on the
>> command line (or similar at runtime) would dump the logs, but that
>> wasn't working for me and I was in a bit of a rush (and already hacking
>> the kernel up anyways).
>>
>> So I ended up doing a sequence like:
>>
>>      tracing_off();
>>      ftrace_dump(DUMP_ALL);
>>
>> in the softlockup code when it was detected. Ideally I wanted to look at
>> the vmcore and look at the ftrace data in there (since debugging printk
>> by using printk is a little confusing), but there was a makedumpfile bug
>> I hit... so I went with the hacky route to prove to myself what was
>> going on. I think since then that's been resolved. Hope that helps!
>>
>> Thanks,
>> Andrew
>>
>> On Tue, Jun 25, 2024 at 09:36:54AM GMT, Song Chen wrote:
>>> Hi Derek,
>>>
>>> I am working on a hungtask case, i saw your trace_printk logs go to 
>>> the same
>>> output with softlockup warning, It's a smart way to easily build a 
>>> timeline
>>> for analysis.
>>>
>>> As far as i know, trace_printk goes to ftrace ring buffer and softlockup
>>> info goes to dmesg. Could you please let me know how you did that, i 
>>> marked
>>> the part i'm interested in below, thanks a lot.
>>>
>>> BR
>>>
>>> Song
>>>
>>>
>>> 在 2024/6/18 23:20, Derek Barbosa 写道:
>>>> lgoncalv@...hat.com, jwyatt@...hat.com, aubaker@...hat.com
>>>> Bcc:
>>>> Subject: watchdog BUG: soft lockup - CPU#x stuck for 78s
>>>> Reply-To:
>>>>
>>>> Hi,
>>>>
>>>> The realtime team at Red Hat has recently backported the latest 
>>>> printk changes
>>>> present in 6.6-rt stable (HEAD at 20fd4439f644 printk: nbcon: move 
>>>> locked_port flag to
>>>> struct uart_port) to CentOS Stream 9 for performance improvements and
>>>> printk-related bugfixes.
>>>>
>>>> Since merging this said code, we've hit an interesting bug during 
>>>> testing,
>>>> specifically, on larger systems, a softlockup may be reported by the 
>>>> watchdog
>>>> when there is a heavy amount of printing to tty devices (whether it 
>>>> be through
>>>> printk, /dev/kmsg, etc).
>>>>
>>>> We have a modicum of reasonable suspicion to believe that 
>>>> nbcon_reacquire, or
>>>> some other nbcon mechanism *may* be causing such behavior.
>>>>
>>>> Since we've succesfully reproduced this in the Kernel-ARK/Fedora-ELN
>>>> (osbuild-rt), and linux-rt-devel 6.10.rc4-rt6, we are reporting this 
>>>> bug
>>>> upstream.
>>>>
>>>> Anyway, here is a more in-depth description, along with some call 
>>>> traces.
>>>>
>>>> Description:
>>>>
>>>> On x86 systems with a large amount of logical cores (nproc ~> 60), a 
>>>> softlockup can
>>>> be observed with accompanying call trace when a large amount of 
>>>> "printing"
>>>> activity is taking place.
>>>>
>>>> As shown in the call traces appended below, during some kind of numa
>>>> balancing/numa_migration after a task_numa_fault --where a set of 
>>>> processess are being migrated/swapped
>>>> between two CPUs-- there is a busy thread that is being waited on 
>>>> (in the order
>>>> of seconds), causing a livelock. Additional investigation of 
>>>> collected vmcores
>>>> by toggling panic on softlockup shows that the waiting thread may be 
>>>> waiting for
>>>> a thread looping with nbcon_reacquire.
>>>>
>>>> I suspect that some logic within nbcon_context_try_acquire may be a 
>>>> good place
>>>> to start. My understanding of the code becomes a bit fuzzy here, so 
>>>> apologies
>>>> in advance for any erroneous statements. As I see it, there may be 
>>>> something
>>>> happening during migration (or under heavy load) in which 
>>>> nbcon_reacquire() is in
>>>> a non-migratable or non-preemtible state as it is attempting to 
>>>> regain access to
>>>> a lost/taken console. It could very well be a situation in which 
>>>> context
>>>> was forcefully taken from the printing thread.
>>>>
>>>> Alternatively, Andrew Halaney <ahalaney@...hat.com> suspects that it 
>>>> is the loop within
>>>> nbcon_kthread_func() -- since there is nothing that would yield the 
>>>> task in said
>>>> loop (cant_migrate()), the migrate code would be essentially waiting 
>>>> forever for
>>>> the aforementioned loop to "finish". I believe in PREEMPT_RT, there 
>>>> would be a
>>>> preemption point here. Furthermore, in his investigation, there were 
>>>> signs that the
>>>> loop was just iterating up until the crash, leaving reason to
>>>> believe that task would be the culprit.
>>>>
>>>> In fact, with the following diff, we noticed this output:
>>>>
>>>> ```
>>>> ahalaney@...en2nano ~/git/linux-rt-devel 
>>>> (git)-[tags/v6.10-rc4-rt6-rebase] % git diff | cat
>>>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>>>> index bb9689f94d30..d716b72bf2f8 100644
>>>> --- a/kernel/printk/nbcon.c
>>>> +++ b/kernel/printk/nbcon.c
>>>> @@ -1075,6 +1075,7 @@ static int nbcon_kthread_func(void *__console)
>>>>         *
>>>>         * This pairs with rcuwait_has_sleeper:A and 
>>>> nbcon_kthread_wake:A
>>>
>>
>>
> 
>