Message-ID: <1a5bc420-c716-4d0b-b767-32adf32f4958@linux.dev>
Date: Mon, 12 May 2025 16:23:30 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Feng Tang <feng.tang@...ux.alibaba.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Petr Mladek <pmladek@...e.com>, Steven Rostedt <rostedt@...dmis.org>,
linux-kernel@...r.kernel.org, mhiramat@...nel.org, llong@...hat.com
Subject: Re: [PATCH v1 0/3] generalize panic_print's dump function to be used
by other kernel parts
On 2025/5/12 11:14, Feng Tang wrote:
> Hi Andrew,
>
> Thanks for the review!
>
> On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
>> On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@...ux.alibaba.com> wrote:
>>
>>> When working on kernel stability issues, panics, hung tasks and
>>> software/hardware lockups are frequently encountered. To debug them,
>>> users may need a lot of system information from that moment, such as
>>> task call stacks, lock info, memory info, etc.
>>>
>>> The panic case already has panic_print_sys_info() for this purpose, and
>>> a 'panic_print' bitmask to control which kinds of information are dumped.
>>> The same information is also helpful for debugging hung-task and lockup cases.
>>>
>>> So this patchset extracts the function and makes it usable for other
>>> cases that also need system info for debugging.
>>>
>>> Locally, these patches have been used in our bug chasing for stability
>>> issues and were helpful.
>>
>> Truth. Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
>> quite poorly organized. Some effort to clean up (and document!) all of
>> this sounds good.
>>
>> My vote is to permit the display of every scrap of information we can
>> think of in all situations. And then to permit users to select which of
>> that information is to be displayed under each situation.
Completely agreed. The tricky part is making a global knob that works for
all situations without breaking userspace, but it's a better system-wide
approach ;)
>
> Good point! Maybe one future todo is to add a global system info dump
> function with ONE global knob for selecting different kinds of information,
> which could be embedded into some of the cases you mentioned above.
IMHO, for features with their own knobs, we need:

a) The global knob (if enabled) turns on all related feature-level knobs,
b) while still allowing users to manually override individual knobs.

Something like:

If SYS_PRINT_ALL_CPU_BT (global knob) is on, it automatically enables
hung_task_all_cpu_backtrace for the hung-task situation. But users can
still disable it via hung_task_all_cpu_backtrace.

Anyway, the global knob (when set) controls all feature-level knobs, but
an explicitly set feature-level knob overrides it ;)
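
A rough sketch of that precedence in C, just to illustrate the idea. It is
not taken from the patchset. It assumes the feature-level knob can tell
"never written by the user" apart from "explicitly set to 0", which the
existing 0/1 sysctls cannot do today. The enum knob_state type, the
knob_effective() helper and the bit value chosen for SYS_PRINT_ALL_CPU_BT
are all hypothetical names made up for this example:

```c
#include <stdbool.h>

/* Hypothetical tri-state for a feature-level knob (not an existing kernel type). */
enum knob_state {
	KNOB_UNSET = -1,	/* user never wrote the knob: follow the global knob */
	KNOB_OFF   = 0,		/* user explicitly disabled the feature */
	KNOB_ON    = 1,		/* user explicitly enabled the feature */
};

/* Hypothetical bit in the global bitmask; the value is an assumption. */
#define SYS_PRINT_ALL_CPU_BT	(1UL << 0)

/*
 * Resolve the effective setting for one feature:
 *  - an explicitly set feature-level knob always wins;
 *  - otherwise the feature inherits the matching bit of the global knob.
 */
static bool knob_effective(unsigned long global_mask,
			   unsigned long global_bit,
			   enum knob_state feature)
{
	if (feature != KNOB_UNSET)
		return feature == KNOB_ON;

	return (global_mask & global_bit) != 0;
}
```

With the global bit set and the feature knob untouched, the helper returns
true. With the global bit set and the feature knob explicitly written as 0,
it returns false, which is the "users can still disable it" case.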
Thanks,
Lance
>
>> As for this patchset - sounds good to me. For now I'll await input
>> from reviewers.
>
> Thank you!
>
> - Feng