Message-ID: <aCHABVT4g75G3l1k@U-2FWC9VHC-2323.local>
Date: Mon, 12 May 2025 17:31:49 +0800
From: Feng Tang <feng.tang@...ux.alibaba.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Petr Mladek <pmladek@...e.com>,
	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	mhiramat@...nel.org, llong@...hat.com
Subject: Re: [PATCH v1 0/3] generalize panic_print's dump function to be used
 by other kernel parts

On Mon, May 12, 2025 at 04:23:30PM +0800, Lance Yang wrote:
> 
> 
> On 2025/5/12 11:14, Feng Tang wrote:
> > Hi Andrew,
> > 
> > Thanks for the review!
> > 
> > On Sun, May 11, 2025 at 06:46:17PM -0700, Andrew Morton wrote:
> > > On Sun, 11 May 2025 16:52:51 +0800 Feng Tang <feng.tang@...ux.alibaba.com> wrote:
> > > 
> > > > When working on kernel stability issues, panics, hung tasks, and
> > > > software/hardware lockups are frequently encountered. To debug them, users
> > > > may need lots of system information from that moment, such as task call
> > > > stacks, lock info, memory info, etc.
> > > > 
> > > > The panic case already has panic_print_sys_info() for this purpose, with
> > > > a 'panic_print' bitmask to control which kinds of information are printed;
> > > > this is also helpful for debugging hung-task and lockup cases.
> > > > 
> > > > So this patchset extracts the function and makes it usable in other
> > > > cases that also need system info for debugging.
> > > > 
> > > > Locally these have been used in our bug chasing for stability issues
> > > > and were helpful.
> > > 
> > > Truth.  Our responses to panics, oopses, WARNs, BUGs, OOMs etc seem
> > > quite poorly organized.  Some effort to clean up (and document!) all of
> > > this sounds good.
> > > 
> > > My vote is to permit the display of every scrap of information we can
> > > think of in all situations.  And then to permit users to select which of
> > > that information is to be displayed under each situation.
> 
> Completely agreed. The tricky part is making a global knob that works for
> all situations without breaking userspace, but it's a better system-wide
> approach ;)
> 
> > 
> > Good point! Maybe one future todo is to add a global system info dump
> > function with ONE global knob for selecting different kinds of information,
> > which could be embedded into some of the cases you mentioned above.
> 
> IMHO, for features with their own knobs, we need:
> a) The global knob (if enabled) turns on all related feature-level knobs,
> b) while still allowing users to manually override individual knobs.
> 
> Something like:
> 
> If SYS_PRINT_ALL_CPU_BT (global knob) is on, it automatically enables
> hung_task_all_cpu_backtrace for the hung-task situation. But users can
> still disable it via hung_task_all_cpu_backtrace.
> 
> Anyway, the global knob (when set) controls all feature-level knobs, but
> they can override it if explicitly set ;)

Yes, it makes sense for parts which already have their own user-space
control knobs.
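
To make the override semantics above concrete, here is a minimal userspace
sketch. It assumes the feature-level knob can tell "never set by the user"
apart from "explicitly off"; the enum and the helper name are hypothetical
and not part of this patchset.

```c
#include <stdbool.h>

/* Hypothetical tri-state for a feature-level knob; not from the patchset. */
enum knob_state {
	KNOB_UNSET = -1,	/* user never wrote the knob: follow the global one */
	KNOB_OFF   = 0,		/* user explicitly disabled the feature */
	KNOB_ON    = 1,		/* user explicitly enabled the feature */
};

/*
 * Resolve the effective setting: an explicitly set feature knob wins,
 * otherwise the global knob decides.
 */
static bool knob_effective(bool global_on, enum knob_state feature)
{
	if (feature != KNOB_UNSET)
		return feature == KNOB_ON;
	return global_on;
}
```

With a plain 0/1 sysctl there is no "unset" state, so a real implementation
would need some other way to record that the user touched the knob.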

What I proposed is a todo mostly for parts other than the panic/hung-task
ones in this patchset, as those two require some special handling; for
example, panic needs to handle printk replay for the kexec case.
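
For those other parts, a shared helper driven by a single bitmask could look
roughly like the userspace sketch below. The SYS_INFO_* bits, the table, and
sys_info_dump() are hypothetical names used only for illustration; they are
not the interface from this patchset, and the strings stand in for the real
dump routines.

```c
#include <stdio.h>

/* Hypothetical selector bits, loosely modelled on the panic_print idea. */
#define SYS_INFO_TASKS	(1UL << 0)
#define SYS_INFO_MEM	(1UL << 1)
#define SYS_INFO_LOCKS	(1UL << 2)

struct sys_info_entry {
	unsigned long bit;
	const char *name;	/* stands in for a real dump callback */
};

static const struct sys_info_entry sys_info_table[] = {
	{ SYS_INFO_TASKS, "task call stacks" },
	{ SYS_INFO_MEM,   "memory info" },
	{ SYS_INFO_LOCKS, "lock info" },
};

/* Print every section selected in @mask; return how many were printed. */
static int sys_info_dump(unsigned long mask)
{
	int printed = 0;
	unsigned int i;

	for (i = 0; i < sizeof(sys_info_table) / sizeof(sys_info_table[0]); i++) {
		if (mask & sys_info_table[i].bit) {
			printf("dumping %s\n", sys_info_table[i].name);
			printed++;
		}
	}
	return printed;
}
```

A caller such as a lockup detector would then pass its own mask, e.g.
sys_info_dump(SYS_INFO_TASKS | SYS_INFO_LOCKS).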

Thanks,
Feng
