linux-kernel - Re: printk: selective deactivation of feature ignoring non panic cpu's messages

Message-ID: <Z78eGNIuG_-CVOGl@pathway.suse.cz>
Date: Wed, 26 Feb 2025 14:58:48 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Donghyeok Choe <d7271.choe@...sung.com>, linux-kernel@...r.kernel.org,
	takakura@...inux.co.jp, youngmin.nam@...sung.com,
	hajun.sung@...sung.com, seungh.jung@...sung.com,
	jh1012.choi@...sung.com
Subject: Re: printk: selective deactivation of feature ignoring non panic
 cpu's messages

On Wed 2025-02-26 05:31:53, John Ogness wrote:
> Hi Donghyeok,
> 
> On 2025-02-26, Donghyeok Choe <d7271.choe@...sung.com> wrote:
> > I would like to print out the message of non panic cpu as it is.
> > Can I use early_param to selectively disable that feature?
> 
> I have no issues about allowing this type of feature for debugging
> purposes.

Yes. It makes sense. Another scenario might be when
panic_other_cpus_shutdown() is not able to stop some CPUs.
It might be useful to see messages from the problematic ones.

> I do not know if early_param is the best approach. I expect
> Petr will offer good insight here.

early_param() looks good to me. There are already similar early
parameters, for example, "ignore_loglevel".


> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index fb242739aec8..3f420e8bdb2c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2368,6 +2368,17 @@ void printk_legacy_allow_panic_sync(void)
> >         }
> >  }
> >
> > +static bool __read_mostly keep_printk_all_cpu_in_panic;
> > +
> > +static int __init keep_printk_all_cpu_in_panic_setup(char *str)
> > +{
> > +       keep_printk_all_cpu_in_panic = true;
> > +       pr_info("printk: keep printk all cpu in panic.\n");
> > +
> > +       return 0;
> > +}
> > +early_param("keep_printk_all_cpu_in_panic", keep_printk_all_cpu_in_panic_setup);
> 
> Quite a long argument. I am horrible at naming. I expect Petr would have
> a good suggestion (if early_param is the way to go).

Heh. It seems to be hard to find a good name ;-)

Anyway, I would use "printk_" prefix to make it clear that
it is printk-related. The following comes to my mind:

  + printk_allow_non_panic_cpus
  + printk_keep_non_panic_cpus
  + printk_debug_non_panic_cpus

I prefer "printk_debug_non_panic_cpus", see below.


> >  asmlinkage int vprintk_emit(int facility, int level,
> >                             const struct dev_printk_info *dev_info,
> >                             const char *fmt, va_list args)
> > @@ -2379,13 +2390,15 @@ asmlinkage int vprintk_emit(int facility, int level,
> >         if (unlikely(suppress_printk))
> >                 return 0;
> >
> > -       /*
> > -        * The messages on the panic CPU are the most important. If
> > -        * non-panic CPUs are generating any messages, they will be
> > -        * silently dropped.
> > -        */
> > -       if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > -               return 0;
> > +       if (!keep_printk_all_cpu_in_panic) {
> > +               /*
> > +                * The messages on the panic CPU are the most important. If
> > +                * non-panic CPUs are generating any messages, they will be
> > +                * silently dropped.
> > +                */
> > +               if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > +                       return 0;
> > +       }
> 
> I would not nest it. Just something like:
> 
> 	/*
> 	 * The messages on the panic CPU are the most important. If
> 	 * non-panic CPUs are generating any messages, they may be
> 	 * silently dropped.
> 	 */
> 	if (!keep_printk_all_cpu_in_panic &&
> 	    !panic_triggering_all_cpu_backtrace &&
> 	    other_cpu_in_panic()) {
> 		return 0;
> 	}

I would prefer this form as well.
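
For illustration only, the flattened condition can be modelled in plain
userspace C. This is a sketch, not kernel code: struct panic_state and the
model_*() helpers are hypothetical stand-ins for the real kernel state and
for other_cpu_in_panic(), and the field name debug_non_panic_cpus assumes
the parameter name that I suggest above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state that vprintk_emit() consults. */
struct panic_state {
	bool debug_non_panic_cpus;         /* proposed early parameter (name not final) */
	bool triggering_all_cpu_backtrace; /* models panic_triggering_all_cpu_backtrace */
	int panic_cpu;                     /* CPU in panic, or -1 when there is none */
};

/* Models other_cpu_in_panic(): a panic is in progress on a different CPU. */
static bool model_other_cpu_in_panic(const struct panic_state *s, int this_cpu)
{
	assert(this_cpu >= 0);
	return s->panic_cpu >= 0 && s->panic_cpu != this_cpu;
}

/*
 * Returns true when a message from this_cpu would be silently dropped.
 * The cheap flag checks come first, as in the un-nested form.
 */
static bool model_should_drop(const struct panic_state *s, int this_cpu)
{
	return !s->debug_non_panic_cpus &&
	       !s->triggering_all_cpu_backtrace &&
	       model_other_cpu_in_panic(s, this_cpu);
}
```

With the option clear, a message from CPU 0 is dropped while CPU 1 is in
panic. Setting either flag lets the message through.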

Thinking out loud:

I wonder if this is actually safe. I recall that we simplified the
design in some places because we expected that non-panic CPUs would not
add messages. I am not sure that I have found all such locations. But
we might want to revisit:


1st problem: _prb_read_valid() skips non-finalized records on non-panic CPUs.

   opinion: We should not do it in this case.


2nd problem: Is _prb_read_valid() actually safe when
	panic_triggering_all_cpu_backtrace is true?

   opinion: It should be safe because the backtraces from different CPUs
	are serialized via printk_cpu_sync_get_irqsave().


3rd problem: nbcon_get_default_prio() returns NBCON_PRIO_NORMAL on
	non-panic CPUs. As a result, printk_get_console_flush_type()
	would suggest flushing as if the system were working normally.

	But the legacy loop will bail out after flushing one
	message on one console, see console_flush_all(). That is weird
	behavior.

	Another question is who would flush the messages when the panic()
	CPU does not reach the explicit flush.

   opinion: We should probably try to flush the messages on non-panic
	CPUs in this mode when safe. This is why I prefer the name
	"printk_debug_non_panic_cpus".

	We should update console_flush_all() so that it does not bail out
	when the new option is set.

	We should call nbcon_atomic_flush_pending() on non-panic CPUs
	when the new option is set. printk_get_console_flush_type()
	should behave like with NBCON_PRIO_EMERGENCY.

	Maybe nbcon_get_default_prio() should actually return
	NBCON_PRIO_EMERGENCY on non-panic CPUs when this option is set.
	It would allow the non-panic CPUs to take over the nbcon context
	from the potentially frozen kthread.
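
A rough userspace model of that last idea, just to make the intended
priority selection concrete. It is a sketch under assumptions: enum
model_prio and model_default_prio() are hypothetical names that only
mirror the NBCON_PRIO_* ordering, and the real nbcon_get_default_prio()
takes no arguments and reads per-CPU state instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the NBCON_PRIO_* levels, lowest to highest. */
enum model_prio {
	MODEL_PRIO_NORMAL,
	MODEL_PRIO_EMERGENCY,
	MODEL_PRIO_PANIC,
};

/*
 * Models the default priority for the current CPU. The last check is the
 * proposed change: with the debug option set, a non-panic CPU gets
 * emergency priority while another CPU is in panic.
 */
static enum model_prio model_default_prio(bool this_cpu_in_panic,
					  bool other_cpu_in_panic,
					  unsigned int emergency_nesting,
					  bool debug_non_panic_cpus)
{
	/* Only one CPU can be the panic CPU. */
	assert(!(this_cpu_in_panic && other_cpu_in_panic));

	if (this_cpu_in_panic)
		return MODEL_PRIO_PANIC;

	if (emergency_nesting)
		return MODEL_PRIO_EMERGENCY;

	if (debug_non_panic_cpus && other_cpu_in_panic)
		return MODEL_PRIO_EMERGENCY;

	return MODEL_PRIO_NORMAL;
}
```

Without the option, a non-panic CPU stays at normal priority and cannot
take over the context from a frozen kthread. With it, the CPU is raised
to emergency priority, but never to panic priority.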


Best Regards,
Petr