Message-ID: <Z78eGNIuG_-CVOGl@pathway.suse.cz>
Date: Wed, 26 Feb 2025 14:58:48 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Donghyeok Choe <d7271.choe@...sung.com>, linux-kernel@...r.kernel.org,
takakura@...inux.co.jp, youngmin.nam@...sung.com,
hajun.sung@...sung.com, seungh.jung@...sung.com,
jh1012.choi@...sung.com
Subject: Re: printk: selective deactivation of feature ignoring non panic
cpu's messages
On Wed 2025-02-26 05:31:53, John Ogness wrote:
> Hi Donghyeok,
>
> On 2025-02-26, Donghyeok Choe <d7271.choe@...sung.com> wrote:
> > I would like to print out the message of non panic cpu as it is.
> > Can I use early_param to selectively disable that feature?
>
> I have no issues about allowing this type of feature for debugging
> purposes.

Yes, it makes sense. Another scenario might be when
panic_other_cpus_shutdown() is not able to stop some CPUs.
It might be useful to see messages from the problematic ones.

> I do not know if early_param is the best approach. I expect
> Petr will offer good insight here.

early_param() looks good to me. There are already similar early
parameters, for example, "ignore_loglevel".

> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index fb242739aec8..3f420e8bdb2c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2368,6 +2368,17 @@ void printk_legacy_allow_panic_sync(void)
> >  	}
> >  }
> >
> > +static bool __read_mostly keep_printk_all_cpu_in_panic;
> > +
> > +static int __init keep_printk_all_cpu_in_panic_setup(char *str)
> > +{
> > +	keep_printk_all_cpu_in_panic = true;
> > +	pr_info("printk: keep printk all cpu in panic.\n");
> > +
> > +	return 0;
> > +}
> > +early_param("keep_printk_all_cpu_in_panic", keep_printk_all_cpu_in_panic_setup);
>
> Quite a long argument. I am horrible at naming. I expect Petr would have
> a good suggestion (if early_param is the way to go).

Heh. It seems to be hard to find a good name ;-)

Anyway, I would use the "printk_" prefix to make it clear that
it is printk-related. The following come to mind:

  + printk_allow_non_panic_cpus
  + printk_keep_non_panic_cpus
  + printk_debug_non_panic_cpus

I prefer "printk_debug_non_panic_cpus", see below.

> >  asmlinkage int vprintk_emit(int facility, int level,
> >  			    const struct dev_printk_info *dev_info,
> >  			    const char *fmt, va_list args)
> > @@ -2379,13 +2390,15 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	if (unlikely(suppress_printk))
> >  		return 0;
> >
> > -	/*
> > -	 * The messages on the panic CPU are the most important. If
> > -	 * non-panic CPUs are generating any messages, they will be
> > -	 * silently dropped.
> > -	 */
> > -	if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > -		return 0;
> > +	if (!keep_printk_all_cpu_in_panic) {
> > +		/*
> > +		 * The messages on the panic CPU are the most important. If
> > +		 * non-panic CPUs are generating any messages, they will be
> > +		 * silently dropped.
> > +		 */
> > +		if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > +			return 0;
> > +	}
>
> I would not nest it. Just something like:
>
> 	/*
> 	 * The messages on the panic CPU are the most important. If
> 	 * non-panic CPUs are generating any messages, they may be
> 	 * silently dropped.
> 	 */
> 	if (!keep_printk_all_cpu_in_panic &&
> 	    !panic_triggering_all_cpu_backtrace &&
> 	    other_cpu_in_panic()) {
> 		return 0;
> 	}
I would prefer this form as well.
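
For illustration only, the nested check from the patch and the flattened
check quoted above can be compared in a small userspace model. The three
booleans stand in for keep_printk_all_cpu_in_panic,
panic_triggering_all_cpu_backtrace and the result of other_cpu_in_panic().
The helper names drop_nested(), drop_flat() and count_mismatches() are made
up for this sketch and are not kernel functions.

```c
#include <stdbool.h>

/*
 * Userspace model, not kernel code. Each helper returns true when a
 * message from the current CPU would be dropped.
 */

/* Nested form, as in the original patch. */
static bool drop_nested(bool keep_all, bool backtrace, bool other_in_panic)
{
	if (!keep_all) {
		if (other_in_panic && !backtrace)
			return true;
	}
	return false;
}

/* Flattened form, as suggested in the review. */
static bool drop_flat(bool keep_all, bool backtrace, bool other_in_panic)
{
	return !keep_all && !backtrace && other_in_panic;
}

/* Count the input combinations on which the two forms disagree. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int i = 0; i < 8; i++) {
		bool keep_all = (i & 1) != 0;
		bool backtrace = (i & 2) != 0;
		bool other_in_panic = (i & 4) != 0;

		if (drop_nested(keep_all, backtrace, other_in_panic) !=
		    drop_flat(keep_all, backtrace, other_in_panic))
			mismatches++;
	}
	return mismatches;
}
```

Both forms drop a message only when the new option is off, no all-CPU
backtrace is in progress, and another CPU is in panic. The flat form also
tests the two plain flags before calling other_cpu_in_panic(), so the call
is skipped when either flag is set.
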

Thinking out loud:

I wonder if this is actually safe. I recall that we simplified the
design somewhere because we expected that non-panic CPUs would not
add messages. I am not sure that I have found all the locations, but
we might want to revisit:

1st problem: _prb_read_valid() skips non-finalized records on non-panic CPUs.

opinion: We should not do it in this case.

2nd problem: Is _prb_read_valid() actually safe when
    panic_triggering_all_cpu_backtrace is true?

opinion: It should be safe because the backtraces from different CPUs
    are serialized via printk_cpu_sync_get_irqsave().

3rd problem: nbcon_get_default_prio() returns NBCON_PRIO_NORMAL on
    non-panic CPUs. As a result, printk_get_console_flush_type()
    would suggest flushing as if the system were working normally.

    But the legacy loop will bail out after flushing one
    message on one console, see console_flush_all(). That is weird
    behavior.

    Another question is who would flush the messages when the panic()
    CPU does not reach the explicit flush.

opinion: We should probably try to flush the messages on non-panic
    CPUs in this mode when it is safe. This is why I prefer the name
    "printk_debug_non_panic_cpus".

    We should update console_flush_all() so that it does not bail out
    when the new option is set.

    We should call nbcon_atomic_flush_pending() on non-panic CPUs
    when the new option is set. printk_get_console_flush_type()
    should behave as it does with NBCON_PRIO_EMERGENCY.

    Maybe nbcon_get_default_prio() should actually return
    NBCON_PRIO_EMERGENCY on non-panic CPUs when this option is set.
    It would allow the non-panic CPUs to take over the nbcon context
    from the potentially frozen kthread.

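
The priority idea above could look roughly like the following userspace
model. It is only a sketch under stated assumptions: the enum mirrors the
ordering of the kernel's enum nbcon_prio, but struct cpu_state,
model_default_prio() and the debug_non_panic_cpus argument are hypothetical
stand-ins. The real nbcon_get_default_prio() takes no arguments and reads
per-CPU state itself.

```c
#include <stdbool.h>

/* Same ordering as the kernel's enum nbcon_prio (model only). */
enum nbcon_prio {
	NBCON_PRIO_NONE = 0,
	NBCON_PRIO_NORMAL,
	NBCON_PRIO_EMERGENCY,
	NBCON_PRIO_PANIC,
};

/* Hypothetical snapshot of the state the real function would look up. */
struct cpu_state {
	bool this_cpu_in_panic;		/* this CPU is the panic CPU */
	bool other_cpu_in_panic;	/* some other CPU is the panic CPU */
	unsigned int emergency_nesting;	/* nonzero inside an emergency section */
};

/*
 * Model of the proposed behavior: with the debug option set, a non-panic
 * CPU is raised to EMERGENCY priority while another CPU is in panic, so
 * that it may take over the console context and flush its own messages.
 */
static enum nbcon_prio model_default_prio(const struct cpu_state *s,
					  bool debug_non_panic_cpus)
{
	if (s->this_cpu_in_panic)
		return NBCON_PRIO_PANIC;

	if (s->emergency_nesting)
		return NBCON_PRIO_EMERGENCY;

	/* Assumed new branch; it is not part of any posted patch. */
	if (debug_non_panic_cpus && s->other_cpu_in_panic)
		return NBCON_PRIO_EMERGENCY;

	return NBCON_PRIO_NORMAL;
}
```

In this model the panic CPU still gets the highest priority, so a non-panic
CPU running at EMERGENCY priority could not take the console away from it.
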
Best Regards,
Petr