Message-ID: <ZRQq2ZqMN34qLs44@alley>
Date: Wed, 27 Sep 2023 15:15:05 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, Kees Cook <keescook@...omium.org>,
Luis Chamberlain <mcgrof@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Arnd Bergmann <arnd@...db.de>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
Subject: Re: [PATCH printk v2 09/11] panic: Add atomic write enforcement to
oops
On Wed 2023-09-20 01:14:54, John Ogness wrote:
> Invoke the atomic write enforcement functions for oops to
> ensure that the information gets out to the consoles.
>
> Since there is no single general function that calls both
> oops_enter() and oops_exit(), the nesting feature of atomic
> write sections is taken advantage of in order to guarantee
> full coverage between the first oops_enter() and the last
> oops_exit().
>
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.
>
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -630,6 +634,36 @@ bool oops_may_print(void)
>   */
>  void oops_enter(void)
>  {
> +	enum nbcon_prio prev_prio;
> +	int cpu = -1;
> +
> +	/*
> +	 * If this turns out to be the first CPU in oops, this is the
> +	 * beginning of the outermost atomic section. Otherwise it is
> +	 * the beginning of an inner atomic section.
> +	 */
This sounds strange. What is the advantage of having the inner
atomic context, please? It covers only messages printed inside
oops_enter() and not the whole oops_enter()/exit(). Also see below.
> +	prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
> +	if (atomic_try_cmpxchg_relaxed(&oops_cpu, &cpu, smp_processor_id())) {
> +		/*
> +		 * This is the first CPU in oops. Save the outermost
> +		 * @prev_prio in order to restore it on the outermost
> +		 * matching oops_exit(), when @oops_nesting == 0.
> +		 */
> +		oops_prev_prio = prev_prio;
> +
> +		/*
> +		 * Enter an inner atomic section that ends at the end of this
> +		 * function. In this case, the nbcon_atomic_enter() above
> +		 * began the outermost atomic section.
> +		 */
> +		prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +	}
> +
> +	/* Track nesting when this CPU is the owner. */
> +	if (cpu == -1 || cpu == smp_processor_id())
> +		oops_nesting++;
> +
>  	tracing_off();
>  	/* can't trust the integrity of the kernel anymore: */
>  	debug_locks_off();
> @@ -637,6 +671,9 @@ void oops_enter(void)
>
>  	if (sysctl_oops_all_cpu_backtrace)
>  		trigger_all_cpu_backtrace();
> +
> +	/* Exit inner atomic section. */
> +	nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);
This will not flush the messages when:

  + This CPU owns oops_cpu. The flush will have to wait until the
    outermost atomic section is exited.

    In this case, the inner atomic context is not needed.

  + oops_cpu is owned by another CPU, and that CPU is currently
    flushing the messages and blocking the per-console lock.

    The good thing is that the messages printed by this oops_enter()
    would likely get flushed by the other CPU.

    The bad thing is that oops_exit() on this CPU won't call
    nbcon_atomic_exit(), so the following OOPS messages from this
    CPU might need to wait for the printk kthread.

IMHO, this is not what we want.
One solution would be to store prev_prio in a per-CPU array
so that each CPU could call its own nbcon_atomic_exit().
But I am starting to like more and more the idea of storing
and counting nested emergency contexts in struct task_struct.
It is the alternative implementation in reply to the 7th patch,
https://lore.kernel.org/r/ZRLBxsXPCym2NC5Q@alley
Then it will be enough to simply call:
  + nbcon_emergency_enter() in oops_enter()
  + nbcon_emergency_exit() in oops_exit()
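
To illustrate the counting idea, here is a minimal user-space sketch. It is
not kernel code and it is not the implementation from the linked reply: the
struct, its fields, and the helper names are all hypothetical, and the
"flush on the outermost exit" behavior is an assumption made only to show
why a per-task nesting counter removes the need to pass prev_prio around.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
enum prio { PRIO_NORMAL, PRIO_EMERGENCY };

struct task {
	unsigned int emergency_nesting;	/* depth of nested emergency sections */
	unsigned int flushes;		/* counts where a console flush would happen */
};

/* The effective priority follows directly from the nesting depth. */
static enum prio task_prio(const struct task *t)
{
	return t->emergency_nesting ? PRIO_EMERGENCY : PRIO_NORMAL;
}

/* Enter a (possibly nested) emergency section; no prev_prio to save. */
static void emergency_enter(struct task *t)
{
	t->emergency_nesting++;
}

/* Leave an emergency section; only the outermost exit triggers a flush. */
static void emergency_exit(struct task *t)
{
	assert(t->emergency_nesting > 0);
	if (--t->emergency_nesting == 0)
		t->flushes++;	/* assumption: flush on the outermost exit */
}
```

With such a counter, every context that enters also exits on its own, so
no CPU depends on another CPU's saved state to leave the emergency
priority.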
Best Regards,
Petr
PS: I just hope that you didn't add all this complexity just because
we preferred this behavior at LPC 2022. Especially I hope
that it was not me who proposed and preferred this.