Message-ID: <d18ff73a0ef7536f654b63854dc891984319093f.camel@surriel.com>
Date: Thu, 18 Jul 2024 09:38:18 -0400
From: Rik van Riel <riel@...riel.com>
To: John Ogness <john.ogness@...utronix.de>, Andrew Morton <akpm@...ux-foundation.org>
Cc: Omar Sandoval <osandov@...a.com>, linux-kernel@...r.kernel.org, Petr Mladek <pmladek@...e.com>, Steven Rostedt <rostedt@...dmis.org>, Sergey Senozhatsky <senozhatsky@...omium.org>, kernel-team <kernel-team@...a.com>
Subject: Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack_lvl

On Thu, 2024-07-18 at 09:31 +0206, John Ogness wrote:
> On 2024-07-17, Rik van Riel <riel@...riel.com> wrote:
> > I think that would do the trick. The nmi_backtrace() printk is already
> > deferred, because of the check for in_nmi() in vprintk(), and this
> > change would put all the other users of printk_cpu_sync_get_irqsave()
> > on the exact same footing as nmi_backtrace().
> >
> > Combing through the code a little, it looks like that would remove
> > the potential for this deadlock to happen again.
>
> Let's see what Petr has to say. (He'll be back on Monday.) He might
> prefer a solution that does not result in deferring printing for all
> cases. i.e. allow the console_lock if it is available, but avoid the
> spinning if it is not. Below is a patch that would achieve this.
>
> John
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index dddb15f48d59..36f40db0bf93 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1060,6 +1060,8 @@ static int __init log_buf_len_setup(char *str)
> early_param("log_buf_len", log_buf_len_setup);
>
> #ifdef CONFIG_SMP
> +static bool vprintk_emit_may_spin(void);
> +
> #define __LOG_CPU_MAX_BUF_LEN (1 << CONFIG_LOG_CPU_MAX_BUF_SHIFT)
>
> static void __init log_buf_add_cpu(void)
> @@ -1090,6 +1092,7 @@ static void __init log_buf_add_cpu(void)
> }
> #else /* !CONFIG_SMP */
> static inline void log_buf_add_cpu(void) {}
> +static inline bool vprintk_emit_may_spin(void) { return true; }
> #endif /* CONFIG_SMP */
>
> static void __init set_percpu_data_ready(void)
> @@ -2330,6 +2333,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>
> /* If called from the scheduler, we can not call up(). */
> if (!in_sched) {
> + int ret;
> +
> /*
> * The caller may be holding system-critical or
> * timing-sensitive locks. Disable preemption during
> @@ -2344,7 +2349,11 @@ asmlinkage int vprintk_emit(int facility, int level,
> * spinning variant, this context tries to take over the
> * printing from another printing context.
> */
> - if (console_trylock_spinning())
> + if (vprintk_emit_may_spin())
> + ret = console_trylock_spinning();
> + else
> + ret = console_trylock();
> + if (ret)
> console_unlock();
> preempt_enable();
> }
> @@ -4321,6 +4330,15 @@ void console_replay_all(void)
> static atomic_t printk_cpu_sync_owner = ATOMIC_INIT(-1);
> static atomic_t printk_cpu_sync_nested = ATOMIC_INIT(0);
>
> +/*
> + * As documented in printk_cpu_sync_get_irqsave(), a context holding the
> + * printk_cpu_sync must not spin waiting for another CPU.
> + */
> +static bool vprintk_emit_may_spin(void)
> +{
> + return (atomic_read(&printk_cpu_sync_owner) != smp_processor_id());
> +}

I think the above would still deadlock, because the reported
deadlock is an ABBA deadlock between two different CPUs.

I think what the code would have to do is only trylock, and never
spin after taking the printk_cpu_sync_get_irqsave() lock.

Were you thinking of moving the this_cpu_read(printk_context)
check from vprintk() into vprintk_emit() and using that to decide
whether to spin for the lock, or to give up if the trylock fails?
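
To make the "only trylock, never spin" rule concrete, here is a minimal
user-space sketch, not kernel code. It assumes the reported scenario is the
one named in the subject line: one CPU holds printk_cpu_sync (as in
dump_stack_lvl()) and wants the console, while the console owner wants
printk_cpu_sync (as in nmi_backtrace()). All model_* names, struct model,
and the two-CPU limit are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS	2
#define MODEL_NO_CPU	(-1)

/* Hypothetical toy model: which lock, if any, a CPU is spinning on. */
enum model_lock { MODEL_LOCK_NONE, MODEL_LOCK_CPU_SYNC, MODEL_LOCK_CONSOLE };

struct model {
	int cpu_sync_owner;	/* stands in for printk_cpu_sync_owner */
	int console_owner;	/* stands in for the console owner */
	enum model_lock spinning_on[MODEL_NR_CPUS];
};

/* The rule under discussion: a printk_cpu_sync holder must not spin. */
static bool model_may_spin(const struct model *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return m->cpu_sync_owner != cpu;
}

/*
 * Model a printk() on @cpu that wants the console. With @honor_rule set,
 * a printk_cpu_sync holder gives up on contention instead of spinning.
 */
static bool model_console_acquire(struct model *m, int cpu, bool honor_rule)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (m->console_owner == MODEL_NO_CPU) {
		m->console_owner = cpu;
		return true;
	}
	if (!honor_rule || model_may_spin(m, cpu))
		m->spinning_on[cpu] = MODEL_LOCK_CONSOLE;
	return false;
}

/* Model taking printk_cpu_sync: spins whenever another CPU owns it. */
static bool model_cpu_sync_acquire(struct model *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (m->cpu_sync_owner == MODEL_NO_CPU || m->cpu_sync_owner == cpu) {
		m->cpu_sync_owner = cpu;
		return true;
	}
	m->spinning_on[cpu] = MODEL_LOCK_CPU_SYNC;
	return false;
}

static int model_lock_owner(const struct model *m, enum model_lock lock)
{
	switch (lock) {
	case MODEL_LOCK_CPU_SYNC:
		return m->cpu_sync_owner;
	case MODEL_LOCK_CONSOLE:
		return m->console_owner;
	default:
		return MODEL_NO_CPU;
	}
}

/* ABBA: each CPU spins on a lock that the other CPU holds. */
static bool model_abba_deadlock(const struct model *m)
{
	return model_lock_owner(m, m->spinning_on[0]) == 1 &&
	       model_lock_owner(m, m->spinning_on[1]) == 0;
}

/* Replay the assumed scenario and report whether it ends in ABBA. */
static bool model_replay(bool honor_rule)
{
	struct model m = {
		.cpu_sync_owner = MODEL_NO_CPU,
		.console_owner = MODEL_NO_CPU,
		.spinning_on = { MODEL_LOCK_NONE, MODEL_LOCK_NONE },
	};

	model_console_acquire(&m, 1, honor_rule); /* CPU 1 owns the console */
	model_cpu_sync_acquire(&m, 0);            /* CPU 0 takes printk_cpu_sync */
	model_console_acquire(&m, 0, honor_rule); /* CPU 0 prints while holding it */
	model_cpu_sync_acquire(&m, 1);            /* CPU 1 now wants printk_cpu_sync */
	return model_abba_deadlock(&m);
}
```

Under these assumptions, model_replay(false) reports the ABBA cycle and
model_replay(true) does not, because the printk_cpu_sync holder gives up
instead of spinning. The model says nothing about whether a particular
kernel patch implements that rule correctly.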
--
All Rights Reversed.