Message-ID: <20190717095615.GD3664@jagdpanzerIV>
Date: Wed, 17 Jul 2019 18:56:15 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
John Ogness <john.ogness@...utronix.de>,
Petr Tesarik <ptesarik@...e.cz>,
Konstantin Khlebnikov <koct9i@...il.com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] printk/panic: Access the main printk log in panic()
only when safe
On (07/16/19 09:28), Petr Mladek wrote:
> Kernel tries hard to store and show printk messages when panicking. Even
> logbuf_lock gets re-initialized when only one CPU is running after
> smp_send_stop().
>
> Unfortunately, smp_send_stop() might fail on architectures that do not
> use NMI as a fallback. Then printk log buffer might stay locked and
> a deadlock is almost inevitable.
I'd say that deadlock is still almost inevitable.
panic-CPU syncs with the printing-CPU before it attempts to SMP_STOP.
If there is an active printing-CPU, which is looping in console_unlock(),
taking logbuf_lock in order to msg_print_text() and stuff, then the panic-CPU
will spin on console_owner waiting for that printing-CPU to hand over
printing duties.
pr_emerg("Kernel panic - not syncing");
smp_send_stop();
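A minimal single-threaded sketch of that wait. It assumes a simplified model in which the panic-CPU sets a waiter flag and spins until the printing-CPU clears it. console_owned, console_waiter and the helper functions are hypothetical stand-ins, not the real kernel/printk/printk.c code. max_spins exists only so the model can report a hang instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the console handover; the flag
 * names echo kernel/printk/printk.c, but this is NOT kernel code. */
static bool console_owned;   /* some CPU is printing in console_unlock() */
static bool console_waiter;  /* another CPU is spinning for the console */

/* One step of a healthy printing-CPU: after emitting a record it notices
 * the waiter and hands the console over by clearing the waiter flag. */
static void healthy_printing_cpu_step(void)
{
    if (console_owned && console_waiter)
        console_waiter = false;
}

/* A printing-CPU that went nuts (stuck under logbuf_lock, broken IDT,
 * ...): it never gets back to the handover check. */
static void stuck_printing_cpu_step(void)
{
    /* no progress, no handover */
}

/* Panic-CPU: register as the waiter, then spin until the owner hands over.
 * other_cpu_step() stands in for the printing-CPU running in parallel.
 * max_spins only lets the model return; the modeled wait has no timeout,
 * so "false" means panic() never gets past this point. */
static bool panic_cpu_wait_for_handover(void (*other_cpu_step)(void),
                                        long max_spins)
{
    assert(max_spins > 0);
    console_waiter = true;
    for (long i = 0; i < max_spins; i++) {
        other_cpu_step();
        if (!console_waiter)
            return true;   /* console handed over, panic() continues */
    }
    return false;          /* still waiting: deadlock */
}
```

With healthy_printing_cpu_step() the wait returns true after the first step. With stuck_printing_cpu_step() it returns false, which in the real system would be a panic-CPU spinning forever before smp_send_stop() is ever reached.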
If the printing-CPU goes nuts under logbuf_lock, has a corrupted IDT, or
anything else, then we will not progress with panic(): the panic-CPU will
deadlock. If not on
pr_emerg("Kernel panic - not syncing")
then on another pr_emerg(), right before the NMI-fallback.
static void native_stop_other_cpus()
{
...
pr_emerg("Shutting down cpus with NMI\n");
^^ deadlock here
apic->send_IPI_allbutself(NMI_VECTOR);
^^ not going to happen
...
}
And it's not only x86. In many cases, if we fail to SMP_STOP other
CPUs and one of them is holding logbuf_lock, then we are done with
panic(): we will not return from smp_send_stop().
arm/kernel/smp.c
void smp_send_stop(void)
{
...
if (num_online_cpus() > 1)
pr_warn("SMP: failed to stop secondary CPUs\n");
}
arm64/kernel/smp.c
void crash_smp_send_stop(void)
{
...
pr_crit("SMP: stopping secondary CPUs\n");
smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
...
if (atomic_read(&waiting_for_crash_ipi) > 0)
pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
cpumask_pr_args(&mask));
...
}
arm64/kernel/smp.c
void smp_send_stop(void)
{
...
if (num_online_cpus() > 1)
pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
cpumask_pr_args(cpu_online_mask));
...
}
riscv/kernel/smp.c
void smp_send_stop(void)
{
...
if (num_online_cpus() > 1)
pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
cpumask_pr_args(cpu_online_mask));
...
}
And so on.
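The "we will not return from smp_send_stop()" case can be sketched in user space as well. This assumes logbuf_lock behaves like a plain test-and-set spinlock. model_printk(), secondary_cpu_takes_logbuf_lock() and model_smp_send_stop() are hypothetical helpers loosely mirroring the arch code quoted above, and the bounded spin again only lets the model report the hang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model, NOT kernel code: logbuf_lock as a plain
 * test-and-set spinlock that every printk-like call must take. */
static atomic_flag logbuf_lock = ATOMIC_FLAG_INIT;

/* Store one message. max_spins stands in for "forever" (a real spinlock
 * has no timeout); false means the caller would hang right here. */
static bool model_printk(const char *msg, long max_spins)
{
    (void)msg;                        /* the text itself does not matter */
    assert(max_spins > 0);
    for (long i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set(&logbuf_lock)) {
            /* ... copy msg into the log buffer ... */
            atomic_flag_clear(&logbuf_lock);
            return true;
        }
    }
    return false;
}

/* A secondary CPU that takes logbuf_lock and then never releases it,
 * e.g. because it went nuts and the stop IPI could not stop it. */
static void secondary_cpu_takes_logbuf_lock(void)
{
    (void)atomic_flag_test_and_set(&logbuf_lock);
}

/* Tail of the smp_send_stop() variants quoted above: the failure is
 * reported with a printk while the CPU that refused to stop may still own
 * logbuf_lock. false means smp_send_stop() never returns. */
static bool model_smp_send_stop(int online_cpus, long max_spins)
{
    /* ... stop IPIs sent and waited for here ... */
    if (online_cpus > 1)
        return model_printk("SMP: failed to stop secondary CPUs\n",
                            max_spins);
    return true;   /* only the panic-CPU is left; nothing printed here */
}
```

After secondary_cpu_takes_logbuf_lock(), model_smp_send_stop(2, 1000) returns false (the warning itself is what hangs), while model_smp_send_stop(1, 1000) still returns true because that path does not print.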
-ss