[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190731060839.GA1756@jagdpanzerIV>
Date: Wed, 31 Jul 2019 15:08:39 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Konstantin Khlebnikov <koct9i@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
John Ogness <john.ogness@...utronix.de>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Petr Tesarik <ptesarik@...e.cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] printk/panic: Access the main printk log in panic()
only when safe
Sorry for the delayed response.
On (07/24/19 14:27), Petr Mladek wrote:
[..]
> > And this is where the idea of "disconnecting" those CPUs from main
> > logbuf come from.
> >
> > So what we can do:
> > - smp_send_stop()
> > - disconnect all-but-self from logbuf (via printk-safe)
>
> printk_safe is not really necessary. As you wrote, nobody
> is interested into messages from CPUs that are supposed
> to be stopped.
OK.
Doing it in printk.c makes it easier to handle console_owner/waiter,
which are not exported.
> It might be enough to set some global variable, for example,
> with the CPU number that is calling panic() and is the only
> one allowed to print messages from this point on.
Sounds good.
> It might even be used to force console lock owner to leave
> the cycle immediately.
Yes, makes sense.
[..]
> > So, shall we try one more time with the "disconnect" misbehaving CPUs
> > approach? I can send an RFC patch.
>
> IMHO, it will be acceptable only when it is reasonably simple and
> straightforward. The panic() code is full of special hacks and
> it is already complicated to follow all the twists.
When you have a chance, mind to take a look at the patch below?
Doesn't look very difficult (half of it are white-spaces and
comments, I believe).
> Especially because this scenario came up from a theoretical
> discussion. Note that my original, real bug report, was
> with logbuf_lock NMI, enabled kdump notifiers, ...
I understand. The idea is that this patch should handle your real
scenario + theoretical scenario.
---
include/linux/printk.h | 2 ++
kernel/panic.c | 9 ++++++-
kernel/printk/printk.c | 53 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 57c9473f4a81..016f5ba06e94 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -322,6 +322,8 @@ static inline void printk_safe_flush_on_panic(void)
}
#endif
+void printk_enter_panic_mode(int cpu);
+
extern int kptr_restrict;
#ifndef pr_fmt
diff --git a/kernel/panic.c b/kernel/panic.c
index d1ece4c363b9..85fac975a90f 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -254,13 +254,20 @@ void panic(const char *fmt, ...)
crash_smp_send_stop();
}
+ /* Misbehaving secondary CPUs cannot printk() to the main logbuf now */
+ printk_enter_panic_mode(this_cpu);
+
/*
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
*/
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
- /* Call flush even twice. It tries harder with a single online CPU */
+ /*
+ * Call flush even twice. It tries harder with a single online CPU.
+ * Even if we failed to stop some of secondary CPUs we have printk
+ * locks re-initialized and keep secondary CPUs off printk().
+ */
printk_safe_flush_on_panic();
kmsg_dump(KMSG_DUMP_PANIC);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f0bc37a511a7..750f83c3b589 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -87,6 +87,8 @@ static DEFINE_SEMAPHORE(console_sem);
struct console *console_drivers;
EXPORT_SYMBOL_GPL(console_drivers);
+static atomic_t __read_mostly printk_panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
+
/*
* System may need to suppress printk message under certain
* circumstances, like after kernel panic happens.
@@ -1644,6 +1646,49 @@ static void console_lock_spinning_enable(void)
spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
}
+static int panic_in_progress_on_other_cpu(void)
+{
+ int cpu = atomic_read(&printk_panic_cpu);
+
+ return cpu != PANIC_CPU_INVALID && cpu != smp_processor_id();
+}
+
+void printk_enter_panic_mode(int cpu)
+{
+ unsigned long timeout;
+
+ cpu = atomic_cmpxchg(&printk_panic_cpu, PANIC_CPU_INVALID, cpu);
+ /* printk can enter panic mode only once */
+ if (cpu != PANIC_CPU_INVALID)
+ return;
+ /*
+ * Wait for active secondary CPUs (if there are any) to leave
+ * console_unlock() printing loop (for up to one second).
+ */
+ if (num_online_cpus() > 1) {
+ timeout = USEC_PER_SEC;
+ while (num_online_cpus() > 1 && timeout--)
+ udelay(1);
+ }
+
+ debug_locks_off();
+ /*
+ * On some platforms crash_smp_send_stop() can kill CPUs via NMI
+ * vector. Re-init printk() locks just in case if any of those killed
+ * CPUs held any of printk() locks. On platforms which don't support
+ * NMI stop, misbehaving secondary CPUs will be handled by
+ * panic_in_progress_on_other_cpu() test.
+ *
+ * We re-init only printk() locks here. oops_in_progress is expected
+ * to be set by now, so good console drivers are in lockless mode,
+ * bad console drivers, however, can deadlock.
+ */
+ raw_spin_lock_init(&logbuf_lock);
+ sema_init(&console_sem, 1);
+ WRITE_ONCE(console_waiter, false);
+ raw_spin_lock_init(&console_owner_lock);
+}
+
/**
* console_lock_spinning_disable_and_check - mark end of code where another
* thread was able to busy wait and check if there is a waiter
@@ -1900,6 +1945,9 @@ int vprintk_store(int facility, int level,
size_t text_len;
enum log_flags lflags = 0;
+ if (panic_in_progress_on_other_cpu())
+ return 0;
+
/*
* The printf needs to come first; we need the syslog
* prefix which might be passed-in as a parameter.
@@ -2468,6 +2516,11 @@ void console_unlock(void)
return;
}
+ if (panic_in_progress_on_other_cpu()) {
+ printk_safe_exit_irqrestore(flags);
+ return;
+ }
+
printk_safe_exit_irqrestore(flags);
if (do_cond_resched)
--
2.22.0
Powered by blists - more mailing lists