Message-ID: <20190718112954.GA1774@jagdpanzerIV>
Date: Thu, 18 Jul 2019 20:29:54 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Konstantin Khlebnikov <koct9i@...il.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
John Ogness <john.ogness@...utronix.de>,
Petr Tesarik <ptesarik@...e.cz>, x86@...nel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer
after crash_smp_send_stop()
On (07/18/19 14:07), Konstantin Khlebnikov wrote:
> > Let me test the waters. Criticize the following idea:
> >
> > Can we, sort of, disconnect "supposed to be dead" CPUs from printk()
> > so then we can unconditionally re-init printk() from panic-CPU?
> >
> > We have per-CPU printk_state; so panic-CPU can set, let's say,
> > DEAD_CPUS_TELL_NO_TALES bit on all CPUs but self, and vprintk_func()
> > will do nothing if DEAD_CPUS_TELL_NO_TALES bit set on particular
> > CPU. Foreign CPUs are not even supposed to be alive, and smp_send_stop()
> > waits for IPI acks from secondary CPUs long enough on average (need
> > to check that) so if one of the CPUs is misbehaving and doesn't want
> > to die (geez...) we will just "disconnect" it from printk() to minimize
> > possible logbuf/console drivers interventions and then proceed with
> > panic; assuming that misbehaving CPUs are actually up to something
> > sane. Sometimes, you know, in some cases, those CPUs are already dead:
> > either accidentally powered off, or went completely nuts and do nothing,
> > etc. etc. but we still can kdump() and console_flush_on_panic().
>
> Good idea.
> Panic-CPU could just increment state to reroute printk into 'safe'
> per-cpu buffer.
Yeah, that's better.
So we can do something like this:
@@ -269,15 +269,21 @@ void printk_safe_flush_on_panic(void)
 	 * Make sure that we could access the main ring buffer.
 	 * Do not risk a double release when more CPUs are up.
 	 */
-	if (raw_spin_is_locked(&logbuf_lock)) {
-		if (num_online_cpus() > 1)
-			return;
+	debug_locks_off();
+	raw_spin_lock_init(&logbuf_lock);
+	/* + re-init the rest of printk() locks */
+	printk_safe_flush();
+}
[..]
+void printk_switch_to_panic_mode(int panic_cpu)
+{
+	int cpu;
+	for_each_possible_cpu(cpu) {
+		if (cpu == panic_cpu)
+			continue;
+		per_cpu(printk_context, cpu) = 42;
+	}
 }
Then call printk_switch_to_panic_mode() from panic(). This way we don't
need to touch arch code, and it also covers the case where some new arch
gains NMI support.
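To make the routing idea concrete, here is a toy userspace model, not kernel code. It assumes a plain array in place of the per-CPU printk_context, fixed-size strings in place of the log buffer and the per-CPU 'safe' buffers, and a made-up PRINTK_PANIC_MODE_MASK bit in place of the "42" placeholder above. model_printk() and append() are hypothetical names for this sketch only; a real patch would go through vprintk_func() and the existing printk_safe machinery.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS  4	/* hypothetical CPU count for the model */
#define BUF_SIZE 256

/* Hypothetical context bit: stands in for the "42" placeholder. */
#define PRINTK_PANIC_MODE_MASK 0x1

static int printk_context[NR_CPUS];		/* models the per-CPU variable */
static char safe_buf[NR_CPUS][BUF_SIZE];	/* models per-CPU 'safe' buffers */
static char main_log[BUF_SIZE];			/* models the main log buffer */

/* Append src to dst, truncating so dst never exceeds BUF_SIZE bytes. */
static void append(char *dst, const char *src)
{
	size_t used = strlen(dst);

	strncat(dst, src, BUF_SIZE - 1 - used);
}

/* Models the dispatch in vprintk_func(): route by the caller's context. */
static int model_printk(int cpu, const char *fmt, ...)
{
	char msg[BUF_SIZE];
	va_list args;
	int len;

	assert(cpu >= 0 && cpu < NR_CPUS);

	va_start(args, fmt);
	len = vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	if (printk_context[cpu] & PRINTK_PANIC_MODE_MASK)
		append(safe_buf[cpu], msg);	/* main_log is left untouched */
	else
		append(main_log, msg);

	return len;
}

/* Reroute every CPU except the panic CPU into its 'safe' buffer. */
static void printk_switch_to_panic_mode(int panic_cpu)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == panic_cpu)
			continue;
		printk_context[cpu] |= PRINTK_PANIC_MODE_MASK;
	}
}
```

In this model, once printk_switch_to_panic_mode(0) has run, a message from CPU 1 lands only in safe_buf[1], while CPU 0 keeps writing to main_log. That is the property that would let the panic CPU re-init the logbuf lock without racing against a CPU that failed to stop.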
-ss