linux-kernel - Re: [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send

Message-ID: <20190718112954.GA1774@jagdpanzerIV>
Date:   Thu, 18 Jul 2019 20:29:54 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Konstantin Khlebnikov <koct9i@...il.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        John Ogness <john.ogness@...utronix.de>,
        Petr Tesarik <ptesarik@...e.cz>, x86@...nel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer
 after crash_smp_send_stop()

On (07/18/19 14:07), Konstantin Khlebnikov wrote:
> > Let me test the waters. Criticize the following idea:
> >
> > Can we, sort of, disconnect "supposed to be dead" CPUs from printk()
> > so then we can unconditionally re-init printk() from panic-CPU?
> >
> > We have per-CPU printk_state; so panic-CPU can set, let's say,
> > DEAD_CPUS_TELL_NO_TALES bit on all CPUs but self, and vprintk_func()
> > will do nothing if DEAD_CPUS_TELL_NO_TALES bit set on particular
> > CPU. Foreign CPUs are not even supposed to be alive, and smp_send_stop()
> > waits for IPI acks from secondary CPUs long enough on average (need
> > to check that) so if one of the CPUs is misbehaving and doesn't want
> > to die (geez...) we will just "disconnect" it from printk() to minimize
> > possible logbuf/console drivers interventions and then proceed with
> > panic; assuming that misbehaving CPUs are actually up to something
> > sane. Sometimes, you know, in some cases, those CPUs are already dead:
> > either accidentally powered off, or went completely nuts and do nothing,
> > etc. etc. but we still can kdump() and console_flush_on_panic().
> 
> Good idea.
> Panic-CPU could just increment state to reroute printk into 'safe'
> per-cpu buffer.

Yeah, that's better.

So we can do something like this:

@@ -269,15 +269,21 @@ void printk_safe_flush_on_panic(void)
 	 * Make sure that we could access the main ring buffer.
 	 * Do not risk a double release when more CPUs are up.
 	 */
-	if (raw_spin_is_locked(&logbuf_lock)) {
-		if (num_online_cpus() > 1)
-			return;
+	debug_locks_off();
+	raw_spin_lock_init(&logbuf_lock);
+	/* + re-init the rest of printk() locks */
+	printk_safe_flush();
+}

[..]

+void printk_switch_to_panic_mode(int panic_cpu)
+{
+	int cpu;
 
+	for_each_possible_cpu(cpu) {
+		if (cpu == panic_cpu)
+			continue;
+		per_cpu(printk_context, cpu) = 42;
+	}
 }

Then call printk_switch_to_panic_mode() from panic(). This way we don't
need to touch arch code, and it also covers the case when some new arch
gains NMI support.
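
To make the rerouting idea concrete, here is a minimal userspace model of
it, not kernel code. Everything in it is an assumption made for
illustration: NR_CPUS, PRINTK_PANIC_CONTEXT_MASK (standing in for the
placeholder 42 above), the cpu_state[] array (standing in for per-CPU
printk_context plus the 'safe' buffer) and the model_*() helpers are all
hypothetical names. It only shows the routing decision: once the panic CPU
flips the bit on every other CPU, messages from those CPUs land in their
own buffers and never touch the main log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS 4                       /* hypothetical CPU count */
#define SAFE_BUF_SIZE 256
#define PRINTK_PANIC_CONTEXT_MASK 0x1   /* hypothetical; replaces the "42" */

/* Models per-CPU printk_context and the per-CPU 'safe' buffer. */
struct cpu_printk_state {
	int context;
	char safe_buf[SAFE_BUF_SIZE];
	size_t safe_len;
};

static struct cpu_printk_state cpu_state[NR_CPUS];
static char main_log[1024];             /* models the main log buffer */
static size_t main_len;

/* Append msg to a NUL-terminated buffer, truncating if it does not fit. */
static void append(char *buf, size_t *len, size_t cap, const char *msg)
{
	size_t n = strlen(msg);

	if (n > cap - 1 - *len)
		n = cap - 1 - *len;
	memcpy(buf + *len, msg, n);
	*len += n;
	buf[*len] = '\0';
}

/* Mark every CPU except the panic CPU as "disconnected" from the main log. */
static void model_switch_to_panic_mode(int panic_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == panic_cpu)
			continue;
		cpu_state[cpu].context |= PRINTK_PANIC_CONTEXT_MASK;
	}
}

/* Route a message based on the calling CPU's context bit. */
static void model_printk(int cpu, const char *msg)
{
	struct cpu_printk_state *s;

	assert(cpu >= 0 && cpu < NR_CPUS);
	s = &cpu_state[cpu];

	if (s->context & PRINTK_PANIC_CONTEXT_MASK)
		append(s->safe_buf, &s->safe_len, sizeof(s->safe_buf), msg);
	else
		append(main_log, &main_len, sizeof(main_log), msg);
}
```

In this model only the panic CPU keeps writing to main_log after the
switch, which is the property that would make an unconditional lock
re-init on the panic CPU less risky. The real thing would of course have to
deal with CPUs that are already inside the logbuf_lock section when the
bit is set; the sketch does not model that.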

	-ss