lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1636073838.qpmyp6q17i.astroid@bobo.none>
Date:   Fri, 05 Nov 2021 11:26:35 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, Petr Mladek <pmladek@...e.com>,
        Laurent Dufour <ldufour@...ux.ibm.com>
Subject: Re: Removal of printk safe buffers delays NMI context printk

Excerpts from John Ogness's message of November 5, 2021 2:18 am:
> Hi Nick,
> 
> On 2021-11-05, Nicholas Piggin <npiggin@...il.com> wrote:
>> It seems printk from NMI context is now delayed indefinitely and
>> there is no printk_safe_flush equivalent (or I can't see one) to
>> allow a NMI buffer to be flushed by a different CPU.
> 
> NMI flushing is triggered using irq work (for the same CPU). This should
> not have changed recently. Are you reporting a new issue?

Previously prink_safe_flush from another CPU would flush those out.

>> This causes hard lockup watchdog messages to not get shown on the
>> console. I can call printk from a different CPU and that seems to
>> flush the stuck CPU's NMI buffer immediately.
> 
> Perhaps we should be triggering the irq work on multiple CPUs if from
> NMI context?

Possibly. I don't know if it's really needed because the irq work should 
usually be fine, the problem are lockup detector messages where the
printing CPU does not enable irqs after it returns from NMI.

> 
>> What's the best way to expose this? Can we have something like tihs?
>>
>> void printk_flush(void)
>> {
>> 	preempt_disable();
>> 	if (console_trylock_spinning())
>> 		console_unlock();
>> 	preempt_enable();
>>         wake_up_klogd();
>> }
> 
> We are planning on implementing a pr_flush() that will do something
> similar. But I am wondering how you are planning on triggering a CPU to
> call that function.

Similar way as was removed by commit 93d102f094b ("printk: remove safe 
buffers") fro nmi_backtrace and powerpc watchdog. The following patch
is what I'm using which seems to work for me.

What would pr_flush do differently and where were you planning to use 
it?

Thanks,
Nick
--
    printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
    
    printk from NMI context relies on irq work being raised on the local CPU
    to print to console. This can be a problem if the NMI was raised by a
    lockup detector to print lockup stack and regs, because the CPU may not
    enable irqs (because it is locked up).
    
    Introduce printk_flush() that can be called from non-NMI context on
    another CPU to try to get those messages to the console.
    
    Fixes: 93d102f094be ("printk: remove safe buffers")
    Signed-off-by: Nicholas Piggin <npiggin@...il.com>

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 5f69ba4de1f3..59ddb24797ae 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
 		cpumask_clear(&wd_smp_cpus_ipi);
 	}
 
+	/*
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
+	 */
+	printk_flush();
+
 	if (hardlockup_panic)
 		nmi_panic(NULL, "Hard LOCKUP");
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 85b656f82d75..50424ad87fb5 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
 extern asmlinkage void dump_stack(void) __cold;
+void printk_flush(void);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
+static inline void printk_flush(void)
+{
+}
 #endif
 
 #ifdef CONFIG_SMP
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a8d0a58deebc..dd9471bb58c2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
 	return ret;
 }
 
+void printk_flush(void)
+{
+	/*
+	 * Disable preemption to avoid being preempted while holding
+	 * console_sem which would prevent anyone from printing to
+	 * console
+	 */
+	preempt_disable();
+	/*
+	 * Try to acquire and then immediately release the console
+	 * semaphore.  The release will print out buffers and wake up
+	 * /dev/kmsg and syslog() users.
+	 */
+	if (console_trylock_spinning())
+		console_unlock();
+	preempt_enable();
+
+	wake_up_klogd();
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const struct dev_printk_info *dev_info,
 			    const char *fmt, va_list args)
@@ -2228,24 +2248,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
 	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
-		/*
-		 * Disable preemption to avoid being preempted while holding
-		 * console_sem which would prevent anyone from printing to
-		 * console
-		 */
-		preempt_disable();
-		/*
-		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
-		 */
-		if (console_trylock_spinning())
-			console_unlock();
-		preempt_enable();
-	}
+	if (!in_sched)
+		printk_flush();
+	else
+		wake_up_klogd();
 
-	wake_up_klogd();
 	return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index f9e89001b52e..b735b2e24902 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 		touch_softlockup_watchdog();
 	}
 
+	/*
+	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * and therefore could not run their irq_work.
+	 */
+	printk_flush();
+
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();
 }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ