linux-kernel - Re: Removal of printk safe buffers delays NMI context printk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1636111599.wwppq55w4t.astroid@bobo.none>
Date:   Fri, 05 Nov 2021 21:43:21 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Laurent
 Dufour <ldufour@...ux.ibm.com>,
        linux-kernel@...r.kernel.org, Petr Mladek <pmladek@...e.com>
Subject: Re: Removal of printk safe buffers delays NMI context printk

Excerpts from John Ogness's message of November 5, 2021 7:55 pm:
> On 2021-11-05, Nicholas Piggin <npiggin@...il.com> wrote:
>>> We are planning on implementing a pr_flush() that will do something
>>> similar. But I am wondering how you are planning on triggering a CPU
>>> to call that function.
>>
>> Similar way as was removed by commit 93d102f094b ("printk: remove safe
>> buffers") fro nmi_backtrace and powerpc watchdog. The following patch
>> is what I'm using which seems to work for me.
> 
> What was removed from 93d102f094b was irq_work triggering on all
> CPUs.

No, it was the caller executing the flush for all remote CPUs itself.
irq work was not involved (and irq work can't be raised in a remote
CPU from NMI context).

> Your patch is attempting to directly print from a non-NMI
> context. This is obviously a better approach if a caller is known to be
> non-NMI.
> 
> Also, before 93d102f094b the safe buffer flushing was performed before
> triggering the NMI backtraces (in some cases). So clearly it worked
> before by accident rather than by design. (Not that that really
> matters.)

That's not the case, in the core nmi_backtrace.c, the flush happens
afterward.

In the powerpc code it's not actually the case either but it wasn't
obvious why at least should have been commented. powerpc's
trigger_allbutself_cpu_backtrace() is not a NMI IPI, just a normal
one. So if a CPU takes it then it will flush its own buffer via
irq_work. You only have to do the printk flush for the case of the
true NMI IPI which comes before.

That code does need a bit more work for other reasons, but we do
need that printk flush capability back there and for nmi_backtrace.

>> What would pr_flush do differently and where were you planning to use
>> it?
> 
> Currently, and particularly when we move to threaded printers, printk
> callers do not know when their messages are actually visible. However,
> in some cases the printk caller (or some related task) wants to be sure
> the message is visible.
> 
> pr_flush() will allow a task to block until all pending messages (at the
> time of the call) are output on all consoles.
> 
> Your patch is doing something similar for non-NMI contexts, except that
> it will exit early if it fails to get the console lock. For your case
> this is probably acceptable because you are only concerned that some
> context is printing the messages, even if the latest messages may not
> have been printed yet.

Yep, if something else has the lock hopefully it should release it and
do the flush when it does. We're already 10 seconds delayed here so a
bit longer doesn't matter, just needs to get out at some point.

>>     printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
>>     
>>     printk from NMI context relies on irq work being raised on the local CPU
>>     to print to console. This can be a problem if the NMI was raised by a
>>     lockup detector to print lockup stack and regs, because the CPU may not
>>     enable irqs (because it is locked up).
>>     
>>     Introduce printk_flush() that can be called from non-NMI context on
>>     another CPU to try to get those messages to the console.
>>     
>>     Fixes: 93d102f094be ("printk: remove safe buffers")
>>     Signed-off-by: Nicholas Piggin <npiggin@...il.com>
>>
>> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>> index 5f69ba4de1f3..59ddb24797ae 100644
>> --- a/arch/powerpc/kernel/watchdog.c
>> +++ b/arch/powerpc/kernel/watchdog.c
>> @@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
>>  		cpumask_clear(&wd_smp_cpus_ipi);
>>  	}
>>  
>> +	/*
>> +	 * Force flush any remote buffers that might be stuck in IRQ context
>> +	 * and therefore could not run their irq_work.
>> +	 */
>> +	printk_flush();
>> +
> 
> nmi_trigger_cpumask_backtrace() tracks and waits (up to 10 seconds) for
> the CPUs to print their backtrace before flushing. Here there is no
> waiting. I suppose in practice the NMI backtrace will be fast enough,
> but I am just wondering if there should be some sort of tracking/waiting
> like in nmi_trigger_cpumask_backtrace().

Yeah it has tended to work, but I have something to fix that up.

> 
>>  	if (hardlockup_panic)
>>  		nmi_panic(NULL, "Hard LOCKUP");
>>  
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index 85b656f82d75..50424ad87fb5 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> @@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
>>  void show_regs_print_info(const char *log_lvl);
>>  extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
>>  extern asmlinkage void dump_stack(void) __cold;
>> +void printk_flush(void);
>>  #else
>>  static inline __printf(1, 0)
>>  int vprintk(const char *s, va_list args)
>> @@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
>>  static inline void dump_stack(void)
>>  {
>>  }
>> +static inline void printk_flush(void)
>> +{
>> +}
>>  #endif
>>  
>>  #ifdef CONFIG_SMP
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index a8d0a58deebc..dd9471bb58c2 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2206,6 +2206,26 @@ int vprintk_store(int facility, int level,
>>  	return ret;
>>  }
>>  
>> +void printk_flush(void)
>> +{
>> +	/*
>> +	 * Disable preemption to avoid being preempted while holding
>> +	 * console_sem which would prevent anyone from printing to
>> +	 * console
>> +	 */
>> +	preempt_disable();
>> +	/*
>> +	 * Try to acquire and then immediately release the console
>> +	 * semaphore.  The release will print out buffers and wake up
>> +	 * /dev/kmsg and syslog() users.
>> +	 */
>> +	if (console_trylock_spinning())
>> +		console_unlock();
>> +	preempt_enable();
>> +
>> +	wake_up_klogd();
>> +}
> 
> I have mixed feelings about this printk_flush() because it is exactly
> this code that is about to undergo massive reworking.

Better to fix up the regression before said massive reworking I think.

> If this function
> is accepted, then I wonder if printk_trigger_flush() might be a better
> name since it does not necessarily wait until the messages are
> printed. (There could be another task printing and already a queued
> handover printer. Or this task could handover printing before
> finishing.)

Sure I'm not wedded to the name. I can resend with the name change if
there's no other issues (the powerpc fixups will go as separate series).

Thanks,
Nick