linux-kernel - Re: [PATCH v2 1/1] powerpc/crash: Use NMI context for printk when starting to crash

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1589344247.2akwhmzwhg.astroid@bobo.none>
Date:   Wed, 13 May 2020 14:36:21 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Alexios Zavras <alexios.zavras@...el.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Christophe Leroy <christophe.leroy@....fr>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Enrico Weigelt <info@...ux.net>,
        Leonardo Bras <leobras.c@...il.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Paul Mackerras <paulus@...ba.org>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v2 1/1] powerpc/crash: Use NMI context for printk when
 starting to crash

Excerpts from Leonardo Bras's message of May 13, 2020 7:45 am:
> Currently, if printk lock (logbuf_lock) is held by other thread during
> crash, there is a chance of deadlocking the crash on next printk, and
> blocking a possibly desired kdump.
> 
> At the start of default_machine_crash_shutdown, make printk enter
> NMI context, as it will use per-cpu buffers to store the message,
> and avoid locking logbuf_lock.

printk_nmi_enter is used in one other place outside nmi_enter.

Is there a different/better way to handle this? What do other 
architectures do?

Other subsystems get put into an nmi-mode when we call nmi_enter
(lockdep, ftrace, rcu etc). It seems like those would be useful for 
similar reasons, so at least explaining why that is not used in a 
comment would be good.

Aside from that, I welcome any effort to make our crashes more reliable
so thanks for working on this stuff.

Thanks,
Nick

> 
> Suggested-by: Michael Ellerman <mpe@...erman.id.au>
> Signed-off-by: Leonardo Bras <leobras.c@...il.com>
> 
> ---
> Changes since v1:
> - Added in-code comment explaining the need of context change
> - Function moved to the start of default_machine_crash_shutdown,
>   to avoid locking any printk on crashing routine.
> - Title was 'Use NMI context for printk after crashing other CPUs'
> 
> ---
>  arch/powerpc/kexec/crash.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
> index d488311efab1..c9a889880214 100644
> --- a/arch/powerpc/kexec/crash.c
> +++ b/arch/powerpc/kexec/crash.c
> @@ -311,6 +311,9 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
>  	unsigned int i;
>  	int (*old_handler)(struct pt_regs *regs);
>  
> +	/* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
> +	printk_nmi_enter();
> +
>  	/*
>  	 * This function is only called after the system
>  	 * has panicked or is otherwise in a critical state.
> -- 
> 2.25.4
> 
>