linux-kernel - Re: [PATCH 4/4] printk: Drop console

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YfARmwyLyl8gj6Zy@alley>
Date:   Tue, 25 Jan 2022 16:04:59 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Stephen Brennan <stephen.s.brennan@...cle.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] printk: Drop console_sem during panic

On Mon 2022-01-24 17:18:42, John Ogness wrote:
> On 2022-01-21, Stephen Brennan <stephen.s.brennan@...cle.com> wrote:
> > If another CPU is in panic, we are about to be halted. Try to gracefully
> > drop console_sem and allow the panic CPU to grab it easily.
> >
> > Suggested-by: Petr Mladek <pmladek@...e.com>
> > Signed-off-by: Stephen Brennan <stephen.s.brennan@...cle.com>
> > ---
> >  kernel/printk/printk.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index ca253ac07615..c2dc8ebd9509 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2668,7 +2668,7 @@ void console_unlock(void)
> >  
> >  	for (;;) {
> >  		size_t ext_len = 0;
> > -		int handover;
> > +		int handover, pcpu;
> >  		size_t len;
> >  
> >  skip:
> > @@ -2739,6 +2739,12 @@ void console_unlock(void)
> >  		if (handover)
> >  			return;
> >  
> > +		/* Allow panic_cpu to take over the consoles safely */
> > +		pcpu = atomic_read(&panic_cpu);
> > +		if (unlikely(pcpu != PANIC_CPU_INVALID &&
> > +		    pcpu != raw_smp_processor_id()))
> > +			break;
> > +
> 
> Keep in mind that after the "break", this context will try to re-acquire
> the console lock and continue printing. That is a pretty small window
> for the panic CPU to attempt a trylock.
>
> Perhaps the retry after the loop should also be avoided for non-panic
> CPUs. This would rely on the panic CPU taking over (as your comment
> suggests will happen). Since the panic-CPU calls pr_emerg() as the final
> record before drifting off to neverland, that is probably OK.

Great catch!

> Something like:
> 
> @@ -2731,7 +2731,8 @@ void console_unlock(void)
>  	 * there's a new owner and the console_unlock() from them will do the
>  	 * flush, no worries.
>  	 */
> -	retry = prb_read_valid(prb, next_seq, NULL);
> +	retry = (pcpu != raw_smp_processor_id()) &&
> +		prb_read_valid(prb, next_seq, NULL);
>  	if (retry && console_trylock())
>  		goto again;
>  }
> 
> I would also like to see a comment about why it is acceptable to use
> raw_smp_processor_id() in a context that has migration
> enabled. Something like: raw_smp_processor_id() can be used because this
> context cannot be migrated to the panic CPU.

Yup. It would be nice to mention this.

We actually need the same check in both locations. I would either put
it into a helper or I would pass the result via a variable:

The helper might look like:

/*
 * Return true when this CPU should unlock console_sem without pushing
 * all messages to the console. It would allow to call console in
 * panic CPU a safe way even after other CPUs are stopped.
 *
 * It can be called safely even in preemptive context because panic
 * CPU does not longer schedule.
 */
static bool abandon_console_lock_in_panic()
{
	if (!panic_in_progress())
		return false;

	return atomic_read(&panic_cpu) != raw_smp_processor_id();
}

Note that I used panic_in_progress() because it makes the code more
readable. Using pcpu variable is small optimization. But it has effect
only during panic() where it is not important. It is slow path anyway.

Best Regards,
Petr