lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 25 Jan 2022 16:04:59 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Stephen Brennan <stephen.s.brennan@...cle.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] printk: Drop console_sem during panic

On Mon 2022-01-24 17:18:42, John Ogness wrote:
> On 2022-01-21, Stephen Brennan <stephen.s.brennan@...cle.com> wrote:
> > If another CPU is in panic, we are about to be halted. Try to gracefully
> > drop console_sem and allow the panic CPU to grab it easily.
> >
> > Suggested-by: Petr Mladek <pmladek@...e.com>
> > Signed-off-by: Stephen Brennan <stephen.s.brennan@...cle.com>
> > ---
> >  kernel/printk/printk.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index ca253ac07615..c2dc8ebd9509 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2668,7 +2668,7 @@ void console_unlock(void)
> >  
> >  	for (;;) {
> >  		size_t ext_len = 0;
> > -		int handover;
> > +		int handover, pcpu;
> >  		size_t len;
> >  
> >  skip:
> > @@ -2739,6 +2739,12 @@ void console_unlock(void)
> >  		if (handover)
> >  			return;
> >  
> > +		/* Allow panic_cpu to take over the consoles safely */
> > +		pcpu = atomic_read(&panic_cpu);
> > +		if (unlikely(pcpu != PANIC_CPU_INVALID &&
> > +		    pcpu != raw_smp_processor_id()))
> > +			break;
> > +
> 
> Keep in mind that after the "break", this context will try to re-acquire
> the console lock and continue printing. That is a pretty small window
> for the panic CPU to attempt a trylock.
>
> Perhaps the retry after the loop should also be avoided for non-panic
> CPUs. This would rely on the panic CPU taking over (as your comment
> suggests will happen). Since the panic-CPU calls pr_emerg() as the final
> record before drifting off to neverland, that is probably OK.

Great catch!

> Something like:
> 
> @@ -2731,7 +2731,8 @@ void console_unlock(void)
>  	 * there's a new owner and the console_unlock() from them will do the
>  	 * flush, no worries.
>  	 */
> -	retry = prb_read_valid(prb, next_seq, NULL);
> +	retry = (pcpu != raw_smp_processor_id()) &&
> +		prb_read_valid(prb, next_seq, NULL);
>  	if (retry && console_trylock())
>  		goto again;
>  }
> 
> I would also like to see a comment about why it is acceptable to use
> raw_smp_processor_id() in a context that has migration
> enabled. Something like: raw_smp_processor_id() can be used because this
> context cannot be migrated to the panic CPU.

Yup. It would be nice to mention this.

We actually need the same check in both locations. I would either put
it into a helper or I would pass the result via a variable:

The helper might look like:

/*
 * Return true when this CPU should unlock console_sem without pushing
 * all messages to the console. It would allow to call console in
 * panic CPU a safe way even after other CPUs are stopped.
 *
 * It can be called safely even in preemptive context because panic
 * CPU does not longer schedule.
 */
static bool abandon_console_lock_in_panic()
{
	if (!panic_in_progress())
		return false;

	return atomic_read(&panic_cpu) != raw_smp_processor_id();
}

Note that I used panic_in_progress() because it makes the code more
readable. Using pcpu variable is small optimization. But it has effect
only during panic() where it is not important. It is slow path anyway.

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ