linux-kernel - Re: flush was: Re: [PATCH printk v2 22/26] printk: nbcon: Implement emergency sections

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZeH-VwAdsn2Tsgr2@alley>
Date: Fri, 1 Mar 2024 17:12:07 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: flush was: Re: [PATCH printk v2 22/26] printk: nbcon: Implement
 emergency sections

On Fri 2024-03-01 16:49:54, Petr Mladek wrote:
> On Sun 2024-02-18 20:03:22, John Ogness wrote:
> > From: Thomas Gleixner <tglx@...utronix.de>
> > 
> > In emergency situations (something has gone wrong but the
> > system continues to operate), usually important information
> > (such as a backtrace) is generated via printk(). Each
> > individual printk record has little meaning. It is the
> > collection of printk messages that is most often needed by
> > developers and users.
> > 
> > In order to help ensure that the collection of printk messages
> > in an emergency situation are all stored to the ringbuffer as
> > quickly as possible, disable console output for that CPU while
> > it is in the emergency situation. When exiting the emergency
> > situation, trigger the consoles to be flushed.
> > 
> > Add per-CPU emergency nesting tracking because an emergency
> > can arise while in an emergency situation.
> > 
> > Add functions to mark the beginning and end of emergency
> > sections where the urgent messages are generated.
> > 
> > Do not print if the current CPU is in an emergency state.
> > 
> > Trigger console flushing when exiting all emergency nesting.
> > 
> > Note that the emergency state is not system-wide. While one CPU
> > is in an emergency state, another CPU may continue to print
> > console messages.
> > 
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -1105,6 +1134,58 @@ void nbcon_atomic_flush_unsafe(void)
> >  	__nbcon_atomic_flush_all(prb_next_reserve_seq(prb), true);
> >  }
> >  
> > +/**
> > + * nbcon_cpu_emergency_enter - Enter an emergency section where printk()
> > + *	messages for that CPU are only stored
> > + *
> > + * Upon exiting the emergency section, all stored messages are flushed.
> > + *
> > + * Context:	Any context. Disables preemption.
> > + *
> > + * When within an emergency section, no printing occurs on that CPU. This
> > + * is to allow all emergency messages to be dumped into the ringbuffer before
> > + * flushing the ringbuffer. The actual printing occurs when exiting the
> > + * outermost emergency section.
> > + */
> > +void nbcon_cpu_emergency_enter(void)
> > +{
> > +	unsigned int *cpu_emergency_nesting;
> > +
> > +	preempt_disable();
> > +
> > +	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> > +	(*cpu_emergency_nesting)++;
> > +}
> > +
> > +/**
> > + * nbcon_cpu_emergency_exit - Exit an emergency section and flush the
> > + *	stored messages
> > + *
> > + * Flushing only occurs when exiting all nesting for the CPU.
> > + *
> > + * Context:	Any context. Enables preemption.
> > + */
> > +void nbcon_cpu_emergency_exit(void)
> > +{
> > +	unsigned int *cpu_emergency_nesting;
> > +	bool do_trigger_flush = false;
> > +
> > +	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> > +
> > +	WARN_ON_ONCE(*cpu_emergency_nesting == 0);
> > +
> > +	if (*cpu_emergency_nesting == 1)
> > +		do_trigger_flush = true;
> > +
> > +	/* Undo the nesting count of nbcon_cpu_emergency_enter(). */
> > +	(*cpu_emergency_nesting)--;
> > +
> > +	preempt_enable();
> > +
> > +	if (do_trigger_flush)
> > +		printk_trigger_flush();
> 
> Just an idea. printk_trigger_flush() calls defer_console_output().
> It always moves the flushing into IRQ context.
> 
> It might make sense to add a flush() function which would try
> to flush the messages directly from this context and
> queue the IRQ work only when it fails. Something like:
> 
> void printk_emergency_flush()
> {
> 	/* nbcon consoles could be flushed in any context. */
> 	if (have_nbcon_console)
> 		nbcon_flush_all();

It seems that the previous version of the patchset called this
in nbcon_cpu_emergency_exit() before decrementing the cpu state.

Sigh, it was me who said that it was useless, see
https://lore.kernel.org/all/ZQ3R4Lz1LHQYsylw@alley/

I am sorry for this.

To my defense. It was a generic function added early in the patchset.
And it was used by both emergency and panic contexts. It is possible
that I did not know about the emergency context behavior at
the time of review.


> 	if (have_legacy_console) {
> 		if (console_trylock())
> 			console_unlock();
> 		else
> 			defer_console_output();
> 	}
> }
> 
> But wait, nbcon_flush_all() might have troubles to get the per-console
> lock because it would be called with NBCON_PRIO_NORMAL.
> 
> 
> Wait, wait, wait.
> 
> defer_console_output() schedules wake_up_klogd_work_func(). It flushes
> only legacy consoles. It means that even emergency messages would
> need to wait for the printk kthread.
> 
> By other words, it seems that the emergency context does not have any
> effect for nbcon consoles.

It should get fixed by calling nbcon_flush_all() before decrementing
cpu_emergency_nesting counter.

Best Regards,
Petr