lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220613155145.GF1790663@paulmck-ThinkPad-P17-Gen-1>
Date:   Mon, 13 Jun 2022 08:51:45 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Petr Mladek <pmladek@...e.com>
Cc:     John Ogness <john.ogness@...utronix.de>,
        linux-kernel@...r.kernel.org, frederic@...nel.org,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [BUG] 8e274732115f ("printk: extend console_lock for per-console
 locking")

On Mon, Jun 13, 2022 at 05:07:47PM +0200, Petr Mladek wrote:
> On Mon 2022-06-13 12:32:08, Petr Mladek wrote:
> > On Mon 2022-06-13 11:10:19, John Ogness wrote:
> > > On 2022-06-12, "Paul E. McKenney" <paulmck@...nel.org> wrote:
> > > >> As I suspected, the final printk's cannot direct print because the
> > > >> kthread was printing. Using the below patch did seem to address your
> > > >> problem. But this is probably not the way forward.
> > > >
> > > > When I apply it, I still lose output, perhaps due to different timing?
> > > > Doing the pr_flush(1000, true) just before the call to kernel_power_off()
> > > > has been working quite well thus far, though.
> > > 
> > > @Petr, I like the idea of the kthreads getting out of the way rather
> > > than trying to direct print themselves (for this situation). It still
> > > isn't optimal because that final pr_emerg("Power down\n") might come
> > > before the kthread has finished its current line. But in that case the
> > > kthread may not have much a chance to finish the printing anyway.
> > 
> > I wonder if we could somehow combine it with pr_flush(timeout).
> > 
> > The kthread might bail-out when pr_flush() is running. It will
> > know that someone would continue printing. The timeout might
> > help to avoid a deadlock. We could somehow reuse
> > console_trylock_spinning() code here.
> > > 
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > index ea3dd55709e7..45c6c2b0b104 100644
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -3729,7 +3729,9 @@ static bool printer_should_wake(struct console *con, u64 seq)
> > >  		return true;
> > >  
> > >  	if (con->blocked ||
> > > -	    console_kthreads_atomically_blocked()) {
> > > +	    console_kthreads_atomically_blocked() ||
> > > +	    system_state > SYSTEM_RUNNING ||
> > > +	    oops_in_progress) {
> > >  		return false;
> > >  	}
> > 
> > Also this is an interesting idea. We know that panic() will try
> > to flush the messages. Well, panic() is not always called
> > after Oops.
> 
> I tried to combine the various approaches and findings into
> the following patch.
> 
> It introduces try_block_console_kthread() function. It tries
> to summon console_lock() so that it is later available for
> printk().
> 
> The function causes that console kthreads will not longer
> handle the messages. Also it waits until all kthreads leave
> the critical section and allow to take the global console lock.
> 
> It can be safely called in atomic context because it uses
> busy wait by udelay(). Infinite waiting is prevented by
> timeout currectly hardcoded to 10s.
> 
> The function is used in panic() and kernel_shutdown_prepare()
> when we know that the system is going down and kthreads will
> not be usable any longer.
> 
> 
> Problems:
> 
>   + Just the best effort.
>   + Does not help on single CPU system.
> 
> 
> Testing:
> 
> I tried Paul's RCU torture test few times. It always failed without
> this patch and always succeded with this patch. If I got the resutls
> corretly.
> 
> 
> Possible improvements and alternative approaches:
> 
>    + Prevent console kthreads from sleeping in critical
>      section. Replace mutex with a spinlock?
> 
>    + Allow to steal con->lock similar way like the global
>      console_sem.
> 
>    + Modify console kthreads process priority to make them
>      scheduled immediately on another CPU.
> 
> 
> What do you think, please?

And rcutorture thinks well of this one, also.  ;-)

Thank you both!!!

							Thanx, Paul

> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 10ec29bc0135..78f27ca15922 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -173,6 +173,7 @@ extern void printk_prefer_direct_enter(void);
>  extern void printk_prefer_direct_exit(void);
>  
>  extern bool pr_flush(int timeout_ms, bool reset_on_progress);
> +extern void try_block_console_kthreads(int timeout_ms);
>  
>  /*
>   * Please don't use printk_ratelimit(), because it shares ratelimiting state
> diff --git a/kernel/panic.c b/kernel/panic.c
> index a3c758dba15a..4cf13c37bd08 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -297,6 +297,7 @@ void panic(const char *fmt, ...)
>  		 * unfortunately means it may not be hardened to work in a
>  		 * panic situation.
>  		 */
> +		try_block_console_kthreads(10000);
>  		smp_send_stop();
>  	} else {
>  		/*
> @@ -304,6 +305,7 @@ void panic(const char *fmt, ...)
>  		 * kmsg_dump, we will need architecture dependent extra
>  		 * works in addition to stopping other CPUs.
>  		 */
> +		try_block_console_kthreads(10000);
>  		crash_smp_send_stop();
>  	}
>  
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea3dd55709e7..0bcd4f5cf2fc 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -250,6 +250,8 @@ static atomic_t console_kthreads_active = ATOMIC_INIT(0);
>  #define console_kthread_printing_exit() \
>  	atomic_dec(&console_kthreads_active)
>  
> +static bool block_console_kthreads;
> +
>  /*
>   * Helper macros to handle lockdep when locking/unlocking console_sem. We use
>   * macros instead of functions so that _RET_IP_ contains useful information.
> @@ -3686,6 +3688,24 @@ bool pr_flush(int timeout_ms, bool reset_on_progress)
>  }
>  EXPORT_SYMBOL(pr_flush);
>  
> +void try_block_console_kthreads(int timeout_ms)
> +{
> +	block_console_kthreads = true;
> +
> +	while (timeout_ms > 0) {
> +		if (console_trylock()) {
> +			console_unlock();
> +			return;
> +		}
> +
> +		udelay(1000);
> +		timeout_ms -= 1;
> +	}
> +
> +	/* Failed to block console kthreads. Let them do the job. */
> +	block_console_kthreads = false;
> +}
> +
>  static void __printk_fallback_preferred_direct(void)
>  {
>  	printk_prefer_direct_enter();
> @@ -3729,7 +3749,8 @@ static bool printer_should_wake(struct console *con, u64 seq)
>  		return true;
>  
>  	if (con->blocked ||
> -	    console_kthreads_atomically_blocked()) {
> +	    console_kthreads_atomically_blocked() ||
> +	    block_console_kthreads) {
>  		return false;
>  	}
>  
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index a091145ee710..222bdd076cd0 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -270,6 +270,7 @@ static void kernel_shutdown_prepare(enum system_states state)
>  	blocking_notifier_call_chain(&reboot_notifier_list,
>  		(state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
>  	system_state = state;
> +	try_block_console_kthreads(10000);
>  	usermodehelper_disable();
>  	device_shutdown();
>  }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ