Message-ID: <20220612163044.GS1790663@paulmck-ThinkPad-P17-Gen-1>
Date:   Sun, 12 Jun 2022 09:30:44 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, frederic@...nel.org, pmladek@...e.com
Subject: Re: [BUG] 8e274732115f ("printk: extend console_lock for per-console
 locking")

On Sun, Jun 12, 2022 at 06:09:10PM +0206, John Ogness wrote:
> Hi Paul,
> 
> Thanks for looking into this! I am currently on vacation with family, so
> my responses are limited. Some initial comments from me below...

First, this is not an emergency.  I have a good workaround that has just
passed significant rcutorture testing.  This means that I can port
my RCU changes to v5.19-rc1/2 and get on with other testing.

So please ignore this for the rest of your time away, and have a great
time with your family!!!

> On 2022-06-12, "Paul E. McKenney" <paulmck@...nel.org> wrote:
> > And the patch below takes care of things in (admittedly quite light)
> > testing thus far.  What it does is add ten seconds of pure delay
> > before rcutorture shuts down the system.  Presumably, this delay gives
> > printk() the time that it needs to flush its buffers.  In the
> > configurations that I have tested thus far, anyway.
> >
> > So what should I be doing instead?
> >
> > o	console_flush_on_panic() seems like strong medicine, but might
> > 	be the right thing to do.  The bit about proceeding even though
> > 	it failed to acquire the lock doesn't look good for non-panic
> > 	use.
> 
> For sure not this one.
> 
> > o	printk_trigger_flush() has an attractive name, but it looks
> > 	like it only just starts the flush rather than waiting for it
> > 	to finish.
> 
> Correct. It just triggers.
> 
> > o	pr_flush(1000, true) looks quite interesting, and also seems to
> > 	work in a few quick tests, so I will continue playing with that.
> 
> This is only useful if the context is guaranteed may_sleep().

Which is the case when called from torture_shutdown().

But it does seem to be common to invoke kernel_power_off() from things
like interrupt handlers.  Which means that putting the pr_flush() in
kernel_power_off() would be a bad idea given that we cannot detect
non-preemptible regions of code with CONFIG_PREEMPT_NONE=y kernels.
(That again!)

So any fix within kernel_power_off() would be a bit "interesting".
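
For anyone following along, here is a user-space toy model of what a
"wait for the flush" call with a timeout and a reset-on-progress flag
looks like, loosely mirroring the pr_flush(1000, true) call discussed
above.  This is not kernel code: every model_* name is made up for
illustration, and the one-tick-per-millisecond loop merely stands in
for a sleeping wait.  That sleep is exactly why such a call needs a
context that may sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the log: records queued vs. records printed. */
struct model_log {
	unsigned long next_seq;		/* one past the newest queued record */
	unsigned long printed_seq;	/* one past the newest printed record */
	unsigned long rate;		/* records the printer emits per tick */
};

/* One tick of printer progress; models "sleep 1 ms while the printer runs". */
static void model_printer_tick(struct model_log *log)
{
	log->printed_seq += log->rate;
	if (log->printed_seq > log->next_seq)
		log->printed_seq = log->next_seq;
}

/*
 * Wait until everything queued has been printed.  Returns false on
 * timeout.  With reset_on_progress, the timeout restarts each time the
 * printer is seen to move forward.
 */
static bool model_flush(struct model_log *log, int timeout_ms,
			bool reset_on_progress)
{
	int remaining = timeout_ms;
	unsigned long last = log->printed_seq;

	assert(timeout_ms >= 0);

	while (log->printed_seq < log->next_seq) {
		if (remaining <= 0)
			return false;

		model_printer_tick(log);	/* the part that must sleep */
		remaining--;

		if (reset_on_progress && log->printed_seq != last) {
			last = log->printed_seq;
			remaining = timeout_ms;
		}
	}
	return true;
}
```

In this model a slow but steadily advancing printer never times out when
reset_on_progress is set, while a stalled one gives up after timeout_ms
ticks.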

> What is _supposed_ to happen is that @system_state increases above
> SYSTEM_RUNNING, which then causes direct printing to be used. So the
> pr_emerg("Power down\n") in kernel_power_off() would directly flush all
> remaining messages.
> 
> But if the threaded printers are already in the process of printing,
> they block direct printing. That may be what we are seeing here.

Given that rcutorture can be a bit chatty at shutdown time, my guess
is that the threaded printers are already in the process of printing.
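
To spell out the behavior John describes, here is a deliberately tiny
user-space model of the decision.  It is only a sketch of the
description above, not the printk code: the model_* names and the
three-value state enum are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for @system_state. */
enum model_state { MODEL_BOOTING, MODEL_RUNNING, MODEL_POWER_OFF };

struct model_console {
	bool kthread_busy;		/* a threaded printer is mid-print */
	unsigned long printed_seq;	/* one past the newest printed record */
};

/* Direct printing is used only once the state rises above "running". */
static bool model_direct_allowed(enum model_state state)
{
	return state > MODEL_RUNNING;
}

/*
 * Model of a printk() call once records up to latest_seq are queued.
 * Returns true if the caller itself flushed everything to the console.
 */
static bool model_printk(struct model_console *con, enum model_state state,
			 unsigned long latest_seq)
{
	assert(con != NULL);

	if (!model_direct_allowed(state))
		return false;	/* left for the threaded printer */

	if (con->kthread_busy)
		return false;	/* threaded printer blocks direct printing */

	con->printed_seq = latest_seq;
	return true;
}
```

The last early return is the suspected failure case: the state has
already moved past running, but a busy threaded printer keeps the final
message from being flushed directly.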

> What I find particularly interesting is that it is not the kthread-patch
> that is causing the issue.

I do know that feeling!

> I will have some time tonight to take a closer look.

Please wait until you are back from your vacation.  Given that I now
have a workaround, which might be as good a fix as there is, there is
no need to interrupt your vacation.

							Thanx, Paul
