lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8734mbdwrf.ffs@tglx>
Date: Sat, 07 Sep 2024 22:39:00 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: John Ogness <john.ogness@...utronix.de>, Petr Mladek <pmladek@...e.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Jiri Slaby
 <jirislaby@...nel.org>, Sergey Senozhatsky <senozhatsky@...omium.org>,
 Steven Rostedt <rostedt@...dmis.org>, linux-serial@...r.kernel.org,
 linux-kernel@...r.kernel.org, Andy Shevchenko
 <andriy.shevchenko@...ux.intel.com>, Tony Lindgren <tony@...mide.com>,
 "Paul E. McKenney" <paulmck@...nel.org>, Uwe Kleine-König
 <u.kleine-koenig@...gutronix.de>, Ilpo Järvinen
 <ilpo.jarvinen@...ux.intel.com>, Serge Semin <fancer.lancer@...il.com>,
 Rengarajan S <rengarajan.s@...rochip.com>, Wolfram Sang
 <wsa+renesas@...g-engineering.com>, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>
Subject: Re: [PATCH next v1 1/2] serial: 8250: Switch to nbcon console

On Fri, Sep 06 2024 at 18:44, John Ogness wrote:
> So there are 2 things _not_ supported by the write_atomic() callback:
>
> 1. RS485 mode. This is due to the need to start up TX for the
> write, which can lead to:
>
> up->rs485_start_tx()
>   serial8250_em485_start_tx()
>     serial8250_stop_rx()
>       serial8250_rpm_get()
>         pm_runtime_get_sync()
>           __pm_runtime_resume()
>             spin_lock_irqsave()
>
> Taking a spin lock is not safe from NMI and thus disqualifies this call
> chain for write_atomic().

Correct. __pm_runtime_resume() can sleep as well :)

> If UART_CAP_RPM is not set, the pm_runtime_get_sync() is avoided. So I
> could only disable atomic RS485 if UART_CAP_RPM is set. But the OMAP
> variant of the 8250 does set this capability.

Sure, but none of this makes sense. What's so special about that em485
muck that serial8250_stop_rx() needs to do that PM dance?

It writes the IER register, which serial8250_console_write() just wrote
to in serial8250_clear_IER() without doing this PM dance. So for the
console write path this stop part is not required at all.  That said,
serial8250_em485_stop_tx() doesn't have this PM dance either.

I'm 100% that this is just a problem of blindly sharing this with the
regular uart code and not because there is a requirement. See what
serial8250_console_setup() does at the end:

        if (port->dev)
                pm_runtime_get_sync(port->dev);

The corresponding put() is in serial8250_console_exit(). So there is
absolutely zero reason for power management in the console write
functions. It's the usual voodoo programming which nobody noticed
because it did not immediately blow up in their face.

There is another minor issue in that em485 muck. One code path arms a
hrtimer, which does not work from NMI like contexts, but that is only
taken when the transmitter is not empty, so probably a non-issue
because the console write code waits for it to be drained.

There are also a few lockdep_assert_held_once(port->lock) in that code
which will trigger when called from the nbcon write functions. They are
already broken today when oops_in_progress is set and the trylock of
port::lock fails...

So splitting this up into a clean and lean set for the console write
functions will make all these horrors just go away. The current sharing
is just fragile as hell and makes no sense at all.

> 2. Modem control. This is due to waiting for inputs, which can lead to:
>
> serial8250_modem_status()
>   wake_up_interruptible()
>
> Performing wakes is not safe from scheduler or NMI and thus disqualifies
> this call chain for write_atomic().
>
> It would probably be acceptable to move serial8250_modem_status() into
> an irq_work.

Yes, but serial8250_modem_status() has more problems than that:

See uart_handle_dcd_change() and uart_handle_cts_change(). They call
into the tty layer and do their own wakeups.

So no, serial8250_modem_status() cannot be invoked there at all.

You have to defer this whole status dance to irq work and this really
needs to be done inside the write_atomic() callback. Otherwise a status
change could get lost, which is bad in non-panic situations.

That needs a bit of thought vs. port->msr_saved_flags, because in a
hostile takeover situation that needs to take into account that the
interrupted context might be fiddling with msr_saved_flags too, which
might on resume overwrite the write_atomic() modifications due to RMW.
Shouldn't be hard.

That brings me to that USE_SERIAL_8250_LEGACY_CONSOLE #ifdeffery, which
started this conversation.

The nbcon conversion does not make things worse than they are today. Any
problem which happens in the atomic_write() callback has existed before
already. So just get rid of the legacy code and be done with it.

At some point you have to bite the bullet and deal with the fallout when
it's reported. Remember, perfect is the enemy of good and you will never
reach perfect.

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ