linux-kernel - [RFC PATCH v1 0/2] Fix serial console for PREEMPT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240116073234.2355850-2-leobras@redhat.com>
Date: Tue, 16 Jan 2024 04:32:31 -0300
From: Leonardo Bras <leobras@...hat.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	John Ogness <john.ogness@...utronix.de>,
	Tony Lindgren <tony@...mide.com>,
	Marcelo Tosatti <mtosatti@...hat.com>
Cc: Leonardo Bras <leobras@...hat.com>,
	linux-kernel@...r.kernel.org,
	linux-serial@...r.kernel.org
Subject: [RFC PATCH v1 0/2] Fix serial console for PREEMPT_RT

While dealing with a bug that breaks the serial console from qemu (8250)
after printing a lot of data, I found some issues on this driver on RT
as well as spurious IRQ behaviors that don't seem to be adeqate for RT.

Comments:
Patch #1:
I found out this driver get an IRQ request for every tx byte, but the
handler is able to deal with sending multiple bytes per "thread wake up". 

Since the irqs_unhandled keep growing, and theads_handled don't change
as often, after some intense load (tx ~300kBytes) the serial will 
disable the IRQ line for this driver, which ultimately breaks the console.

My fist solution kept track of how many requests given handler dealt with,
which got added to theads_handled. On note_interrupt I got the diff from
theads_handled_last and subtracted that diff from irqs_unhandled.

This solution required a change in the irqreturn_t typedef and a bunch of
helpers and defines, as well as adapting the 8250 driver. 
At the end seemed like a overcomplicated solution for the issue, but it
can be an alternative if the current solution is considered imprecise.

Mu cyrrent solution on patch #1 is much simpler, just keeping the
IRQ enabled as long as the irq_thread deal with any IRQ request before
irqs_unhandled hitting the limit value.

Patch #2:
In RT, the 8250 driver has an issue if it's interrupted while holding the
port->lock. If the interruption needs to printk() anything, it
will try to get the port->lock, which is busy, so spin_lock() will try
to reschedule the interruption, which is in atomic context, and will 
trigger a bug.

This bug reproduces quite often, like in 50% of tests I did. 

The only thing I could think of for fixing this is using in_atomic()
when PREEMPT_RT=y, so it makes use of the same mechanism as for
oops_in_progress to avoid getting the lock if it's busy. It's working
just fine.

Yeah, I got the warning in checkpath:
"ERROR: do not use in_atomic in drivers"

So I need some feedback on what to do to avoid this bug, if not 
by using in_atomic() at this driver.

Since this one is linked to the console, any printk will try to get
this drivers port->lock, and so it's kind of hard to avoid this accesses.

I though on doing an interface for spin_lock_only_if_can_sleep() but
it seemed overkill.

Please provide comments / feedback.

Thanks!
Leo

Leonardo Bras (2):
  irq/spurious: Reset irqs_unhandled if an irq_thread handles one IRQ
    request
  serial/8250: Avoid getting lock in RT atomic context

 drivers/tty/serial/8250/8250_port.c | 2 +-
 kernel/irq/spurious.c               | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

base-commit: 052d534373b7ed33712a63d5e17b2b6cdbce84fd
-- 
2.43.0