linux-kernel - Re: [RESEND RFC PATCH v1 1/2] irq/spurious: Reset irqs_unhandled if an irq


Message-ID: <ZahYxOL2r7YbPvO7@LeoBras>
Date: Wed, 17 Jan 2024 19:46:28 -0300
From: Leonardo Bras <leobras@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Leonardo Bras <leobras@...hat.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	John Ogness <john.ogness@...utronix.de>,
	Tony Lindgren <tony@...mide.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	linux-kernel@...r.kernel.org,
	linux-serial@...r.kernel.org,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [RESEND RFC PATCH v1 1/2] irq/spurious: Reset irqs_unhandled if an irq_thread handles one IRQ request

On Wed, Jan 17, 2024 at 11:08:44PM +0100, Thomas Gleixner wrote:
> On Tue, Jan 16 2024 at 04:36, Leonardo Bras wrote:
> > This IRQ line disable bug can be easily reproduced with a serial8250
> > console on a PREEMPT_RT kernel: it only takes the user to print a lot
> > of text to the console (or to ttyS0): around 300k chars should be
> > enough.
> 
> That has nothing to do with RT, it's a problem of force threaded
> interrupts in combination with an edge type interrupt line and a
> hardware which keeps firing interrupts forever.

Hello Thomas, thanks for your feedback!

I agree it has nothing to do with RT.
I only mentioned PREEMPT_RT as my test scenario, since it enables
force-threaded IRQs.

> 
> > To fix this bug, reset irqs_unhandled whenever irq_thread handles at least
> > one IRQ request.
> 
> This papers over the symptom and makes runaway detection way weaker for
> all interrupts or breaks it completely.

This change is supposed to touch only threaded interrupts, since the
added line is reached only if (action_ret == IRQ_WAKE_THREAD) and if
desc->threads_handled has changed since the last IRQ request.

That counter is incremented only in irq_forced_thread_fn() and
irq_thread_fn(), which are called only from irq_thread().
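
To make the check above concrete, here is a minimal user-space sketch of
the idea, not the actual patch. The struct and function names (model_desc,
model_thread_handled, model_note_wake_thread) are hypothetical; only the
field names are borrowed from struct irq_desc. It leaves out the timing
window and the other bookkeeping that the real note_interrupt() does.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model. Field names mirror struct irq_desc,
 * but this is not kernel code.
 */
struct model_desc {
	unsigned int threads_handled;      /* bumped by the IRQ thread */
	unsigned int threads_handled_last; /* value seen at the previous hard IRQ */
	unsigned int irqs_unhandled;       /* runaway-detection counter */
};

/* Models the IRQ thread finishing one request with IRQ_HANDLED. */
static void model_thread_handled(struct model_desc *d)
{
	d->threads_handled++;
}

/*
 * Models the IRQ_WAKE_THREAD path with the proposed reset.
 * Returns true if the unhandled counter was reset.
 */
static bool model_note_wake_thread(struct model_desc *d)
{
	if (d->threads_handled != d->threads_handled_last) {
		/* The thread made progress since the last hard IRQ. */
		d->threads_handled_last = d->threads_handled;
		d->irqs_unhandled = 0; /* the proposed change */
		return true;
	}

	/* No progress yet: this request counts as unhandled for now. */
	d->irqs_unhandled++;
	return false;
}
```

In this model a single handled request wipes out any number of
accumulated unhandled ones, which is exactly the trade-off under
discussion.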

But I understand the overall concern that this makes runaway detection
much weaker for all threaded interrupts.

I have previously worked on a solution that can be more precise and be an 
opt-in for drivers instead of a general solution:

It required a change in the IRQ interface that lets handlers report how
many IRQs were actually handled (batching). This number would then be
added to desc->threads_handled (in irq_*thread_fn(), just changing the
atomic_inc() to atomic_add()), and then subtracted from irqs_unhandled
in note_interrupt().
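
A rough user-space sketch of that batching idea follows. All names
(batch_desc, batch_thread_handled, batch_note_wake_thread) are
hypothetical, the way a handler would report its count is not specified
here, and clamping the counter at zero is my assumption rather than
something taken from real code.

```c
/*
 * Hypothetical user-space model of the batching variant.
 * Field names mirror struct irq_desc, but this is not kernel code.
 */
struct batch_desc {
	unsigned int threads_handled;      /* total requests handled by the thread */
	unsigned int threads_handled_last; /* total seen at the previous hard IRQ */
	unsigned int irqs_unhandled;       /* runaway-detection counter */
};

/*
 * Models the IRQ thread reporting that one run handled 'nr' requests
 * (the atomic_inc() -> atomic_add() change).
 */
static void batch_thread_handled(struct batch_desc *d, unsigned int nr)
{
	d->threads_handled += nr;
}

/* Models the IRQ_WAKE_THREAD path: subtract what the thread handled. */
static void batch_note_wake_thread(struct batch_desc *d)
{
	/* Unsigned subtraction stays correct across counter wraparound. */
	unsigned int delta = d->threads_handled - d->threads_handled_last;

	if (delta == 0) {
		/* No progress yet: this request counts as unhandled for now. */
		d->irqs_unhandled++;
		return;
	}

	d->threads_handled_last = d->threads_handled;

	/* Assumption: never let the counter go below zero. */
	if (delta >= d->irqs_unhandled)
		d->irqs_unhandled = 0;
	else
		d->irqs_unhandled -= delta;
}
```

Compared with a plain reset, the counter only drops by the amount of
work the thread actually reports, so a line that keeps firing without
being serviced would still accumulate unhandled requests.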

In the serial8250 case, the driver would be changed to use that interface, 
since it's already able to process multiple IRQs, and the bug just 
vanishes.

That approach fixed the serial driver issue as well, but it required a
deeper change in the code, which is why I tried a simpler solution first.

That solution certainly gives better runaway detection. Do you think it
would be better than the one I sent in this patch?

> 
> The problem with edge type interrupts is that we cannot mask them like
> we do with level type interrupts in the hard interrupt handler and
> unmask them once the threaded handler finishes.
> 
> So yes, we need special rules here when:
> 
>    1) The interrupt handler is force threaded
> 
>    2) The interrupt line is edge type
> 
>    3) The accumulated unhandled interrupts are within a sane margin
> 
> Thanks,
> 
>         tglx
> 

Completely agree, that's why I am suggesting handling threaded
interrupts differently: resetting the unhandled count when the thread
handles a request.

I am not sure how force-threaded and regular threaded interrupts differ
in this scenario. Could you help me understand?

Thanks!
Leo