Message-ID: <ZzVxIfb5KpL97P4Q@LeoBras>
Date: Thu, 14 Nov 2024 00:40:17 -0300
From: Leonardo Bras <leobras@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Leonardo Bras <leobras@...hat.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>,
	Tony Lindgren <tony@...mide.com>,
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
	John Ogness <john.ogness@...utronix.de>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	Uwe Kleine-König <u.kleine-koenig@...gutronix.de>,
	Florian Fainelli <florian.fainelli@...adcom.com>,
	Shanker Donthineni <sdonthineni@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-serial@...r.kernel.org
Subject: Re: [RFC PATCH v2 3/4] irq: Introduce IRQ_HANDLED_MANY

On Fri, Feb 23, 2024 at 01:37:39AM -0300, Leonardo Bras wrote:
> On Wed, Feb 21, 2024 at 04:41:20PM +0100, Thomas Gleixner wrote:
> > On Wed, Feb 21 2024 at 02:39, Leonardo Bras wrote:
> > > On Mon, Feb 19, 2024 at 12:03:07PM +0100, Thomas Gleixner wrote:
> > >> >> In scenarios where there is no need to keep track of IRQs handled, convert
> > >> >> it back to IRQ_HANDLED.
> > >> >
> > >> > That's not really workable as you'd have to update tons of drivers just
> > >> > to deal with that corner case. That's error prone and just extra
> > >> > complexity all over the place.
> > >
> > > I agree, that's a downside of this implementation. 
> > 
> > A serious one which is not really workable. See below.
> > 
> > > I agree the above may be able to solve the issue, but it would make 2 extra 
> > > atomic ops necessary in the thread handling the IRQ, as well as one extra 
> > > atomic operation in note_interrupt(), which could increase latency on this 
> > > IRQ deferring the handler to a thread.
> > >
> > > I mean, yes, the cpu running note_interrupt() would probably already have 
> > > exclusiveness for this cacheline, but it further increases cacheline 
> > > bouncing and also adds the mem barriers that incur on atomic operations, 
> > > even if we use an extra bit from threads_handled instead of allocate a new 
> > > field for threads_running.
> > 
> > I think that's a strawman. Atomic operations can of course be more
> > expensive than non-atomic ones, but they only start to make a difference
> > when the cache line is contended. That's not the case here for the
> > normal operations.
> > 
> > Interrupts and their threads are strictly targeted to a single CPU and
> > the cache line is already hot and had to be made exclusive because of
> > other write operations to it.
> > 
> > There is usually no concurrency at all, except for administrative
> > operations like enable/disable or affinity changes. Those administrative
> > operations are not high frequency and the resulting cache line bouncing
> > is unavoidable even without that change. But does it matter in the
> > larger picture? I don't think so.
> 
> That's a fair point, but there are some use cases that use CPU Isolation on 
> top of PREEMPT_RT in order to reduce interference on a CPU running an RT 
> workload.
> 
> For those cases, IIRC the handler will run on a different (housekeeping) 
> CPU when those IRQs originate on an Isolated CPU, meaning the above 
> described cacheline bouncing is expected.
> 
> 
> > 
> > > On top of that, let's think on a scenario where the threaded handler will 
> > > solve a lot of requests, but not necessarily spend a lot of time doing so.
> > > This allows the thread to run for little time while solving a lot of 
> > > requests.
> > >
> > > In this scenario, note_interrupt() could return without incrementing 
> > > irqs_unhandled for those IRQ that happen while the brief thread is running, 
> > > but every other IRQ would cause note_interrupt() to increase 
> > > irqs_unhandled, which would cause the bug to still reproduce.
> > 
> > In theory yes. Does it happen in practice?
> > 
> > But that exposes a flaw in the actual detection code. The code is
> > unconditionally accumulating if there is an unhandled interrupt within
> > 100ms after the last unhandled one. IOW, if there is a periodic
> > unhandled one every 50ms, the interrupt will be shut down after 100000 *
> > 50ms = 5000s ~= 83.3m ~= 1.4h. And it neither cares about the number of
> > actually handled interrupts.
> > 
> > The spurious detector is really about runaway interrupts which hog a CPU
> > completely, but the above is not what we want to protect against.
> 
> Now it makes a lot more sense to me.
> Thanks!

Hi Thomas,

I would like to go back to this discussion :)
From what I could understand, rereading the thread:

- The spurious detector is used to avoid hogging a CPU when lots of IRQs
  are hitting it but few (< 100 out of 100k) are being handled. It works
  by disabling that interrupt line.

- The bug I am dealing with (on serial8250) happens to fit exactly the
  above case: lots of requests, but few reported as handled.
  The reason: a threaded handler and many requests, which are dealt with
  in batch: multiple requests are handled at once, but a single
  IRQ_HANDLED is returned.

- My proposed solution: find a way of accounting for the requests handled.

  - Implementation: add an option for drivers to voluntarily report how
    many requests they handled. Current drivers need no change.

  - Limitation: if this issue is found on another driver, we need to
    implement accounting there as well. This should only happen on
    drivers which handle over 1k requests at once.
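To make the failure mode above concrete, here is a simplified userspace model of the accounting the spurious detector performs (a sketch of the behavior described in this thread, not the actual kernel/irq/spurious.c code; names and thresholds are stand-ins): every interrupt bumps a window counter, interrupts not reported as handled bump an unhandled counter, and the line is shut down when almost all of a 100k window went unhandled. A batching threaded handler that drains, say, 2000 requests per wakeup but reports a single IRQ_HANDLED trips the threshold even though it is working correctly.

```c
#include <stdbool.h>

#define SPURIOUS_WINDOW 100000  /* interrupts per detection window      */
#define SPURIOUS_LIMIT   99900  /* unhandled count that shuts the line  */

struct irq_stats {
	unsigned int irq_count;
	unsigned int irqs_unhandled;
	bool disabled;
};

/* One hardware interrupt arrives; `handled` is what the handler
 * reported. A batching threaded handler reports true only once per
 * wakeup, no matter how many requests it actually drained. */
static void note_interrupt_model(struct irq_stats *s, bool handled)
{
	if (s->disabled)
		return;
	s->irq_count++;
	if (!handled)
		s->irqs_unhandled++;
	if (s->irq_count >= SPURIOUS_WINDOW) {
		if (s->irqs_unhandled > SPURIOUS_LIMIT)
			s->disabled = true;  /* "nobody cared" shutdown */
		s->irq_count = 0;
		s->irqs_unhandled = 0;
	}
}
```

With a batch size of 2000, only 50 of 100000 interrupts are credited as handled, so the model disables the line; with small batches it survives.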


What was left as a TODO for me:
think of a generic solution for this issue, to avoid dealing with it on
a per-driver basis.

That's what I was able to think about:

- Only the driver code knows how many requests it handled, so without
  touching the drivers we can't know how many requests were properly
  handled.

- I could try to think of a different solution, one that involves
  changing only the spurious detector.

  - For that I would need to find a particular characteristic of the
    behavior we want the spurious detector to catch, and make sure the
    change won't miss an actual case we want to be protected against.

Generic solutions(?) proposed:
- Zero irqs_unhandled if the IRQ is threaded & handles a single request
  per 100k.
  - Problem: a real issue with the interrupt line would go undetected.

- Skip detection if the IRQ is threaded & the handler thread is running.
  - Problem 1: the thread may run briefly while batch-handling a lot of
    requests, and thus escape the spurious detector.
  - Problem 2: the thread may get stuck, not handle the IRQs, and also
    escape the spurious detector. (IIUC)
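For contrast, the per-driver accounting option could look roughly like the sketch below. This is a toy encoding I made up to illustrate the idea, not the actual IRQ_HANDLED_MANY interface from the RFC patch: the threaded handler counts the requests it drains in one wakeup and encodes that count in its return value, so the spurious detector could credit all of them instead of just one.

```c
/* Toy encoding: bit 0 carries "handled", the upper bits a count.
 * Stand-in for the IRQ_HANDLED_MANY idea, not the RFC's actual ABI. */
typedef unsigned long irqreturn_demo_t;

#define DEMO_IRQ_HANDLED       0x1UL
#define DEMO_HANDLED_MANY(n)   (DEMO_IRQ_HANDLED | ((unsigned long)(n) << 2))
#define DEMO_HANDLED_COUNT(r)  ((r) >> 2)

/* A batching threaded handler: drain everything that is pending,
 * remember how many requests were serviced, and report that number
 * instead of a bare "handled". */
static irqreturn_demo_t demo_thread_fn(int pending)
{
	int handled = 0;

	while (pending-- > 0)   /* stand-in for draining a UART FIFO */
		handled++;

	return handled ? DEMO_HANDLED_MANY(handled) : 0;
}
```

The point of the exercise: only this return path knows the real count, which is why the accounting cannot be recovered generically in note_interrupt() without the driver's cooperation.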


In the end, I could not find a proper way of telling apart
a - "this is real spurious-IRQ behavior, which needs to be disabled", and
b - "this is just a handler that batch-handles its requests",
without touching the drivers' code.

Do you have any suggestion on how to do that?

Thanks!
Leo

