linux-kernel - Re: can/should a disabled irq become pending?

Message-ID: <3793d7e573d57e895f179d7ba90f2b395e1ac135.camel@gmail.com>
Date: Thu, 14 Nov 2024 13:04:58 +0100
From: Nuno Sá <noname.nuno@...il.com>
To: Uwe Kleine-König <u.kleine-koenig@...libre.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
 Jonathan Cameron <jic23@...nel.org>
Subject: Re: can/should a disabled irq become pending?

On Thu, 2024-11-14 at 11:59 +0100, Uwe Kleine-König wrote:
> Hello,
> 
> On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:
> > On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:
> > > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:
> > > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:
> > > > > > The interrupt does not get to the device handler even in the lazy
> > > > > > disable case. Once the driver invoked disable_irq*() the low level
> > > > > > flow handlers (edge, level ...) mask the interrupt line and mark the
> > > > > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > > > > pending bit is set, except when the interrupt line is level triggered.
> > > > 
> > > > > There's something that I'm still trying to figure out... For IRQ
> > > > > controllers that do not disable edge detection, can't we get the
> > > > > device handler called twice if we don't set unlazy?
> > > > 
> > > > > irq_enable() -> check_irq_resend()
> > > > 
> > > > and then
> > > > 
> > > > handle_edge_irq() raised by the controller
> > > 
> > > You're right. We should have a flag which controls the replay
> > > requirements of an interrupt controller. So far it only skips for level
> > > triggered interrupts, but for those controllers it should skip for edge
> > > too. Something like IRQCHIP_NO_RESEND ...
> 
> Agreed, if the irq gets pending while disabled in both hardware and
> software, that shouldn't result in two invocations. Is this an issue for
> level irqs only? For edge irqs this only happens with lazy disable and

Resending is already ignored for level...

> if two events happen. Hm, I guess in that case we still only want a single
> invocation of the irq handler?
> 
> > > > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > > > doing the trick but I'm not seeing how...
> > > 
> > > IRQS_REPLAY is just internal state to avoid double replay.
> > > 
> > > > > > On controllers which suffer from the #2 problem UNLAZY should indeed
> > > > > > be ignored for edge type interrupts. That's something which the
> > > > > > controller should signal via an irqchip flag and the core code can
> > > > > > act upon it and ignore UNLAZY for edge type interrupts.
> > > > > 
> > > > > > But that won't fix the problem at hand. Let's take a step back and
> > > > > > look at the larger picture whether this can be reliably "fixed" at all.
> > > > > 
> > > > 
> > > > > Yeah, I'm still trying to figure out when it's correct for a device to
> > > > > do UNLAZY? If I'm understanding things, devices that rely on
> > > > > disable_irq*() should set it?
> > > 
> > > Not necessarily. In most cases devices are not re-raising interrupts
> > > before the previous one has been handled and acknowledged in the device.
> 
> Usage of UNLAZY should never affect correctness. It's "only" a
> performance optimisation which has a positive effect if it's expected
> that an irq event happens while it's masked.
> 
> > > > Because problem #2 is something that needs to be handled at the
> > > > controller and core level if I got you right.
> > > 
> > > > Yes. We need an irqchip flag for that.
> > > 
> > > > > > > Ack. If there is no way to read back the line state and it's unknown
> > > > > > > if the irq controller suffers from problem #2, the only way to still
> > > > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act
> > > > > > > on each 2nd irq; or ignore irqs based on timing. That doesn't sound
> > > > > > > very robust though, so maybe the driver has to fall back on polling
> > > > > > > the status register and not use irqs at all in that case.
> > > > > 
> > > > > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > > > > for the next conversion to raise the interrupt again should be robust
> > > > > > enough. The ADC has to be in continuous conversion mode for that
> > > > > > obviously.
> > > > > 
> > > > > Might be the only sane option we have, Uwe? If we do this, we could be
> > > > > dropping valid samples but only with controllers that suffer from #2.
> > > 
> > > No. You have the same problem with the controllers which do not disable
> > > the edge detection logic.
> > > 
> > > The interrupt controller raises the interrupt on unmask (enable_irq()).
> > > Depending on timing the device handler might be invoked _before_ the
> > > sample is ready, no?
> > 
> > > For those controllers, I think it's almost always guaranteed that the first
> > > IRQ after enable is not really a valid sample. We'll always have some SPI
> > > transfer (that should latch an IRQ on the controller) before enable_irq().
> 
> The first irq isn't a valid sample unless the driver is preempted
> between the spi transfer and the following enable_irq() such that the
> irq event triggered by the SPI transfer doesn't result in calling the
> irq handler before the sample is ready. I guess that's what you ruled

I guess that race we could prevent by disabling IRQs...

> out by saying "almost always"? I'd recommend to not rely on that. Chips
> become faster (and so conversion time shorter) which widens the race
> window and if you become unsynchronized and ignore every wrong second
> irq all samples become bogus.

Right now we set UNLAZY, and that makes the behavior depend on the IRQ
controller we have. But if we drop that change and make sure there can be no
race between enable_irq() and the last spi_transfer, it should be safe to
assume that the first time we get into the handler is not for a valid sample.
I'm not sure synchronization could become an issue to the point where you
ignore all samples: if you ignore one IRQ, then the next one has to be a valid
sample (as there should be no spi_transfer in between). I'm not sure whether
it can affect performance, though...
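
To make the "ignore the first IRQ" idea concrete, here is a minimal userspace
sketch. It is not driver code: struct adc_state, adc_arm() and adc_irq() are
made-up names, and it assumes the controller has latched exactly one edge
from the SPI transfer by the time enable_irq() runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; not taken from any existing driver. */
struct adc_state {
	bool skip_next_irq;   /* set right before the irq would be enabled */
	unsigned int samples; /* interrupts treated as data-ready events */
};

/* Would be called after the last SPI transfer, just before enable_irq(). */
static void adc_arm(struct adc_state *st)
{
	assert(st != NULL);
	st->skip_next_irq = true;
}

/* Returns true if this interrupt is treated as a valid data-ready event. */
static bool adc_irq(struct adc_state *st)
{
	assert(st != NULL);
	if (st->skip_next_irq) {
		/* Assumed to be the edge latched during the SPI transfer. */
		st->skip_next_irq = false;
		return false;
	}
	st->samples++;
	return true;
}
```

If the assumption does not hold (no edge was latched), the skipped interrupt
is a real sample, which is the case discussed above.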

I think right now, unless the IRQ controller suffers from #2, whenever we get
into the device handler after enable_irq() it is not because of DRDY, and
whether we have a valid sample or not is pure luck.
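
A toy model of what I mean, as a userspace sketch only. struct irq_ctrl and
the ctrl_*() helpers are invented for illustration and are not genirq API;
the latches_while_masked field is my assumption for telling apart controllers
that keep edge detection active while masked from those that do not.

```c
#include <stdbool.h>

/* Invented model of an edge-triggered interrupt controller input. */
struct irq_ctrl {
	bool latches_while_masked; /* assumption: edge detection stays on when masked */
	bool masked;
	bool latched;
	unsigned int handler_calls;
};

/* An edge arrives on the interrupt line. */
static void ctrl_edge(struct irq_ctrl *c)
{
	if (!c->masked)
		c->handler_calls++;
	else if (c->latches_while_masked)
		c->latched = true;
}

static void ctrl_mask(struct irq_ctrl *c)
{
	c->masked = true;
}

/* Unmasking delivers a latched edge immediately. */
static void ctrl_unmask(struct irq_ctrl *c)
{
	c->masked = false;
	if (c->latched) {
		c->latched = false;
		c->handler_calls++;
	}
}

/* Handler calls seen right after unmask, before any real DRDY edge. */
static unsigned int spurious_calls(bool latches_while_masked)
{
	struct irq_ctrl c = { .latches_while_masked = latches_while_masked };

	ctrl_mask(&c);   /* stands in for disable_irq() with UNLAZY */
	ctrl_edge(&c);   /* edge caused by the SPI transfer */
	ctrl_unmask(&c); /* stands in for enable_irq() */
	return c.handler_calls;
}
```

In this model spurious_calls(true) yields one handler call that has nothing
to do with DRDY, while spurious_calls(false) yields none.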

> 
> So I still think the extra GPIO read should be implemented (as I proposed in
> https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/)
> to guarantee reliable operation. If that isn't possible the only really
> robust way to operate is using polling.

My only issue with the GPIO approach (and your conversation with Thomas seems
to prove it) is that we're not guaranteed to be able to read the line. I guess
your reasoning is that if we can't do that on a platform, then we don't give
the GPIO in DT? But in that case, are we left with a device that might or
might not work?

"Funny stuff"...

- Nuno Sá