linux-kernel - Re: can/should a disabled irq become pending?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5pcuve6oz7z5fammiu44e7nldu45gcupjh7vlape4k4ctzrvsf@lpaz2owcag7f>
Date: Thu, 14 Nov 2024 11:59:54 +0100
From: Uwe Kleine-König <u.kleine-koenig@...libre.com>
To: Nuno Sá <noname.nuno@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org, 
	Jonathan Cameron <jic23@...nel.org>
Subject: Re: can/should a disabled irq become pending?

Hello,

On Thu, Nov 14, 2024 at 08:49:34AM +0100, Nuno Sá wrote:
> On Wed, 2024-11-13 at 16:50 +0100, Thomas Gleixner wrote:
> > On Wed, Nov 13 2024 at 11:34, Nuno Sá wrote:
> > > On Wed, 2024-11-13 at 04:40 +0100, Thomas Gleixner wrote:
> > > > The interrupt does not get to the device handler even in the lazy
> > > > disable case. Once the driver invoked disable_irq*() the low level flow
> > > > handlers (edge, level ...) mask the interrupt line and marks the
> > > > interrupt pending. enable_irq() retriggers the interrupt when the
> > > > pending bit is set, except when the interrupt line is level triggered.
> > > 
> > > There's something that I'm still trying to figure... For IRQ controllers
> > > that not
> > > disable edge detection, can't we get the device handler called twice if we
> > > don't set
> > > unlazy?
> > > 
> > > irq_enable() - > check_irq_resend()
> > > 
> > > and then
> > > 
> > > handle_edge_irq() raised by the controller
> > 
> > You're right. We should have a flag which controls the replay
> > requirements of an interrupt controller. So far it only skips for level
> > triggered interrupts, but for those controllers it should skip for edge
> > too. Something like IRQCHIP_NO_RESEND ...

Agreed, if the irq gets pending while disabled in both hardware and
software, that shouldn't result in two invokations. Is this an issue for
level irqs only? For edge irqs this only happens with lazy disable and
if two events happen. Hm, I guess in that case we still only want a single
invokation of the irq handler?

> > > Or is the core handling this somehow? I thought IRQS_REPLAY could be
> > > doing the trick but I'm not seeing how...
> > 
> > IRQS_REPLAY is just internal state to avoid double replay.
> > 
> > > > On controllers which suffer from the #2 problem UNLAZY should indeed be
> > > > ignored for edge type interrupts. That's something which the controller
> > > > should signal via a irqchip flag and the core code can act upon it and
> > > > ignore UNLAZY for edge type interrupts.
> > > > 
> > > > But that won't fix the problem at hand. Let's take a step back and look
> > > > at the larger picture whether this can be reliably "fixed" at all.
> > > > 
> > > 
> > > Yeah, I'm still trying to figure when it's correct for a device to do
> > > UNLAZY? If I'm
> > > understanding things, devices that rely on disable_irq*() should set
> > > it?
> > 
> > Not necessarily. In most cases devices are not re-raising interrupts
> > before the previous one has been handled and acknowledged in the device.

Usage of UNLAZY should never affect correctness. It's "only" a
performance optimisation which has a positive effect if it's expected
that an irq event happens while it's masked.

> > > Because problem #2 is something that needs to be handled at the
> > > controller and core level if I got you right.
> > 
> > Yes. We need a irqchip flag for that.
> > 
> > > > > Ack. If there is no way to read back the line state and it's unknown if
> > > > > the irq controller suffers from problem #2, the only way to still
> > > > > benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
> > > > > each 2nd irq; or ignore irqs based on timing. That doesn't sound very
> > > > > robust though, so maybe the driver has to fall back on polling the
> > > > > status register and not use irqs at all in that case.
> > > > 
> > > > Actually ignoring the first interrupt after a SPI transfer and waiting
> > > > for the next conversion to raise the interrupt again should be robust
> > > > enough. The ADC has to be in continous conversion mode for that
> > > > obviously.
> > > > 
> > > Might be the only sane option we have, Uwe? If we do this, we could be
> > > dropping valid samples but only with controllers that suffer from
> > > #2.
> > 
> > No. You have the same problem with the controllers which do not disable
> > the edge detection logic.
> > 
> > The interrupt controller raises the interrupt on unmask (enable_irq()).
> > Depending on timing the device handler might be invoked _before_ the
> > sample is ready, no?
> 
> For those controllers, I think it's almost always guaranteed that the first IRQ
> after enable is not really a valid sample. We'll always have some SPI transfer
> (that should latch an IRQ on the controller) before enable_irq().

The first irq isn't a valid sample unless the driver is preempted
between the spi transfer and the following enable_irq() such that the
irq event triggered by the SPI transfer doesn't result in calling the
irq handler before the sample is ready. I guess that's what you ruled
out by saying "almost always"? I'd recommend to not rely on that. Chips
become faster (and so conversion time shorter) which widens the race
window and if you become unsynchronized and ignore every wrong second
irq all samples become bogous.

So I still think the extra GPIO read should be implemented (as I
proposed in
https://lore.kernel.org/linux-iio/20241028160748.489596-9-u.kleine-koenig@baylibre.com/)
to guarantee reliable operation. If that isn't possible the only really
robust way to operate is using polling.

Best regards
Uwe

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)