lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <nlzd7endwgf46czretmoqlck3fjp5vnvnkv2tsyql632ym5bfo@phr3ggjyx633>
Date: Tue, 12 Nov 2024 23:08:16 +0100
From: Uwe Kleine-König <u.kleine-koenig@...libre.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org, 
	Nuno Sá <noname.nuno@...il.com>, Jonathan Cameron <jic23@...nel.org>
Subject: Re: can/should a disabled irq become pending?

Hello Thomas,

On Tue, Nov 12, 2024 at 08:35:13PM +0100, Thomas Gleixner wrote:
> On Mon, Nov 11 2024 at 14:03, Uwe Kleine-König wrote:
> > The ad7124 ADC and a few cousins of it have the peculiarity to report
> > their interrupt on the SPI MISO line. So while the SPI clock is toggling
> > it behaves as MISO and after a transfer it's a falling edge irq
> > signal.
> 
> Saves a pin and the fallout can be handled in software, or not :)
> 
> > So the driver masks the irq during SPI transfers (using irq_disable()).
> > It also uses irq_set_status_flags(sigma_delta->irq_line,
> > IRQ_DISABLE_UNLAZY);
> >
> > Now on some irq controllers (e.g. the rpi GPIO controller) the detection
> > logic is off between calls to irq_disable() and irq_enable() and so on
> > that platform calling irq_enable() makes the irq handler called only on
> > a real irq, while for other irq controllers (e.g. the Altera GPIO
> > controller) the SPI transfers make the irq pending and irq_enable()
> > results in a call of the handler immediately (but very likely before the
> > device actually asserted the interrupt).
> >
> > I think (but please correct me) that the latter behaviour has to be
> > expected and the former is even broken as it might miss irq events.
> 
> The default lazy disabling of interrupts has two reasons:
> 
>     1) Avoid the potentially expensive mask/unmask operations for the
>        common case where no interrupt comes in between mask() and
>        unmask()
> 
>     2) Addressing the problem of edge interrupt controllers, which
>        disable the edge detection logic when masked, which in turn
>        causes interrupts to be lost.

Ack, so far I understood.

> IRQ_DISABLE_UNLAZY solves a different problem. See commit e9849777d0e2
> ("genirq: Add flag to force mask in disable_irq[_nosync]()") See the
> full discussion at:
> 
>   https://lore.kernel.org/all/alpine.DEB.2.11.1510092348370.6097@nanos/T/#m08531ad84aa9a53c26f91fd55be60fad6b5607b9

I found that thread already before. I didn't understand the
motivation/problem fixed there though as the thread start is missing on
lore. I guess the problem was "taking each interrupt twice" as mentioned
in the commit log, but I don't grokk that either. What means "taken"?
The controller's irq routine called; or the driver's handler? Or
something in hardware? I don't see something being done twice.

If I understand correctly, IRQ_DISABLE_UNLAZY is only an optimisation,
but isn't supposed to affect correctness.

> > My conclusions from this are:
> 
> >  - the rpi GPIO controller should be fixed not to honor
> >    IRQ_DISABLE_UNLAZY.
> 
> I don't think that there is something to fix which is specific to the
> RPI GPIO controller. IRQ_DISABLE_UNLAZY is clearly defined and of course
> it will cause the problem #2 for edge type interrupts on such chips.

So a driver making use of IRQ_DISABLE_UNLAZY must know itself if the
irq is provided by such a problematic controller? IMHO that sounds
wrong and the controller (or the irq subsystem) should stick to the lazy
disable approach in that case even though the driver requested to
disable it because not doing the optimisation is better than possibly
missing interrupts.

> Ignoring IRQ_DISABLE_UNLAZY might be the right thing to do for this
> AD7*** use case, but is it universially correct?

If the device driver is agnostic to which irq controller is used (which
it should be) it must be prepared to handle an irq that triggered while
it was masked. So with my understanding (which might be a relevant
restriction) ignoring IRQ_DISABLE_UNLAZY is correct. However that
doesn't mean that drivers (as for example the current ad7124) would
break by such a change as it currently depends on that #2 brokeness.

> It's probably correct to do so for edge type interrupts. For level type,
> not so much.

I'm missing a few details here to follow. E.g. is a level irq supposed
to become pending when the irq line gets only shortly active while the
irq is masked or while irqs are disabled globally?

> >  - the ad7124 driver needs some additional logic to check the actual
> >    line state in the irq handler to differentiate between "real" irqs
> >    and spurious ones triggered by SPI transfers[1];

Do you agree that this is necessary? I created a patch for that and
Jonathan had some doubts this is a valid approach.

> > [1] it can only check the level, not a passed edge, but that seems to
> >     work in practise
> 
> Right, because once the pin is in /RDY mode again it stays low until the
> next update happens.
> 
> But this requires that the chip is connected to a GPIO based interrupt
> line. Non GPIO based interrupt controllers are not providing a line
> state readback.

Ack. If there is no way to read back the line state and it's unknown if
the irq controller suffers from problem #2, the only way to still
benefit from the irq is to not use IRQ_DISABLE_UNLAZY and only act on
each 2nd irq; or ignore irqs based on timing. That doesn't sound very
robust though, so maybe the driver has to fall back on polling the
status register and not use irqs at all in that case.

It's even unknown to me if all GPIO controller support reading the line
if they operate in irq mode.

Best regards
Uwe

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ