linux-kernel - Re: [PATCH] genirq: Fix race on spurious interrupt detection

Message-ID: <alpine.DEB.2.21.1810191554030.6075@nanos.tec.linutronix.de>
Date:   Fri, 19 Oct 2018 16:31:30 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Lukas Wunner <lukas@...ner.de>
cc:     linux-kernel@...r.kernel.org,
        Mathias Duckeck <m.duckeck@...bus.de>,
        Akshay Bhat <akshay.bhat@...esys.com>,
        Casey Fitzpatrick <casey.fitzpatrick@...esys.com>
Subject: Re: [PATCH] genirq: Fix race on spurious interrupt detection

On Thu, 18 Oct 2018, Lukas Wunner wrote:
> Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of
> threaded irqs") made detection of spurious interrupts work for threaded
> handlers by:
> 
> a) incrementing a counter every time the thread returns IRQ_HANDLED, and
> b) checking whether that counter has increased every time the thread is
>    woken.
> 
> However for oneshot interrupts, the commit unmasks the interrupt before
> incrementing the counter.  If another interrupt occurs right after
> unmasking but before the counter is incremented, that interrupt is
> incorrectly considered spurious:
> 
> time
>  |  irq_thread()
>  |    irq_thread_fn()
>  |      action->thread_fn()
>  |      irq_finalize_oneshot()
>  |        unmask_threaded_irq()            /* interrupt is unmasked */
>  |
>  |                  /* interrupt fires, incorrectly deemed spurious */
>  |
>  |    atomic_inc(&desc->threads_handled); /* counter is incremented */
>  v
> 
> I am seeing this with a hi3110 CAN controller receiving data at high
> volume (from a separate machine sending with "cangen -g 0 -i -x"):
> The controller signals a huge number of interrupts (hundreds of millions
> per day) and every second there are about a dozen which are deemed
> spurious.  The issue is benign in this case, mostly just an irritation,
> but I'm worrying that at high CPU load and in the presence of higher
> priority tasks, the number of incorrectly detected spurious interrupts
> might increase beyond the 99,900 threshold and cause disablement of the
> IRQ.

I doubt that this can happen in reality, so I'd rather reword that
paragraph slightly:

  In theory, at high CPU load and in the presence of higher priority tasks,
  the number of incorrectly detected spurious interrupts might increase
  beyond the 99,900 threshold and cause disablement of the interrupt.

  In practice it just increments the spurious interrupt count. But that can
  cause people to waste time investigating it over and over.

Hmm?

Thanks,

	tglx