lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181019152307.62t6al6ney5ofo36@wunner.de>
Date:   Fri, 19 Oct 2018 17:23:07 +0200
From:   Lukas Wunner <lukas@...ner.de>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     linux-kernel@...r.kernel.org,
        Mathias Duckeck <m.duckeck@...bus.de>,
        Akshay Bhat <akshay.bhat@...esys.com>,
        Casey Fitzpatrick <casey.fitzpatrick@...esys.com>
Subject: Re: [PATCH] genirq: Fix race on spurious interrupt detection

On Fri, Oct 19, 2018 at 04:31:30PM +0200, Thomas Gleixner wrote:
> On Thu, 18 Oct 2018, Lukas Wunner wrote:
> > Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of
> > threaded irqs") made detection of spurious interrupts work for threaded
> > handlers by:
> > 
> > a) incrementing a counter every time the thread returns IRQ_HANDLED, and
> > b) checking whether that counter has increased every time the thread is
> >    woken.
> > 
> > However for oneshot interrupts, the commit unmasks the interrupt before
> > incrementing the counter.  If another interrupt occurs right after
> > unmasking but before the counter is incremented, that interrupt is
> > incorrectly considered spurious:
> > 
> > time
> >  |  irq_thread()
> >  |    irq_thread_fn()
> >  |      action->thread_fn()
> >  |      irq_finalize_oneshot()
> >  |        unmask_threaded_irq()            /* interrupt is unmasked */
> >  |
> >  |                  /* interrupt fires, incorrectly deemed spurious */
> >  |
> >  |    atomic_inc(&desc->threads_handled); /* counter is incremented */
> >  v
> > 
> > I am seeing this with a hi3110 CAN controller receiving data at high
> > volume (from a separate machine sending with "cangen -g 0 -i -x"):
> > The controller signals a huge number of interrupts (hundreds of millions
> > per day) and every second there are about a dozen which are deemed
> > spurious.  The issue is benign in this case, mostly just an irritation,
> > but I'm worrying that at high CPU load and in the presence of higher
> > priority tasks, the number of incorrectly detected spurious interrupts
> > might increase beyond the 99,900 threshold and cause disablement of the
> > IRQ.
> 
> I doubt that this can happen in reality, so I'd rather reword that
> paragraph slightly:
> 
>   In theory high CPU load and in the presence of higher priority tasks, the
>   number of incorrectly detected spurious interrupts might increase beyond
>   the 99,900 threshold and cause disablement of the interrupt.
> 
>   In practice it just increments the spurious interrupt count. But that can
>   cause people to waste time investigating it over and over.
> 
> Hmm?

Sure, fine by me.  Would you prefer me to resend with that change or
can you fold it in when applying?

FWIW I did manage to reach the 99,900 threshold once because I had
added copious amounts of printk() to the hi3110 IRQ thread to debug
another issue.  But I never experienced that without those printk()'s.
Here's the resulting splat:

irq 194: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 1929 Comm: candump Tainted: G           O    4.9.76-rt60-v7+ #1
Hardware name: BCM2835
[<8011106c>] (unwind_backtrace) from [<8010cdd8>] (show_stack+0x20/0x24)
[<8010cdd8>] (show_stack) from [<8047cb2c>] (dump_stack+0xc8/0x10c)
[<8047cb2c>] (dump_stack) from [<8018192c>] (__report_bad_irq+0x3c/0xdc)
[<8018192c>] (__report_bad_irq) from [<80181d94>] (note_interrupt+0x29c/0x2ec)
[<80181d94>] (note_interrupt) from [<8017ec9c>] (handle_irq_event_percpu+0x78/0x84)
[<8017ec9c>] (handle_irq_event_percpu) from [<8017ed20>] (handle_irq_event+0x78/0xbc)
[<8017ed20>] (handle_irq_event) from [<80182ad8>] (handle_edge_irq+0x13c/0x1e8)
[<80182ad8>] (handle_edge_irq) from [<8017db64>] (generic_handle_irq+0x34/0x44)
[<8017db64>] (generic_handle_irq) from [<804ae1a0>] (bcm2835_gpio_irq_handle_bank+0x88/0xac)
[<804ae1a0>] (bcm2835_gpio_irq_handle_bank) from [<804ae2ac>] (bcm2835_gpio_irq_handler+0xe8/0x154)
[<804ae2ac>] (bcm2835_gpio_irq_handler) from [<8017db64>] (generic_handle_irq+0x34/0x44)
[<8017db64>] (generic_handle_irq) from [<804a7720>] (bcm2836_chained_handle_irq+0x38/0x50)
[<804a7720>] (bcm2836_chained_handle_irq) from [<8017db64>] (generic_handle_irq+0x34/0x44)
[<8017db64>] (generic_handle_irq) from [<8017e144>] (__handle_domain_irq+0x6c/0xc4)
[<8017e144>] (__handle_domain_irq) from [<8010155c>] (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
[<8010155c>] (bcm2836_arm_irqchip_handle_irq) from [<80775dec>] (__irq_usr+0x4c/0x60)
Exception stack(0xb6b15fb0 to 0xb6b15ff8)
5fa0:                                     76ec1d50 0000000a 011f8026 fbad2aa4
5fc0: 76ec1d50 0000000a 76f1a000 000263bc 00000000 000263fc 00000001 7ec3f470
5fe0: 00000444 7ec3f2e8 76de5b90 76def528 40000010 ffffffff
handlers:
[<8017edc0>] irq_default_primary_handler threaded [<7f37c734>] hi3110_can_ist [hi311x]
Disabling IRQ #194

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ