lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 07 Feb 2020 16:11:59 +0000
From:   Marc Zyngier <maz@...nel.org>
To:     Lukas Wunner <lukas@...ner.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        Nicolas Saenz Julienne <nsaenzjulienne@...e.de>,
        Florian Fainelli <f.fainelli@...il.com>,
        Ray Jui <rjui@...adcom.com>,
        Scott Branden <sbranden@...adcom.com>,
        bcm-kernel-feedback-list@...adcom.com,
        linux-kernel@...r.kernel.org, linux-rpi-kernel@...ts.infradead.org,
        linux-arm-kernel@...ts.infradead.org,
        Serge Schneider <serge@...pberrypi.org>,
        Kristina Brooks <notstina@...il.com>,
        Stefan Wahren <wahrenst@....net>,
        Matthias Brugger <mbrugger@...e.com>,
        Martin Sperl <kernel@...tin.sperl.org>,
        Phil Elwell <phil@...pberrypi.org>
Subject: Re: [PATCH] irqchip/bcm2835: Quiesce IRQs left enabled by bootloader

Hi Lukas,

On 2020-02-07 15:46, Lukas Wunner wrote:
> Customers of our "Revolution Pi" open source PLCs (which are based on
> the Raspberry Pi) have reported random lockups as well as jittery eMMC,
> UART and SPI latency.  We were able to reproduce the lockups in our lab
> and hooked up a JTAG debugger:
> 
> It turns out that the USB controller's interrupt is already enabled 
> when
> the kernel boots.  All interrupts are disabled when the chip comes out
> of power-on reset, according to the spec.  So apparently the bootloader
> enables the interrupt but neglects to disable it before handing over
> control to the kernel.
> 
> The bootloader is a closed source blob provided by the Raspberry Pi
> Foundation.  Development of an alternative open source bootloader was
> begun by Kristina Brooks but it's not fully functional yet.  Usage of
> the blob is thus without alternative for the time being.
> 
> The Raspberry Pi Foundation's downstream kernel has a performance-
> optimized USB driver (which we use on our Revolution Pi products).
> The driver takes advantage of the FIQ fast interrupt.  Because the
> regular USB interrupt was left enabled by the bootloader, both the
> FIQ and the normal interrupt is enabled once the USB driver probes.
> 
> The spec has the following to say on simultaneously enabling the FIQ
> and the normal interrupt of a peripheral:
> 
> "One interrupt source can be selected to be connected to the ARM FIQ
>  input.  An interrupt which is selected as FIQ should have its normal
>  interrupt enable bit cleared.  Otherwise a normal and an FIQ interrupt
>  will be fired at the same time.  Not a good idea!"

Or to spell it out more clearly: Braindead hardware. Really.

>                                   ^^^^^^^^^^^^^^^
> https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
> page 110
> 
> On a multicore Raspberry Pi, the Foundation's kernel routes all normal
> interrupts to CPU 0 and the FIQ to CPU 1.  Because both the FIQ and the
> normal interrupt is enabled, a USB interrupt causes CPU 0 to spin in
> bcm2836_chained_handle_irq() until the FIQ on CPU 1 has cleared it.
> Interrupts with a lower priority than USB are starved as long.
> 
> That explains the jittery eMMC, UART and SPI latency:  On one occasion
> I've seen CPU 0 blocked for no less than 2.9 msec.  Basically,
> everything not USB takes a performance hit:  Whereas eMMC throughput
> on a Compute Module 3 remains relatively constant at 23.5 MB/s with
> this commit, it irregularly dips to 23.0 MB/s without this commit.
> 
> The lockups occur when CPU 0 receives a USB interrupt while holding a
> lock which CPU 1 is trying to acquire while the FIQ is temporarily
> disabled on CPU 1.
> 
> I've tested old releases of the Foundation's bootloader as far back as
> 1.20160202-1 and they all leave the USB interrupt enabled.  Still older
> releases fail to boot a contemporary kernel on a Compute Module 1 or 3,
> which are the only Raspberry Pi variants I have at my disposal for
> testing.
> 
> Fix by disabling IRQs left enabled by the bootloader.  Although the
> impact is most pronounced on the Foundation's downstream kernel,
> it seems prudent to apply the fix to the upstream kernel to guard
> against such mistakes in any present and future bootloader.
> 
> An alternative, though more convoluted approach would be to clear the
> IRQD_IRQ_MASKED flag on all interrupts left enabled on boot.  Then the
> first invocation of handle_level_irq() would mask and thereby quiesce
> those interrupts.

Nah, that's terrible. The right thing to do is indeed to mop up the mess
that the bootloader is bound to leave and start with a clean slate.

> 
> Signed-off-by: Lukas Wunner <lukas@...ner.de>
> Cc: Serge Schneider <serge@...pberrypi.org>
> Cc: Kristina Brooks <notstina@...il.com>
> Cc: stable@...r.kernel.org
> ---
>  drivers/irqchip/irq-bcm2835.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-bcm2835.c 
> b/drivers/irqchip/irq-bcm2835.c
> index 418245d31921..0d9a5a7ebe2c 100644
> --- a/drivers/irqchip/irq-bcm2835.c
> +++ b/drivers/irqchip/irq-bcm2835.c
> @@ -150,6 +150,13 @@ static int __init armctrl_of_init(struct 
> device_node *node,
>  		intc.enable[b] = base + reg_enable[b];
>  		intc.disable[b] = base + reg_disable[b];
> 
> +		irq = readl(intc.enable[b]);

readl_relaxed(), please. irq is not quite the right type either, please 
use a u32.

> +		if (irq) {
> +			writel(irq, intc.disable[b]);

writel_relaxed().

> +			pr_err(FW_BUG "Bootloader left irq enabled: "
> +			       "bank %d irq %*pbl\n", b, IRQS_PER_BANK, &irq);
> +		}
> +
>  		for (i = 0; i < bank_irqs[b]; i++) {
>  			irq = irq_create_mapping(intc.domain, MAKE_HWIRQ(b, i));
>  			BUG_ON(irq <= 0);

Don't you need to do something about the FIQ side as well?

         M.
-- 
Jazz is not dead. It just smells funny...

Powered by blists - more mailing lists