Message-ID: <11b3de44-522b-4638-987c-2ca132e84936@sifive.com>
Date: Wed, 29 May 2024 17:04:02 -0500
From: Samuel Holland <samuel.holland@...ive.com>
To: Geert Uytterhoeven <geert@...ux-m68k.org>,
Anup Patel <apatel@...tanamicro.com>
Cc: devicetree@...r.kernel.org, Conor Dooley <conor+dt@...nel.org>,
Emil Renner Berthing <kernel@...il.dk>,
Saravana Kannan <saravanak@...gle.com>, Marc Zyngier <maz@...nel.org>,
Anup Patel <anup@...infault.org>, Atish Patra <atishp@...shpatra.org>,
linux-kernel@...r.kernel.org, Björn Töpel
<bjorn@...nel.org>, Rob Herring <robh+dt@...nel.org>,
Palmer Dabbelt <palmer@...belt.com>,
Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
Paul Walmsley <paul.walmsley@...ive.com>,
Thomas Gleixner <tglx@...utronix.de>, Frank Rowand <frowand.list@...il.com>,
linux-riscv@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
Andrew Jones <ajones@...tanamicro.com>
Subject: Re: [PATCH v14 01/18] irqchip/sifive-plic: Convert PLIC driver into a platform driver
Hi Geert,
On 2024-05-29 9:22 AM, Geert Uytterhoeven wrote:
> Hi Anup,
>
> On Thu, Feb 22, 2024 at 10:41 AM Anup Patel <apatel@...tanamicro.com> wrote:
>> The PLIC driver does not require very early initialization, so convert
>> it into a platform driver.
>>
>> After conversion, the PLIC driver is probed after CPUs are brought up,
>> so set up the cpuhp state after the context handlers of all online CPUs
>> are initialized; otherwise the PLIC driver crashes on platforms with
>> multiple PLIC instances.
>>
>> Signed-off-by: Anup Patel <apatel@...tanamicro.com>
>
> Thanks for your patch, which is now commit 8ec99b033147ef3b
> ("irqchip/sifive-plic: Convert PLIC driver into a platform
> driver") in v6.9.
>
> It looks like this conversion is causing issues on BeagleV Starlight
> Beta. After updating esmil/visionfive to v6.10-rc1, the kernel usually
> fails to boot. Adding "earlycon keep_bootcon" reveals these differences:
>
> -riscv-plic c000000.interrupt-controller: mapped 133 interrupts with 2 handlers for 4 contexts.
> +------------[ cut here ]------------
> +WARNING: CPU: 0 PID: 1 at drivers/irqchip/irq-sifive-plic.c:373 plic_handle_irq+0xf2/0xf6
> +Modules linked in:
> +CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc1-starlight-02342-g0ba4c76ca0e8-dirty #323
> +Hardware name: BeagleV Starlight Beta (DT)
> +epc : plic_handle_irq+0xf2/0xf6
> + ra : generic_handle_domain_irq+0x1c/0x2a
> +epc : ffffffff8033f994 ra : ffffffff8006319a sp : ffffffc800003f50
> + gp : ffffffff812d63f0 tp : ffffffd8800b8000 t0 : 0000000000000040
> + t1 : 0000000000000000 t2 : 0000000000001000 s0 : ffffffc800003fa0
> + s1 : 0000000000000009 a0 : ffffffd880183600 a1 : 0000000000000009
> + a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> + a5 : 0000000000000000 a6 : ffffffd880400248 a7 : ffffffd8804002b8
> + s2 : ffffffd9f8fac458 s3 : 0000000000000004 s4 : 0000000000000000
> + s5 : ffffffff81293f58 s6 : ffffffd88014ac00 s7 : 0000000000000004
> + s8 : ffffffc800013b2c s9 : ffffffc800013b34 s10: 0000000000000006
> + s11: ffffffd9f8fc1458 t3 : 0000000000000002 t4 : 0000000000000402
> + t5 : ffffffd8800610c0 t6 : ffffffd8800610e0
> +status: 0000000200000100 badaddr: ffffffd9f8fac458 cause: 0000000000000003
> +[<ffffffff8033f994>] plic_handle_irq+0xf2/0xf6
> +[<ffffffff8006319a>] generic_handle_domain_irq+0x1c/0x2a
> +[<ffffffff8033d7aa>] riscv_intc_irq+0x26/0x60
> +[<ffffffff806c92ee>] handle_riscv_irq+0x4a/0x74
> +[<ffffffff806d2346>] call_on_irq_stack+0x32/0x40
> +---[ end trace 0000000000000000 ]---
> +Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
> +Oops [#1]
> +Modules linked in:
> +CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.10.0-rc1-starlight-02342-g0ba4c76ca0e8-dirty #323
> +Hardware name: BeagleV Starlight Beta (DT)
> +epc : plic_handle_irq+0x66/0xf6
> + ra : generic_handle_domain_irq+0x1c/0x2a
> +epc : ffffffff8033f908 ra : ffffffff8006319a sp : ffffffc800003f50
> + gp : ffffffff812d63f0 tp : ffffffd8800b8000 t0 : 0000000000000040
> + t1 : 0000000000000000 t2 : 0000000000001000 s0 : ffffffc800003fa0
> + s1 : 0000000000000009 a0 : ffffffd880183600 a1 : 0000000000000009
> + a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
> + a5 : ffffffff8033d72a a6 : ffffffd880400248 a7 : ffffffd8804002b8
> + s2 : ffffffd9f8fac458 s3 : 0000000000000004 s4 : ffffffd880183630
> + s5 : ffffffff81293f58 s6 : ffffffff812948a0 s7 : ffffffff80c4e660
> + s8 : ffffffff80d9eea0 s9 : ffffffc800013b34 s10: 0000000000000006
> + s11: ffffffd9f8fc1458 t3 : 0000000000000002 t4 : 0000000000000402
> + t5 : ffffffd8800610c0 t6 : ffffffd8800610e0
> +status: 0000000200000100 badaddr: 0000000000000004 cause: 000000000000000d
> +[<ffffffff8033f908>] plic_handle_irq+0x66/0xf6
> +[<ffffffff8006319a>] generic_handle_domain_irq+0x1c/0x2a
> +[<ffffffff8033d7aa>] riscv_intc_irq+0x26/0x60
> +[<ffffffff806c92ee>] handle_riscv_irq+0x4a/0x74
> +[<ffffffff806d2346>] call_on_irq_stack+0x32/0x40
> +Code: 8b93 d70b 5b17 00f5 0b13 fa8b fc17 00a5 0c13 5a0c (a783) 0009
> +---[ end trace 0000000000000000 ]---
> +Kernel panic - not syncing: Fatal exception in interrupt
> +SMP: stopping secondary CPUs
> +---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
> As "mapped 133 interrupts" is no longer printed, it looks like an
> unexpected early interrupt comes in while still in plic_probe().
>
> Esmil suggested reverting all of:
> a7fb69ffd7ce438a irqchip/sifive-plic: Avoid explicit cpumask allocation on stack
> abb7205794900503 irqchip/sifive-plic: Improve locking safety by using irqsave/irqrestore
> 95652106478030f5 irqchip/sifive-plic: Parse number of interrupts and contexts early in plic_probe()
> a15587277a246c38 irqchip/sifive-plic: Cleanup PLIC contexts upon irqdomain creation failure
> 6c725f33d67b53f2 irqchip/sifive-plic: Use riscv_get_intc_hwnode() to get parent fwnode
> b68d0ff529a939a1 irqchip/sifive-plic: Use devm_xyz() for managed allocation
> 25d862e183d4efeb irqchip/sifive-plic: Use dev_xyz() in-place of pr_xyz()
> 8ec99b033147ef3b irqchip/sifive-plic: Convert PLIC driver into a platform driver
>
> After this, the PLIC is initialized earlier again, and this indeed
> seems to fix the issue for me.
> Before, the kernel booted fine in only ca. 1 out of 5 tries.
> After the reverts, it booted 5/5.
>
> Do you know what's going on? Is there a simpler fix?
The fact that you hit the warning indicates that plic_handle_irq() was called
before handler->present was set. Previously the PLIC driver was probed very
early, so it was unlikely that any peripheral already had a pending interrupt.
Now, while platform device drivers would not yet be able to request interrupts
(because the irqdomain is not registered yet), they could have programmed the
hardware in a way that generates an interrupt. If that interrupt was enabled at
the PLIC (e.g. by the bootloader), then we could expect plic_handle_irq() to be
called as soon as irq_set_chained_handler() is called.
So the fix is to not call irq_set_chained_handler() until after the handlers are
completely set up.
I've sent a patch doing this:
https://lore.kernel.org/linux-riscv/20240529215458.937817-1-samuel.holland@sifive.com/
Regards,
Samuel