Message-ID: <8735vjrjj3.wl-maz@kernel.org>
Date:   Wed, 21 Apr 2021 11:58:40 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     dann frazier <dann.frazier@...onical.com>
Cc:     linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Sumit Garg <sumit.garg@...aro.org>, kernel-team@...roid.com,
        Russell King <linux@....linux.org.uk>,
        Catalin Marinas <catalin.marinas@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>
Subject: Re: [PATCH 08/11] irqchip/gic: Configure SGIs as standard interrupts

Hi Dann,

On Tue, 20 Apr 2021 22:25:51 +0100,
dann frazier <dann.frazier@...onical.com> wrote:
> 
> On Tue, Apr 20, 2021 at 02:37:10PM -0600, dann frazier wrote:
> > On Tue, May 19, 2020 at 05:17:52PM +0100, Marc Zyngier wrote:
> > > Change the way we deal with GIC SGIs by turning them into proper
> > > IRQs, and calling into the arch code to register the interrupt range
> > > instead of a callback.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@...nel.org>
> > 
> > hey Marc,
> > 
> >   I bisected a boot failure on our Gigabyte R120-T33 systems (ThunderX
> > CN88XX) down to this commit, but only when running in ACPI mode. See below:
> > 
> > 
> > EFI stub: Booting Linux Kernel...
> > EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
> > EFI stub: Using DTB from configuration table
> > EFI stub: Exiting boot services and installing virtual address map...
> > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0a11]
> > [    0.000000] Linux version 5.11.0-13-generic (buildd@...02-arm64-067) (gcc (Ubuntu 10.2.1-23ubuntu1) 10.2.1 20210312, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #14-Ubuntu SMP Fri Mar 19 16:57:35 UTC 2021 (Ubuntu 5.11.0-13.14-generic 5.11.7)
> 
> Sorry, realized I posted a log from an Ubuntu kernel. Here's an
> upstream one:

[...]

> 
> [    7.842174] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
> [    7.849699] io scheduler mq-deadline registered
> [    7.857591] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> [    7.865127] efifb: probing for efifb
> [    7.868738] efifb: No BGRT, not showing boot graphics
> [    7.873783] efifb: framebuffer at 0x881010000000, using 3072k, total 3072k
> [    7.880649] efifb: mode is 1024x768x32, linelength=4096, pages=1
> [    7.886647] efifb: scrolling: redraw
> [    7.890212] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> [    7.895905] fbcon: Deferring console take-over
> [    7.900350] fb0: EFI VGA frame buffer device
> [    7.905289] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
> [    7.913714] ACPI: button: Power Button [PWRB]
> [    7.919549] ACPI GTDT: [Firmware Bug]: failed to get the Watchdog base address.
> [    7.927289] Unable to handle kernel read from unreadable memory at virtual address 0000000000000028
> [    7.936326] Mem abort info:
> [    7.939108]   ESR = 0x96000004
> [    7.942151]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    7.947451]   SET = 0, FnV = 0
> [    7.950494]   EA = 0, S1PTW = 0
> [    7.953624] Data abort info:
> [    7.956492]   ISV = 0, ISS = 0x00000004
> [    7.960316]   CM = 0, WnR = 0
> [    7.963273] [0000000000000028] user address but active_mm is swapper
> [    7.969616] Internal error: Oops: 96000004 [#1] SMP
> [    7.974483] Modules linked in:
> [    7.977531] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc8 #19
> [    7.983874] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS F02 08/06/2019
> [    7.990737] pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
> [    7.996732] pc : __ipi_send_mask+0x60/0x114
> [    8.000910] lr : smp_cross_call+0x40/0xcc
> [    8.004913] sp : ffff800012753c10
> [    8.008216] x29: ffff800012753c10 x28: ffff000100de5d00 
> [    8.013521] x27: 000000000000000a x26: ffff80001225da20 
> [    8.018825] x25: 0000000000000000 x24: ffff000ff62719b0 
> [    8.024129] x23: ffff80001225d000 x22: ffff800012368108 
> [    8.029433] x21: ffff800010f69a20 x20: 0000000000000000 
> [    8.034737] x19: ffff000100143c60 x18: 0000000000000020 
> [    8.040041] x17: 000000008e74252f x16: 00000000bf0ab2ad 
> [    8.045345] x15: ffffffffffffffff x14: 0000000000000000 
> [    8.050649] x13: 003d090000000000 x12: 00003d0900000000 
> [    8.055953] x11: 0000000000000000 x10: 00003d0900000000 
> [    8.061257] x9 : ffff800010027f14 x8 : 0000000000000000 
> [    8.066561] x7 : 00000000ffffffff x6 : ffff000ff6148698 
> [    8.071865] x5 : ffff80001159d040 x4 : ffff80001159d110 
> [    8.077169] x3 : ffff800010f69a00 x2 : 0000000000000000 
> [    8.082473] x1 : ffff800010f69a20 x0 : 0000000000000000 
> [    8.087777] Call trace:
> [    8.090213]  __ipi_send_mask+0x60/0x114
> [    8.094038]  smp_cross_call+0x40/0xcc
> [    8.097691]  smp_send_reschedule+0x3c/0x50
> [    8.101778]  resched_curr+0x5c/0xb0
> [    8.105258]  check_preempt_curr+0x58/0x90
> [    8.109258]  ttwu_do_wakeup+0x2c/0x190
> [    8.112996]  ttwu_do_activate+0x7c/0x114
> [    8.116909]  try_to_wake_up+0x388/0x670
> [    8.120735]  wake_up_process+0x24/0x30
> [    8.124474]  swake_up_one+0x48/0x9c
> [    8.127953]  rcu_gp_kthread_wake+0x68/0x8c
> [    8.132041]  rcu_accelerate_cbs_unlocked+0xb4/0xf0
> [    8.136822]  rcu_core+0x520/0x694
> [    8.140128]  rcu_core_si+0x1c/0x2c
> [    8.143520]  __do_softirq+0x128/0x388
> [    8.147172]  irq_exit+0xc4/0xec
> [    8.150304]  __handle_domain_irq+0x8c/0xec
> [    8.154394]  gic_handle_irq+0xd8/0x2f0
> [    8.158132]  el1_irq+0xc0/0x180
> [    8.161262]  __pi_strcmp+0x20/0x158
> [    8.164742]  driver_register+0x68/0x140
> [    8.168571]  __platform_driver_register+0x34/0x40
> [    8.173265]  imx8mp_clk_driver_init+0x28/0x34
> [    8.177614]  do_one_initcall+0x50/0x260
> [    8.181440]  kernel_init_freeable+0x24c/0x2d4
> [    8.185790]  kernel_init+0x20/0x134
> [    8.189271]  ret_from_fork+0x10/0x18
> [    8.192840] Code: a90363f7 aa0103f5 d0010957 f9401260 (b9402800) 
> [    8.198955] ---[ end trace c24172add816c1f0 ]---
> [    8.203562] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> [    8.210442] SMP: stopping secondary CPUs
> [    9.258360] SMP: failed to stop secondary CPUs 0,9
> [    9.263141] Kernel Offset: disabled
> [    9.266617] CPU features: 0x00040002,69101108
> [    9.270963] Memory Limit: none
> [    9.274024] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Please feed this stacktrace to scripts/decode_stacktrace.sh so that I
can get an idea about what is going wrong. I bet something is playing
ungodly games with one of the IPIs, and things go horribly wrong.

Now, here's a hunch: in the fine TX1 tradition, the firmware is broken
and the GTDT table looks unusable. Amusingly, the crash happens right
after the SBSA watchdog fails to probe.

And looking at the code that implements that driver, it looks dodgy as
hell, as it unmaps an interrupt it doesn't even know is valid. And it
does that right when the driver fails the way you experienced it. If,
by any chance, the interrupt field is 0 in the firmware table, this
results in SGI0 being unmapped. Given that this is the rescheduling
interrupt, fireworks happen.
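
To make the ordering problem concrete, here is a rough userspace sketch
of what I suspect is going on. It is a toy model and not kernel code:
every toy_* name is hypothetical, the mapping table is a plain array,
and it assumes that the firmware table carries 0 in the interrupt field
and that GSI 0 is already in use as the rescheduling IPI.

```c
/* Toy model, not kernel code: all toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_GSI 64

/* true when a GSI currently has a Linux IRQ mapping */
static bool toy_mapped[TOY_NR_GSI];

struct toy_watchdog {
	uint64_t refresh_frame_address;
	uint64_t control_frame_address;
	uint32_t timer_interrupt;
};

/* Stand-in for map_gt_gsi(): map a GSI, return a positive IRQ number */
static int toy_map_gsi(uint32_t gsi)
{
	assert(gsi < TOY_NR_GSI);
	toy_mapped[gsi] = true;
	return (int)gsi + 1;
}

/* Stand-in for acpi_unregister_gsi(): drop the mapping */
static void toy_unregister_gsi(uint32_t gsi)
{
	assert(gsi < TOY_NR_GSI);
	toy_mapped[gsi] = false;
}

/* Assumed boot-time state: GSI 0 (the rescheduling IPI) is in use */
static void toy_reset(void)
{
	for (int i = 0; i < TOY_NR_GSI; i++)
		toy_mapped[i] = false;
	toy_mapped[0] = true;
}

/* Old ordering: map first, unmap unconditionally on a bad table */
static int toy_import_old(const struct toy_watchdog *wd)
{
	int irq = toy_map_gsi(wd->timer_interrupt);

	if (!(wd->refresh_frame_address && wd->control_frame_address)) {
		toy_unregister_gsi(wd->timer_interrupt);
		return -1;
	}
	return irq;
}

/* New ordering: validate first, so a bad table never touches mappings */
static int toy_import_new(const struct toy_watchdog *wd)
{
	if (!(wd->refresh_frame_address && wd->control_frame_address))
		return -1;
	return toy_map_gsi(wd->timer_interrupt);
}
```

Calling toy_reset() and then toy_import_old() on an all-zero
struct toy_watchdog leaves toy_mapped[0] cleared, which is the toy
equivalent of losing the rescheduling IPI. toy_import_new() on the same
input returns the error without touching the table.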

Can you have a go with the patchlet below, and let me know if that
helps?

Thanks,

	M.

diff --git a/drivers/acpi/arm64/gtdt.c b/drivers/acpi/arm64/gtdt.c
index f2d0e5915dab..0a0a982f9c28 100644
--- a/drivers/acpi/arm64/gtdt.c
+++ b/drivers/acpi/arm64/gtdt.c
@@ -329,7 +329,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 					int index)
 {
 	struct platform_device *pdev;
-	int irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	int irq;
 
 	/*
 	 * According to SBSA specification the size of refresh and control
@@ -338,7 +338,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	struct resource res[] = {
 		DEFINE_RES_MEM(wd->control_frame_address, SZ_4K),
 		DEFINE_RES_MEM(wd->refresh_frame_address, SZ_4K),
-		DEFINE_RES_IRQ(irq),
+		{},
 	};
 	int nr_res = ARRAY_SIZE(res);
 
@@ -348,10 +348,11 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 
 	if (!(wd->refresh_frame_address && wd->control_frame_address)) {
 		pr_err(FW_BUG "failed to get the Watchdog base address.\n");
-		acpi_unregister_gsi(wd->timer_interrupt);
 		return -EINVAL;
 	}
 
+	irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	res[2] = (struct resource)DEFINE_RES_IRQ(irq);
 	if (irq <= 0) {
 		pr_warn("failed to map the Watchdog interrupt.\n");
 		nr_res--;
@@ -364,7 +365,8 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	 */
 	pdev = platform_device_register_simple("sbsa-gwdt", index, res, nr_res);
 	if (IS_ERR(pdev)) {
-		acpi_unregister_gsi(wd->timer_interrupt);
+		if (irq > 0)
+			acpi_unregister_gsi(wd->timer_interrupt);
 		return PTR_ERR(pdev);
 	}
 

-- 
Without deviation from the norm, progress is not possible.
