Message-ID: <87wnsvprio.wl-maz@kernel.org>
Date:   Wed, 21 Apr 2021 16:49:03 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     dann frazier <dann.frazier@...onical.com>
Cc:     linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Sumit Garg <sumit.garg@...aro.org>, kernel-team@...roid.com,
        Russell King <linux@....linux.org.uk>,
        Catalin Marinas <catalin.marinas@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, Fu Wei <fu.wei@...aro.org>
Subject: Re: [PATCH 08/11] irqchip/gic: Configure SGIs as standard interrupts

On Wed, 21 Apr 2021 15:52:52 +0100,
dann frazier <dann.frazier@...onical.com> wrote:
> 
> [ + Fu Wei ]

[...]

> >
> > Please feed this stacktrace to scripts/decode_stacktrace.sh so that I
> > can get an idea about what is going wrong. I bet something is playing
> > ungodly games with one of the IPIs, and things go horribly wrong.
> 
> hey Marc,
>   Sure:
> 
> [    7.927289] Unable to handle kernel read from unreadable memory at virtual address 0000000000000028
> [    7.936326] Mem abort info:
> [    7.939108]   ESR = 0x96000004
> [    7.942151]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    7.947451]   SET = 0, FnV = 0
> [    7.950494]   EA = 0, S1PTW = 0
> [    7.953624] Data abort info:
> [    7.956492]   ISV = 0, ISS = 0x00000004
> [    7.960316]   CM = 0, WnR = 0
> [    7.963273] [0000000000000028] user address but active_mm is swapper
> [    7.969616] Internal error: Oops: 96000004 [#1] SMP
> [    7.974483] Modules linked in:
> [    7.977531] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc8 #19
> [    7.983874] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS F02 08/06/2019
> [    7.990737] pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
> [    7.996732] pc : __ipi_send_mask (/home/ubuntu/linux/./include/linux/irqdomain.h:537 /home/ubuntu/linux/kernel/irq/ipi.c:283) 
> [    8.000910] lr : smp_cross_call (/home/ubuntu/linux/arch/arm64/kernel/smp.c:958) 
> [    8.004913] sp : ffff800012753c10
> [    8.008216] x29: ffff800012753c10 x28: ffff000100de5d00
> [    8.013521] x27: 000000000000000a x26: ffff80001225da20
> [    8.018825] x25: 0000000000000000 x24: ffff000ff62719b0
> [    8.024129] x23: ffff80001225d000 x22: ffff800012368108
> [    8.029433] x21: ffff800010f69a20 x20: 0000000000000000
> [    8.034737] x19: ffff000100143c60 x18: 0000000000000020
> [    8.040041] x17: 000000008e74252f x16: 00000000bf0ab2ad
> [    8.045345] x15: ffffffffffffffff x14: 0000000000000000
> [    8.050649] x13: 003d090000000000 x12: 00003d0900000000
> [    8.055953] x11: 0000000000000000 x10: 00003d0900000000
> [    8.061257] x9 : ffff800010027f14 x8 : 0000000000000000
> [    8.066561] x7 : 00000000ffffffff x6 : ffff000ff6148698
> [    8.071865] x5 : ffff80001159d040 x4 : ffff80001159d110
> [    8.077169] x3 : ffff800010f69a00 x2 : 0000000000000000
> [    8.082473] x1 : ffff800010f69a20 x0 : 0000000000000000
> [    8.087777] Call trace:
> [    8.090213] __ipi_send_mask (/home/ubuntu/linux/./include/linux/irqdomain.h:537 /home/ubuntu/linux/kernel/irq/ipi.c:283) 

Thanks for that. This resolves to:

	if (irq_domain_is_ipi_per_cpu(data->domain)) {

data->domain is NULL, and we probably are using freed memory...
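
To illustrate why a NULL data->domain would fault at a small address such as
0x28 rather than at 0, here is a minimal userspace sketch. Everything prefixed
mock_ or MOCK_ is hypothetical and only approximates the shape of the kernel
structures; the field order is an assumption made for illustration, and the
NULL check in the second helper is shown only to make the failure mode
explicit, not as the fix discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel definitions. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

struct mock_domain {
	struct mock_list_head link;	/* two pointers */
	const char *name;
	const void *ops;
	void *host_data;
	unsigned int flags;		/* the field the flag test reads */
};

struct mock_irq_data {
	struct mock_domain *domain;
};

#define MOCK_FLAG_IPI_PER_CPU	(1u << 2)	/* illustrative bit only */

/*
 * Address that a read of data->domain->flags would touch. With a NULL
 * domain this is just the field offset, which is 0x28 for this mock
 * layout on a target with 8-byte pointers.
 */
static uintptr_t mock_flags_read_address(const struct mock_irq_data *data)
{
	return (uintptr_t)data->domain + offsetof(struct mock_domain, flags);
}

/* Defensive variant of the flag test: refuses to dereference NULL. */
static int mock_domain_is_ipi_per_cpu(const struct mock_irq_data *data)
{
	if (!data || !data->domain)
		return 0;
	return (data->domain->flags & MOCK_FLAG_IPI_PER_CPU) != 0;
}
```

With this assumed layout, the computed address for a NULL domain lines up
with the 0000000000000028 reported in the oops above.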

> > Now, here's a hunch: in the fine TX1 tradition, the firmware is broken
> > and the GTDT table looks unusable. Amusingly, the crash happens right
> > after the SBSA watchdog fails to probe.
> 
> Yeah, I noticed that, but didn't highlight it as I didn't see it in
> the backtrace...
> 
> > And looking at the code that implements that driver, it looks dodgy as
> > hell, as it unmaps an interrupt it doesn't even know is valid. And it
> > does that right when the driver fails the way you experienced it. If,
> > by any chance, the interrupt field is 0 in the firmware table, this
> > results in SGI0 being unmapped. Given that this is the rescheduling
> > interrupt, fireworks happen.
> 
> ... and that explains why. I wouldn't have gotten there, but wish I'd
> thought to test w/ the watchdog compiled out :(
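
The failure mode described above (an error path that unmaps an interrupt it
never successfully mapped) can be sketched in a few lines of userspace C. This
is not the patchlet referred to in this thread, which is not reproduced here;
all mock_ names are hypothetical, and the only hardware fact relied on is that
GIC interrupt IDs 0-15 are SGIs.

```c
#include <stdbool.h>

#define MOCK_NR_HWIRQS	32
#define MOCK_NR_SGIS	16	/* GIC INTIDs 0-15 are SGIs */

/* true if the hwirq currently has a mapping; index 0 stands in for SGI0 */
static bool mock_mapped[MOCK_NR_HWIRQS];

/* Start from a state where SGI0 is already mapped by the IPI code. */
static void mock_reset(void)
{
	for (int i = 0; i < MOCK_NR_HWIRQS; i++)
		mock_mapped[i] = false;
	mock_mapped[0] = true;
}

/* Returns a positive token on success, 0 if the number is not usable. */
static int mock_map(unsigned int hwirq)
{
	if (hwirq < MOCK_NR_SGIS || hwirq >= MOCK_NR_HWIRQS)
		return 0;
	mock_mapped[hwirq] = true;
	return (int)hwirq + 1;
}

static void mock_unmap(unsigned int hwirq)
{
	if (hwirq < MOCK_NR_HWIRQS)
		mock_mapped[hwirq] = false;
}

/*
 * Probe with a firmware-supplied interrupt number. On failure, the
 * unguarded variant unmaps whatever number the table contained; the
 * guarded variant only undoes a mapping this probe actually created.
 */
static int mock_probe(unsigned int hwirq, bool later_step_fails, bool guarded)
{
	int virq = mock_map(hwirq);

	if (virq > 0 && !later_step_fails)
		return 0;

	if (!guarded || virq > 0)
		mock_unmap(hwirq);
	return -1;
}
```

In this mock, an unguarded failed probe with a firmware value of 0 tears down
the SGI0 entry, while the guarded variant leaves it alone and still cleans up
mappings it created itself.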

No worries. This IRQ series has uncovered a number of terrible driver
behaviours since I merged it, and these bugs are worth every penny.

> > Can you have a go with the patchlet below, and let me know if that
> > helps?
> 
> It does!

Awesome. I'll Cc you on the actual patch, feel free to respond with a
Tested-by: if you want.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
