Message-ID: <20240821165034.1af97bad@fedora-3.home>
Date: Wed, 21 Aug 2024 16:50:34 +0200
From: Maxime Chevallier <maxime.chevallier@...tlin.com>
To: Thomas Gleixner <tglx@...utronix.de>, Andrew Lunn <andrew@...n.ch>,
 Gregory Clement <gregory.clement@...tlin.com>, Sebastian Hesselbarth
 <sebastian.hesselbarth@...il.com>, Russell King <linux@...linux.org.uk>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 Thomas Petazzoni <thomas.petazzoni@...tlin.com>
Subject: Regression on Macchiatobin from the irqchip driver

Hi everyone,

I've been testing some network series on the Macchiatobin (Armada
8k SoC), and I stumbled upon a crash at boot that showed up on the
latest net-next branch:

[    2.755698] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[    2.757592] mmcblk0: mmc0:0001 8GME4R 7.28 GiB
[    2.766033] Mem abort info:
[    2.766036]   ESR = 0x0000000096000004
[    2.774534]  mmcblk0: p1
[    2.777086]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.779893] mmcblk0boot0: mmc0:0001 8GME4R 4.00 MiB
[    2.784965]   SET = 0, FnV = 0
[    2.784969]   EA = 0, S1PTW = 0
[    2.784972]   FSC = 0x04: level 0 translation fault
[    2.784976] Data abort info:
[    2.790648] mmcblk0boot1: mmc0:0001 8GME4R 4.00 MiB
[    2.792943]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[    2.796867] mmcblk0rpmb: mmc0:0001 8GME4R 512 KiB, chardev (234:0)
[    2.801002]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    2.801006]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    2.830960] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101a75000
[    2.837436] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[    2.844265] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[    2.850560] Modules linked in:
[    2.853631] CPU: 2 UID: 0 PID: 51 Comm: kworker/u18:2 Not tainted 6.10.0-12649-g25010bfdf8bb #10
[    2.862457] Hardware name: Marvell 8040 MACCHIATOBin Double-shot (DT)
[    2.868926] Workqueue: events_unbound deferred_probe_work_func
[    2.874800] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    2.881794] pc : msi_lib_irq_domain_select+0x28/0x58
[    2.886786] lr : irq_find_matching_fwspec+0xf0/0x120
[    2.891778] sp : ffff8000871f39e0
[    2.895107] x29: ffff8000871f39e0 x28: 0000000000000000 x27: ffff00013f7e11e8
[    2.902281] x26: 0000000000000004 x25: ffff00013f7e11d0 x24: ffff800086764dd0
[    2.909453] x23: ffff800081633b28 x22: ffff8000871f3a68 x21: 0000000000000001
[    2.916624] x20: ffff800086764df0 x19: ffff000101171000 x18: ffffffffffffffff
[    2.923797] x17: ffff000100de8a0c x16: ffff000100de8a00 x15: ffff000100da9cca
[    2.930969] x14: ffffffffffffffff x13: 0030354072656c6c x12: 6f72746e6f632d74
[    2.938141] x11: 7075727265746e69 x10: 0000000000036018 x9 : 0000000000000001
[    2.945314] x8 : ffff8000871f3ab8 x7 : 0000000000000000 x6 : 4d0c1e0cade3e5ec
[    2.952486] x5 : 6c65632d0c1e0c4d x4 : ffff00013f7e11e8 x3 : ffff000101171000
[    2.959659] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000000
[    2.966831] Call trace:
[    2.969288]  msi_lib_irq_domain_select+0x28/0x58
[    2.973928]  irq_find_matching_fwnode+0x4c/0x78
[    2.978484]  of_msi_get_domain+0x11c/0x138
[    2.982602]  mvebu_icu_subset_probe+0x5c/0x124
[    2.987068]  platform_probe+0x68/0xdc
[    2.990748]  really_probe+0xbc/0x2a4
[    2.994343]  __driver_probe_device+0x78/0x12c
[    2.998722]  driver_probe_device+0xdc/0x160
[    3.002926]  __device_attach_driver+0xb8/0x134
[    3.007392]  bus_for_each_drv+0x80/0xdc
[    3.011248]  __device_attach+0xa8/0x1b0
[    3.015103]  device_initial_probe+0x14/0x20
[    3.019307]  bus_probe_device+0xa8/0xac
[    3.023162]  deferred_probe_work_func+0x88/0xc0
[    3.027714]  process_one_work+0x150/0x294
[    3.031743]  worker_thread+0x2e4/0x3ec
[    3.035510]  kthread+0x118/0x11c
[    3.038756]  ret_from_fork+0x10/0x20
[    3.042353] Code: d65f03c0 b9400820 35ffffa0 f9404461 (b9400823) 
[    3.048473] ---[ end trace 0000000000000000 ]---

I bisected the bug; the first bad commit is:

fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent")

I've briefly looked into it, and the NULL pointer being dereferenced
here appears to be the "ops" pointer in msi_lib_irq_domain_select() [1].

I'm not very familiar with the irqchip subsystem, but my best guess
is that this is being called for the ap_domain in the irq-mvebu-sei
driver, which doesn't have any msi_parent_ops set [2].

Looking at the msi_lib_irq_domain_select() implementation, however, it
appears that a NULL ops pointer is expected, judging by the check on the
return line:

	return ops && !!(ops->bus_select_mask & busmask);

However, the check just above it dereferences the ops pointer without
first testing it for NULL:

	/* Handle pure domain searches */
	if (bus_token == ops->bus_select_token)
		return 1;

As I said, I'm not very familiar with this area of the kernel, but I got
my board to boot with the following patch:

--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -128,6 +128,9 @@ int msi_lib_irq_domain_select(struct irq_domain *d, struct irq_fwspec *fwspec,
 	const struct msi_parent_ops *ops = d->msi_parent_ops;
 	u32 busmask = BIT(bus_token);
 
+	if (!ops)
+		return 0;
+
 	if (fwspec->fwnode != d->fwnode || fwspec->param_count != 0)
 		return 0;
 
@@ -135,4 +138,4 @@ int msi_lib_irq_domain_select(struct irq_domain *d, struct irq_fwspec *fwspec,
 	if (bus_token == ops->bus_select_token)
 		return 1;
 
-	return ops && !!(ops->bus_select_mask & busmask);
+	return !!(ops->bus_select_mask & busmask);

----------------------------

I have zero confidence that this is the correct fix, so feel free to
ditch it :) I'll gladly test any patch for this on the MCBIN.

Let me know if you want me to run more tests.

Thanks,

Maxime

[1] : https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/irqchip/irq-msi-lib.c#L125
[2] : https://elixir.bootlin.com/linux/v6.11-rc4/source/drivers/irqchip/irq-mvebu-sei.c#L423
