lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6393a3a9d2882_579c1294b3@dwillia2-xfh.jf.intel.com.notmuch>
Date:   Fri, 9 Dec 2022 13:07:53 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     Dan Williams <dan.j.williams@...el.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        <alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
        <ira.weiny@...el.com>, <bwidawsk@...nel.org>
CC:     <dave@...olabs.net>, <a.manzanares@...sung.com>,
        <mcgrof@...nel.org>, <linux-cxl@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] cxl/acpi: fix null dereference on probe for missing
 ACPI_COMPANION()

Dan Williams wrote:
> Luis Chamberlain wrote:
> > Simply loading cxl_test ends up triggering a null pointer dereference
> > on next-20221207, and it happens because the fetched ACPI_COMPANION() can
> > end up not correct / missing. As with other code which uses ACPI_COMPANION()
> > (drivers/acpi/device_pm.c comes to mind) be defensive over the assumption
> > the companion is always present and bail right away.
> > 
> > This can be easily reproduced with kdevops [0] with linux next-20221207 [1]
> > and cxl enabled workflows:
> > 
> > make menuconfig      # enable cxl and linux-next
> > make                 # sets up variables, builds qemu from source
> > make linux           # builds and install next-20221207
> > make cxl             # installs cxl tool
> > make cxl-test-probe  # loads cxl_test
> > 
> > The oops:
> > 
> >  # modprobe cxl_test
> > No TPM handle discovered.
> > failed to open file /etc/ndctl/keys/nvdimm-master.blob: No such file or directory
> > 
> > [0] https://github.com/linux-kdevops/kdevops
> > [1] https://github.com/linux-kdevops/kdevops/blob/master/playbooks/roles/bootlinux/templates/config-next-20221207
> > 
> > cxl_mock: loading out-of-tree module taints kernel.
> > cxl_mock: loading test module taints kernel.
> > cxl_mem mem0: at cxl_root_port.0 no parent for dport: platform
> > cxl_mem mem1: at cxl_root_port.1 no parent for dport: platform
> > cxl_mem mem2: at cxl_root_port.2 no parent for dport: platform
> > cxl_mem mem3: at cxl_root_port.3 no parent for dport: platform
> > cxl_mem mem4: at cxl_root_port.0 no parent for dport: platform
> > cxl_mem mem5: at cxl_root_port.1 no parent for dport: platform
> > cxl_mem mem6: at cxl_root_port.2 no parent for dport: platform
> > cxl_mem mem7: at cxl_root_port.3 no parent for dport: platform
> > cxl_mem mem8: at cxl_root_port.4 no parent for dport: platform
> > cxl_mem mem9: at cxl_root_port.4 no parent for dport: platform
> > cxl_mem mem10: CXL port topology not found
> > BUG: kernel NULL pointer dereference, address: 00000000000002c0
> >  #PF: supervisor read access in kernel mode
> >  #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP PTI
> > CPU: 4 PID: 1644 Comm: systemd-udevd Kdump: loaded Tainted: G           O     N 6.1.0-rc8-next-20221207 #5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:cxl_acpi_probe+0xeb/0x2f0 [cxl_acpi]
> > Code: ff ff ff 48 c7 40 08 ff ff ff ff 48 c7 40 18 00 02 00 00 e8 57 29 fd ff 49 89 c7 41 89 c4 48 3d 00 f0 ff ff 0f 87 73 ff ff ff <49> 8b bd c0 02 00 00 48 c7 c1 c0 64 e4 c0 48 89 c2 31 f6 e8 bd f1
> > RSP: 0018:ffffbe6d008b7c30 EFLAGS: 00010287
> > RAX: ffff97a7c6e01000 RBX: ffff97a7c51fd810 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
> > RBP: 0000000000000000 R08: ffff97a7c51fdaa8 R09: 0000000000000010
> > R10: 0000000000000002 R11: 00000000000013c7 R12: 00000000c6e01000
> > R13: 0000000000000000 R14: ffff97a7d9c653a8 R15: ffff97a7c6e01000
> > FS:  00007f34b038ed00(0000) GS:ffff97a83bd00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000000002c0 CR3: 0000000102f7e005 CR4: 0000000000770ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > Call Trace:
> >  <TASK>
> >  ? kernfs_create_link+0x5d/0xa0
> >  platform_probe+0x41/0x90
> >  really_probe+0xdb/0x380
> >  ? pm_runtime_barrier+0x50/0x90
> >  __driver_probe_device+0x78/0x170
> >  driver_probe_device+0x1f/0x90
> >  __driver_attach+0xce/0x1c0
> >  ? __pfx___driver_attach+0x10/0x10
> >  bus_for_each_dev+0x73/0xc0
> >  bus_add_driver+0x1ae/0x200
> >  driver_register+0x89/0xe0
> >  ? __pfx_init_module+0x10/0x10 [cxl_acpi]
> >  do_one_initcall+0x43/0x220
> >  ? kmalloc_trace+0x26/0x90
> >  do_init_module+0x4a/0x1f0
> >  __do_sys_init_module+0x17f/0x1b0
> >  do_syscall_64+0x37/0x90
> >  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > RIP: 0033:0x7f34b061baaa
> > Code: 48 8b 0d 59 83 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 26 83 0c 00 f7 d8 64 89 01 48
> > RSP: 002b:00007fff6a198408 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> > RAX: ffffffffffffffda RBX: 00005635afc7e5e0 RCX: 00007f34b061baaa
> > RDX: 00007f34b07a5efd RSI: 0000000000060a29 RDI: 00005635afdd6510
> > RBP: 00007f34b07a5efd R08: 000000000001f5b3 R09: 0000000000000000
> > R10: 000000000000eb81 R11: 0000000000000246 R12: 00005635afdd6510
> > R13: 0000000000000000 R14: 00005635afca6f40 R15: 00005635af874e50
> >  </TASK>
> > Modules linked in: cxl_acpi(+) cxl_pmem cxl_mem cxl_port cxl_mock_mem(ON) cxl_test(ON) cxl_mock(ON) cxl_core libnvdimm cbc encrypted_keys kvm_intel kvm 9p netfs irqbypass crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha512_generic aesni_intel crypto_simd cryptd cirrus drm_shmem_helper 9pnet_virtio virtio_balloon i6300esb drm_kms_helper joydev evdev button serio_raw drm configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod virtio_net net_failover virtio_blk failover psmouse virtio_pci virtio_pci_legacy_dev nvme virtio_pci_modern_dev crc32_pclmul nvme_core virtio crc32c_intel t10_pi virtio_ring crc64_rocksoft crc64
> > 
> > And gdb:
> > 
> > (gdb) l *(cxl_acpi_probe+0xeb)
> > 0xa8b is in cxl_acpi_probe (tools/testing/cxl/../../../drivers/cxl/acpi.c:648).
> > 643
> > 644             root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
> > 645             if (IS_ERR(root_port))
> > 646                     return PTR_ERR(root_port);
> > 647
> > 648             rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
> > 649                                   add_host_bridge_dport);
> > 650             if (rc < 0)
> > 651                     return rc;
> > 652
> > 
> > Signed-off-by: Luis Chamberlain <mcgrof@...nel.org>
> > ---
> > 
> > Note: kdevops also suports now the target:
> > 
> >   make cxl-test-meson
> > 
> > But that does not *at least* crash the kernel although the tests fail too...
> > This is likely a misconfiguration of some sort, but the same kernel
> > works fine when I enable a Type 3 memory device (also supported on
> > kdevops via CONFIG_QEMU_ENABLE_CXL_DEMO_TOPOLOGY_1). This test was run
> > without that enabled, so a naked cxl system.
> > 
> > Even if it *was* a mis-configuration, such things should not crash the kernel.
> > 
> >  drivers/cxl/acpi.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index ad0849af42d7..cf5d1a455efc 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -633,6 +633,9 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> >  	struct acpi_device *adev = ACPI_COMPANION(host);
> >  	struct cxl_cfmws_context ctx;
> >  
> > +	if (!adev)
> > +		return -ENODEV;
> > +
> 
> I can hear the static analysis bots sharpening their knives at the
> thought that ACPI platform device drivers need to check the result of
> ACPI_COMPANION(). This is clearly a cxl_test bug, not something the
> cxl_acpi driver should ever worry about. If ACPI_COMPANION() is failing
> for the mock platform device then there are bigger problems afoot and
> this is just a band-aid until the next failure.
> 
> I'll try booting linux-next, because cxl_test is working for me just
> testing the tip of cxl.git/next.

Ok, my 6.1.0-rc8-next-20221208 build passed.

# meson test -C build --suite cxl
ninja: Entering directory `/root/git/ndctl/build'
[109/109] Linking target ndctl/ndctl
1/5 ndctl:cxl / cxl-topology.sh             OK              11.84s
2/5 ndctl:cxl / cxl-region-sysfs.sh         OK               6.82s
3/5 ndctl:cxl / cxl-labels.sh               OK              10.14s
4/5 ndctl:cxl / cxl-create-region.sh        OK              18.32s
5/5 ndctl:cxl / security-cxl.sh             OK               3.35s

So, what I suspect is happening is that the kdevops environment is
permitting the "production" version of cxl_acpi.ko to load rather than
the test module instrumented to make ACPI_COMPANION() work with mock
devices. I.e. note that tools/testing/cxl/Kbuild has:

    ldflags-y += --wrap=is_acpi_device_node

...which means if cxl_acpi is the production version it will call the
real is_acpi_device_node() and result in the crash signature you see.

Let's just preclude that from happening. With the patch below if I force
install the production modules:

    # insmod /lib/modules/6.1.0-rc8-next-20221208+/kernel/drivers/cxl/core/cxl_core.ko 
    # insmod /lib/modules/6.1.0-rc8-next-20221208+/kernel/drivers/cxl/cxl_acpi.ko 

...and then try to load cxl_test I get:

    # modprobe cxl_test
    modprobe: ERROR: could not insert 'cxl_test': Unknown symbol in module, or unknown parameter (see dmesg)

...where dmesg has:

    cxl_test: Unknown symbol cxl_acpi_test (err -2)
    cxl_test: Unknown symbol cxl_core_test (err -2)

-- >8 --
>From 93bf2c04cd3a708c73c0e4ad7a4121505a0698da Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@...el.com>
Date: Fri, 9 Dec 2022 13:04:26 -0800
Subject: [PATCH] tools/testing/cxl: Prevent cxl_test from confusing production
 modules

The cxl_test machinery builds modified versions of the modules in
drivers/cxl/ and intercepts some of their calls to allow cxl_test to
inject mock CXL topologies for test.

However, if cxl_test attempts the same with production modules,
fireworks ensue as Luis discovered [1]. Prevent that scenario by
arranging for cxl_test to check for a "watermark" symbol in each of the
modules it expects to be modified before the test can run. This turns
undefined runtime behavior or crashes into a safer failure to load the
cxl_test module.

Link: http://lore.kernel.org/r/20221209062919.1096779-1-mcgrof@kernel.org [1]
Reported-by: Luis Chamberlain <mcgrof@...nel.org>
Signed-off-by: Dan Williams <dan.j.williams@...el.com>
---
 tools/testing/cxl/Kbuild          |  6 ++++++
 tools/testing/cxl/cxl_acpi_test.c |  6 ++++++
 tools/testing/cxl/cxl_core_test.c |  6 ++++++
 tools/testing/cxl/cxl_mem_test.c  |  6 ++++++
 tools/testing/cxl/cxl_pmem_test.c |  6 ++++++
 tools/testing/cxl/cxl_port_test.c |  6 ++++++
 tools/testing/cxl/test/cxl.c      |  8 ++++++++
 tools/testing/cxl/watermark.h     | 25 +++++++++++++++++++++++++
 8 files changed, 69 insertions(+)
 create mode 100644 tools/testing/cxl/cxl_acpi_test.c
 create mode 100644 tools/testing/cxl/cxl_core_test.c
 create mode 100644 tools/testing/cxl/cxl_mem_test.c
 create mode 100644 tools/testing/cxl/cxl_pmem_test.c
 create mode 100644 tools/testing/cxl/cxl_port_test.c
 create mode 100644 tools/testing/cxl/watermark.h

diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 0805f08af8b3..427174feeb7d 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -23,22 +23,27 @@ obj-m += cxl_acpi.o
 cxl_acpi-y := $(CXL_SRC)/acpi.o
 cxl_acpi-y += mock_acpi.o
 cxl_acpi-y += config_check.o
+cxl_acpi-y += cxl_acpi_test.o
 
 obj-m += cxl_pmem.o
 
 cxl_pmem-y := $(CXL_SRC)/pmem.o
 cxl_pmem-y += $(CXL_SRC)/security.o
 cxl_pmem-y += config_check.o
+cxl_pmem-y += cxl_pmem_test.o
 
 obj-m += cxl_port.o
 
 cxl_port-y := $(CXL_SRC)/port.o
 cxl_port-y += config_check.o
+cxl_port-y += cxl_port_test.o
+
 
 obj-m += cxl_mem.o
 
 cxl_mem-y := $(CXL_SRC)/mem.o
 cxl_mem-y += config_check.o
+cxl_mem-y += cxl_mem_test.o
 
 obj-m += cxl_core.o
 
@@ -51,5 +56,6 @@ cxl_core-y += $(CXL_CORE_SRC)/pci.o
 cxl_core-y += $(CXL_CORE_SRC)/hdm.o
 cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o
 cxl_core-y += config_check.o
+cxl_core-y += cxl_core_test.o
 
 obj-m += test/
diff --git a/tools/testing/cxl/cxl_acpi_test.c b/tools/testing/cxl/cxl_acpi_test.c
new file mode 100644
index 000000000000..8602dc27c81c
--- /dev/null
+++ b/tools/testing/cxl/cxl_acpi_test.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+
+#include "watermark.h"
+
+cxl_test_watermark(cxl_acpi);
diff --git a/tools/testing/cxl/cxl_core_test.c b/tools/testing/cxl/cxl_core_test.c
new file mode 100644
index 000000000000..464a9255e4d6
--- /dev/null
+++ b/tools/testing/cxl/cxl_core_test.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+
+#include "watermark.h"
+
+cxl_test_watermark(cxl_core);
diff --git a/tools/testing/cxl/cxl_mem_test.c b/tools/testing/cxl/cxl_mem_test.c
new file mode 100644
index 000000000000..ba7fb8a44288
--- /dev/null
+++ b/tools/testing/cxl/cxl_mem_test.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+
+#include "watermark.h"
+
+cxl_test_watermark(cxl_mem);
diff --git a/tools/testing/cxl/cxl_pmem_test.c b/tools/testing/cxl/cxl_pmem_test.c
new file mode 100644
index 000000000000..3fd884fae537
--- /dev/null
+++ b/tools/testing/cxl/cxl_pmem_test.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+
+#include "watermark.h"
+
+cxl_test_watermark(cxl_pmem);
diff --git a/tools/testing/cxl/cxl_port_test.c b/tools/testing/cxl/cxl_port_test.c
new file mode 100644
index 000000000000..be183917a9f6
--- /dev/null
+++ b/tools/testing/cxl/cxl_port_test.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+
+#include "watermark.h"
+
+cxl_test_watermark(cxl_port);
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 30ee680d38ff..920bd969c554 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -9,6 +9,8 @@
 #include <linux/pci.h>
 #include <linux/mm.h>
 #include <cxlmem.h>
+
+#include "../watermark.h"
 #include "mock.h"
 
 static int interleave_arithmetic;
@@ -1119,6 +1121,12 @@ static __init int cxl_test_init(void)
 {
 	int rc, i;
 
+	cxl_acpi_test();
+	cxl_core_test();
+	cxl_mem_test();
+	cxl_pmem_test();
+	cxl_port_test();
+
 	register_cxl_mock_ops(&cxl_mock_ops);
 
 	cxl_mock_pool = gen_pool_create(ilog2(SZ_2M), NUMA_NO_NODE);
diff --git a/tools/testing/cxl/watermark.h b/tools/testing/cxl/watermark.h
new file mode 100644
index 000000000000..9d81d4a5f6be
--- /dev/null
+++ b/tools/testing/cxl/watermark.h
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+#ifndef _TEST_CXL_WATERMARK_H_
+#define _TEST_CXL_WATERMARK_H_
+#include <linux/module.h>
+#include <linux/printk.h>
+
+int cxl_acpi_test(void);
+int cxl_core_test(void);
+int cxl_mem_test(void);
+int cxl_pmem_test(void);
+int cxl_port_test(void);
+
+/*
+ * dummy routine for cxl_test to validate it is linking to the properly
+ * mocked module and not the standard one from the base tree.
+ */
+#define cxl_test_watermark(x)				\
+int x##_test(void)					\
+{							\
+	pr_debug("%s for cxl_test\n", KBUILD_MODNAME);	\
+	return 0;					\
+}							\
+EXPORT_SYMBOL(x##_test)
+#endif /* _TEST_CXL_WATERMARK_H_ */
-- 
2.38.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ