Message-ID: <63991ec886e85_b05d1294a6@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 13 Dec 2022 16:54:32 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Luis Chamberlain <mcgrof@...nel.org>,
Dan Williams <dan.j.williams@...el.com>
CC: <alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
<ira.weiny@...el.com>, <bwidawsk@...nel.org>, <dave@...olabs.net>,
<a.manzanares@...sung.com>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cxl/acpi: fix null dereference on probe for missing
ACPI_COMPANION()

Luis Chamberlain wrote:
> On Fri, Dec 09, 2022 at 01:07:53PM -0800, Dan Williams wrote:
> > Dan Williams wrote:
> > > Luis Chamberlain wrote:
> > > > Simply loading cxl_test ends up triggering a null pointer dereference
> > > > on next-20221207,
> >
> > Ok, my 6.1.0-rc8-next-20221208 build passed.
> >
> > # meson test -C build --suite cxl
> > ninja: Entering directory `/root/git/ndctl/build'
> > [109/109] Linking target ndctl/ndctl
> > 1/5 ndctl:cxl / cxl-topology.sh OK 11.84s
> > 2/5 ndctl:cxl / cxl-region-sysfs.sh OK 6.82s
> > 3/5 ndctl:cxl / cxl-labels.sh OK 10.14s
> > 4/5 ndctl:cxl / cxl-create-region.sh OK 18.32s
> > 5/5 ndctl:cxl / security-cxl.sh OK 3.35s
>
> What branch of ndctl do you use?

Yeah, somewhat unfair of me to say "works for me" with patches that had
not yet been posted to the list. That's fixed now with this posting:

http://lore.kernel.org/r/167097752151.1189953.3189708700022130101.stgit@dwillia2-xfh.jf.intel.com

...and this one that is now on the pending branch:

http://lore.kernel.org/r/167053487710.582963.17616889985000817682.stgit@dwillia2-xfh.jf.intel.com

>
> > So, what I suspect is happening is
>
> <-- snip -->
>
> Yes you're right.
>
> > From 93bf2c04cd3a708c73c0e4ad7a4121505a0698da Mon Sep 17 00:00:00 2001
> > From: Dan Williams <dan.j.williams@...el.com>
> > Date: Fri, 9 Dec 2022 13:04:26 -0800
> > Subject: [PATCH] tools/testing/cxl: Prevent cxl_test from confusing production
> > modules
> >
> > The cxl_test machinery builds modified versions of the modules in
> > drivers/cxl/ and intercepts some of their calls to allow cxl_test to
> > inject mock CXL topologies for test.
> >
> > However, if cxl_test attempts the same with production modules,
> > fireworks ensue as Luis discovered [1]. Prevent that scenario by
> > arranging for cxl_test to check for a "watermark" symbol in each of the
> > modules it expects to be modified before the test can run. This turns
> > undefined runtime behavior or crashes into a safer failure to load the
> > cxl_test module.
> >
> > Link: http://lore.kernel.org/r/20221209062919.1096779-1-mcgrof@kernel.org [1]
> > Reported-by: Luis Chamberlain <mcgrof@...nel.org>
> > Signed-off-by: Dan Williams <dan.j.williams@...el.com>
>
> Indeed that fixes the same crash. However that highlights a few other
> issues.
>
> 1) ndctl unit tests still fail once you do get the right driver loaded:
> a) pending branch fails on the first test and stops there
> b) main branch fails at the first test and continues and passes on
> the rest of the tests. What is with the discrepancy?
>
> 2) The instructions on ndctl to use an external module can easily
> mislead folks on how to use external replacement modules. I'd like
> to suggest a fix below.
>
> Details below.
[..]
>
> Anyway I think we should simplify the ndctl README.md to just use
> INSTALL_MOD_DIR as follows, thoughts?

Hmm, but scripts/Makefile.modinst already has this:

    INSTALL_MOD_DIR ?= extra

...are you not getting these modules deployed in the "$modules/extra"
directory, or is your distro not prioritizing modules in that directory
over the others? Fedora seems to have this policy by default, but it
seems at least Ubuntu does not. That's what led to the discussion of a
modprobe "override" policy in the Troubleshooting section of the README.

My concern with changing the INSTALL_MOD_PATH recommendation is whether
having modules in "$modules/updates" is guaranteed to work in all
distros. Otherwise we're just shuffling which distros need a custom
override workaround.
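
For anyone checking which case applies on their distro, a rough sketch
along these lines may help. It asks modinfo for the file each module
name resolves to and reports the directory under
/lib/modules/<version>/ that the file lives in. The module list and the
helper name module_install_dir are illustrative assumptions, not
something taken from the ndctl README:

```shell
#!/bin/sh
# Sketch: report which install directory (extra, updates, kernel, ...)
# a module name resolves to for the running kernel.

# Print the first path component below <anything>/lib/modules/<version>/.
# Input that does not contain such a prefix is handled without stripping.
module_install_dir() {
    rest=${1#*/lib/modules/*/}   # drop the ".../lib/modules/<version>/" prefix
    printf '%s\n' "${rest%%/*}"  # keep only the first remaining component
}

# Module names are an assumption for illustration; adjust as needed.
for mod in cxl_acpi cxl_core cxl_test; do
    if path=$(modinfo -n "$mod" 2>/dev/null); then
        printf '%s: %s (%s)\n' "$mod" "$(module_install_dir "$path")" "$path"
    else
        printf '%s: not found by modinfo\n' "$mod"
    fi
done
```

If the cxl modules are reported under "kernel" rather than "extra",
the distro is presumably not prioritizing "$modules/extra", which is
the case the override policy in the README is meant to cover.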

> But this also raises the question: *if* using ndctl and linux-next,
> should one use the main branch or the pending branch? Can there be
> issues with synchronizing? Or should the main branch always work,
> and the pending should just have the latest and greatest and *can*
> fail?

This falls into the category of a "good problem to have" in the sense
that I never had to worry before about others wanting to reproduce unit
test results this early in the dev cycle, so I appreciate the nudge
here.

As to what to do about it, I am open to suggestions. The typical flow
has some lag between upstream tools/testing/cxl/ changes and when the
corresponding ndctl/cxl changes land. This is because the tool enabling
does not start until after it is clear that the kernel changes are going
to land. After that it's another round of review to settle on the tool
changes. The ndctl/pending branch should usually be up to date by the
time -rc1 arrives. Is that sufficient?