Message-ID: <663e55c59d9d_3d7b429475@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Fri, 10 May 2024 10:13:41 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Borislav Petkov <bp@...en8.de>, Dan Williams <dan.j.williams@...el.com>
CC: Jonathan Cameron <Jonathan.Cameron@...wei.com>, Shiju Jose
<shiju.jose@...wei.com>, "linux-cxl@...r.kernel.org"
<linux-cxl@...r.kernel.org>, "linux-acpi@...r.kernel.org"
<linux-acpi@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"dave@...olabs.net" <dave@...olabs.net>, "dave.jiang@...el.com"
<dave.jiang@...el.com>, "alison.schofield@...el.com"
<alison.schofield@...el.com>, "vishal.l.verma@...el.com"
<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"david@...hat.com" <david@...hat.com>, "Vilas.Sridharan@....com"
<Vilas.Sridharan@....com>, "leo.duran@....com" <leo.duran@....com>,
"Yazen.Ghannam@....com" <Yazen.Ghannam@....com>, "rientjes@...gle.com"
<rientjes@...gle.com>, "jiaqiyan@...gle.com" <jiaqiyan@...gle.com>,
"tony.luck@...el.com" <tony.luck@...el.com>, "Jon.Grimm@....com"
<Jon.Grimm@....com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "rafael@...nel.org" <rafael@...nel.org>,
"lenb@...nel.org" <lenb@...nel.org>, "naoya.horiguchi@....com"
<naoya.horiguchi@....com>, "james.morse@....com" <james.morse@....com>,
"jthoughton@...gle.com" <jthoughton@...gle.com>, "somasundaram.a@....com"
<somasundaram.a@....com>, "erdemaktas@...gle.com" <erdemaktas@...gle.com>,
"pgonda@...gle.com" <pgonda@...gle.com>, "duenwen@...gle.com"
<duenwen@...gle.com>, "mike.malvestuto@...el.com"
<mike.malvestuto@...el.com>, "gthelen@...gle.com" <gthelen@...gle.com>,
"wschwartz@...erecomputing.com" <wschwartz@...erecomputing.com>,
"dferguson@...erecomputing.com" <dferguson@...erecomputing.com>,
"wbs@...amperecomputing.com" <wbs@...amperecomputing.com>,
"nifan.cxl@...il.com" <nifan.cxl@...il.com>, tanxiaofei
<tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>,
"kangkang.shen@...urewei.com" <kangkang.shen@...urewei.com>, wanghuiqiang
<wanghuiqiang@...wei.com>, Linuxarm <linuxarm@...wei.com>, Greg Kroah-Hartman
<gregkh@...uxfoundation.org>, Jean Delvare <jdelvare@...e.com>, Guenter Roeck
<linux@...ck-us.net>, Dmitry Torokhov <dmitry.torokhov@...il.com>
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
Borislav Petkov wrote:
> On Thu, May 09, 2024 at 03:59:29PM -0700, Dan Williams wrote:
> > No, at a minimum that's a layering violation. This is a generic library
> > facility that should not care if it is being called for a CXL device or
> > an ACPI device.
>
> Really?
Yes.
> Because this looks like creating a subsystem for those two newfangled
> scrubbing functionalities which will be present in CXL devices and
> by that ACPI RAS2 thing.
>
> Because we have a *lot* of hw scrubbing functionality already. Just do:
>
> git grep "scrub"
>
> Some of it controls hw scrubbing. If this is a generic facility, does
> this mean that all those older scrubbers should be converted to it?
>
> Or is this thing going to support only new stuff? I.e., we want to
> disable our scrubbers when doing performance-sensitive workloads and
> reenable them after that but we don't care about old systems?
>
> And all that other bla about controlling scrubbers from userspace.
>
> So which is it?
In fact this question matches my reaction to the last posting [1], and
led to a much improved cover letter and the "Comparison of scrubbing
features". To your point there are scrub capabilities already in the
kernel and we would need to make a decision about what to do about them.
I called out NVDIMM ARS as one example and am open to exploring
converting that to the common scrub ABI, but would not block this
proposal in the meantime.
For me the proposal can be boiled down to, "here we (kernel community)
are again with 2 new scrub interfaces to add to the kernel. Let's step
back, define a common ABI for ACPI RAS 2 and CXL to stop the
proliferation of scrub ABIs, and then make a decision about when/whether
to integrate legacy scrub facilities into this new interface".
[1]: http://lore.kernel.org/r/65d6936952764_1138c7294e@dwillia2-xfh.jf.intel.com.notmuch
> > I think it works for x86 drivers because the functionality in those
> > modules is wholly contained within that one module. This scrub module is
> > a service library for other modules.
>
> Well, you have that thing in EDAC. edac_core.ko is that service module
> and the chipset-specific drivers - at least on x86 - use a match_id to
> load only on the systems they should load on.
Which is exactly the same mechanism being defined here. scrub_core.ko is
a service module that would only be requested by an ACPI module or a CXL
module after one of those loads based on their match_id.
> If I had a BIOS table which had "EDAC" in it, I won't load edac_core.ko
> either but there isn't one.
>
> > It is functionally the wrong place to do the check. When module_init()
> > fails it causes not only the current module to be unloaded but any
> > dependent modules will also fail to load.
>
> See above. I'm under the assumption that this is using two methods to
> detect scrubbing functionality.
The scrub_core, like edac_core, has no method to detect scrubbing
facilities; it is simply a passive library waiting for the first
scrub_device_register() call.
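To illustrate the "passive library" point, here is a minimal user-space
sketch, not the code from the patch set. The struct layout, the fixed-size
table, and the scrub_core_init() / scrub_device_count() helpers are
hypothetical names chosen for illustration; only scrub_device_register()
is named in this thread, and its real signature may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel API from the patch set. */
#define MAX_SCRUB_DEVICES 8

struct scrub_device {
	const char *name;	/* illustrative only, e.g. "cxl-mem0" */
};

static struct scrub_device *registry[MAX_SCRUB_DEVICES];
static size_t nr_registered;

/* Core init: sets up state only and performs no hardware enumeration. */
int scrub_core_init(void)
{
	nr_registered = 0;
	return 0;	/* never fails just because no device exists yet */
}

/* Called by a backend driver after its own probe found a scrub capability. */
int scrub_device_register(struct scrub_device *dev)
{
	assert(nr_registered <= MAX_SCRUB_DEVICES);
	if (!dev || nr_registered >= MAX_SCRUB_DEVICES)
		return -1;
	registry[nr_registered++] = dev;
	return 0;
}

/* Number of devices that backends have registered so far. */
size_t scrub_device_count(void)
{
	return nr_registered;
}
```

The core succeeds at init with zero devices and only learns about hardware
when a backend calls in.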
> > Let's take an example of the CXL driver wanting to register with this
> > scrub interface to support the capability that *might* be available on
> > some CXL devices. The cxl_pci.ko module, that houses cxl_pci_driver,
> > grows a call to scrub_device_register(). That scrub_device_register()
> > call is statically present in cxl_pci.ko so that when cxl_pci.ko loads
> > symbol resolution requires scrub.ko to load.
> >
> > Neither of those modules (cxl_pci.ko or scrub.ko) load automatically.
> > Either udev loads cxl_pci.ko because it sees a device that matches
> > cxl_mem_pci_tbl, or the user manually insmods those modules because they
> > think they know better. No memory wasted unless the user explicitly asks
> > for memory to be wasted.
>
> The scrub.ko goes and asks the system: "Do you have a CXL device with
> scrubbing functionality?" "Yes: load." "No: ok, won't."
Yeah, that's backwards. CXL enumeration belongs in the CXL driver and
the CXL driver is fully responsible for deciding when to incur the costs
of loading scrub_core.
> > If no CXL devices in the system have scrub capabilities, great, then
> > scrub_device_register() will never be called.
> >
> > Now, if memory_scrub_control_init() did its own awkward and redundant
> > CXL scan, and fails with "no CXL scrub capable devices found" it would
> > not only block scrub.ko from loading, but also cxl_pci.ko since
> > cxl_pci.ko needs to resolve that symbol to load.
>
> Why would it fail the scan?
cxl_pci.ko loads based on match_id and cxl_pci_probe() enumerates device
capabilities. My interpretation of your feedback is that
memory_scrub_control_init() should duplicate that cxl_pci_probe()
enumeration?
Assume that it does and memory_scrub_control_init() finds no scrub
facilities in any CXL devices and fails memory_scrub_control_init(). Any
module that links to scrub_device_register() will also fail to load
because module symbol resolution depends on all modules completing init.
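A toy user-space model of that cascade, under the simplifying assumption
that each module has at most one dependency. struct module, load_module(),
and the two init helpers are hypothetical; the real module loader is far
more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a loadable module with a single dependency. */
struct module {
	const char *name;
	int (*init)(void);	/* returns 0 on success */
	struct module *dep;	/* module providing needed symbols, or NULL */
	bool loaded;
};

/*
 * A module loads only if its dependency loaded (so its symbols can be
 * resolved) and its own init succeeded.
 */
int load_module(struct module *m)
{
	assert(m != NULL);
	if (m->loaded)
		return 0;
	if (m->dep && load_module(m->dep) != 0)
		return -1;	/* dependency init failed: cannot resolve symbols */
	if (m->init && m->init() != 0)
		return -1;
	m->loaded = true;
	return 0;
}

/* Init that always succeeds, as a passive library would. */
int init_ok(void)
{
	return 0;
}

/* Init that fails because its own scan found no capable devices. */
int init_no_devices(void)
{
	return -1;
}
```

With init_no_devices() standing in for a scanning memory_scrub_control_init(),
loading the dependent module fails as well, even though its own init would
have succeeded.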
> Isn't this fancy GET_SUPPORTED_FEATURES command giving you all info you
> need?
Sure, but that's a driver-probe-time facility, not a module_init-time
facility.
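A rough sketch of what "probe time" means here, again as a user-space model
with hypothetical names (struct fake_cxl_dev, fake_cxl_probe(), and the
register stub are not from the patch set). The capability check happens per
device inside probe, and the driver binds whether or not the device can
scrub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CXL device after feature discovery. */
struct fake_cxl_dev {
	bool has_scrub;		/* result of a per-device capability query */
};

static int registered;

/* Stub for the scrub core's registration entry point. */
static int scrub_register_stub(struct fake_cxl_dev *dev)
{
	(void)dev;
	registered++;
	return 0;
}

/*
 * Probe succeeds for any matching device; scrub registration is an
 * optional extra that depends on what this particular device reports.
 */
int fake_cxl_probe(struct fake_cxl_dev *dev)
{
	assert(dev != NULL);
	if (dev->has_scrub)
		return scrub_register_stub(dev);
	return 0;
}

/* Number of devices registered with the stub so far. */
int registered_count(void)
{
	return registered;
}
```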
> > Lastly I think drivers based on ACPI tables are awkward. They really
> > need to have an ACPI device to attach so that typical automatic Linux
> > module loading machinery can be used. The fact this function is a
> > subsys_initcall() is a red-flag since nothing should be depending on the
> > load order of a little driver to control scrub parameters.
>
> Yeah, it is becoming a mess before it has even started.
I assume you do not consider edac_core a mess?
Now, the question of how many legacy scrub interfaces should be
considered in this design out of the gate is a worthwhile discussion. I
am encouraged that this ABI is at least trying to handle more than 1
backend, which makes me feel better that adding a 3rd and 4th might not
be prohibitive.
> So I don't mind if such drivers get loaded as long as doing this is the
> best we can do given the situation. What gets me up the palms, as they
> say in .de, is "just because" and "look, the others do it too."
Which matches what I reacted to on the last posting:
"Maybe it is self evident to others, but for me there is little in these
changelogs besides 'mechanism exists, enable it'"
...and to me that feedback was taken to heart with much improved
changelogs in this new posting.
This init-time feature probing discussion feels like it was born from a
miscommunication / misunderstanding.