[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220419163859.2228874-1-tony.luck@intel.com>
Date: Tue, 19 Apr 2022 09:38:48 -0700
From: Tony Luck <tony.luck@...el.com>
To: hdegoede@...hat.com, markgross@...nel.org
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
corbet@....net, gregkh@...uxfoundation.org,
andriy.shevchenko@...ux.intel.com, jithu.joseph@...el.com,
ashok.raj@...el.com, tony.luck@...el.com, rostedt@...dmis.org,
dan.j.williams@...el.com, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, platform-driver-x86@...r.kernel.org,
patches@...ts.linux.dev, ravi.v.shankar@...el.com
Subject: [PATCH v3 00/11] Introduce In Field Scan driver
Longer description of what this does, and why it is useful in the v2 cover
letter here:
https://lore.kernel.org/all/20220407191347.9681-1-jithu.joseph@intel.com/
But the TL;DR version is this driver loads scan test files that can
check whether silicon in a CPU core is still running correctly. It
is expected that these tests would be run several times per day to
catch problems as silicon ages.
I'm posting this update because I missed many major issues when I added
my review tag. So I have a moral obligation to fix up the things that
I missed.
Changes since V2:
Dan Williams (offline):
----------------------
1) Provided the clue to split into a tiny driver that enumerates and
registers the device. Then the IFS driver can attach to that an behave
much more like a normal driver (original idea from Andy Lutomirski,
used for pmem/nvdimms)
2 .. many) Lots more pointers, tips, and general good guidance to make both
the code and commit comments better and easier to understand.
Boris:
-----
1) Add "intel_" prefixes to the two functions moving to wider scope.
Done.
2) Move the declarations from <asm/microcode_intel.h> to <asm/cpu.h>
Done.
3) intel_cpu_signatures_match() is small enough to be "inline".
Done.
Greg:
----
1) Is the firmware already submitted to the linux-firmware project for
inclusion there?
If not, where should a user get it from?
The scan files will be distributed by Intel on Github in much the
same way that microcode is distributed today.
2) > +struct ifs_binary ifs_binary;
Please no static memory. Use the driver model properly which does not
want you to do this at all.
You should not need this at all. If you do, something is wrong as you
are tying the lifecycle of the memory to the code, not to the device.
Moved this (and ifs_test) to dynamic allocation using devm_kzalloc()
and attaching the resulting pointer to the device with dev_set_drvdata().
3) > +static ssize_t details_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + int ret;
> +
> + if (down_trylock(&ifs_sem))
> + return -EBUSY;
Why do you care about locking here at all?
> +
> + ret = sysfs_emit(buf, "%#llx\n", ifs_test.scan_details);
> + up(&ifs_sem);
What are you protecting? The value can change right after the lock is
released, so who cares?
Removed locking from status and details show() functions. Running a test
is synchronous. So:
# echo 3 > run_test
# cat status
# cat details
will give the results of the core 3 test as expected. It is up to the user
to not do dumb things like reading status/details from another process in
parallel with running tests.
4) > + if (!ifs_binary.loaded) {
> + dev_info(&ifs_pdev->dev, "Load scan binary using driver bind interface\n");
Do not allow userspace to spam kernel logs for no reason :(
sysfs files are not "help files" in the kernel.
Spam removed.
5) > +void ifs_sysfs_add(void)
> +{
> + ifs_pdev->dev.groups = plat_ifs_groups;
Why do you have a single global structure?
All instances of the driver for different tests can use the same files
and functions. They use "struct ifs_data *ifsd = dev_get_drvdata(dev);"
to operate on the correct driver instance.
6) > +KernelVersion: 5.19.0
No need for ".0"
Removed.
7) > + For e.g to test cpu5 do echo 5 > /sys/devices/platform/intel_ifs/run_test
So core numbers are different than cpu numbers here? How are users
going to map them?
Added some extra text here to say that tests are per core, but any thread
on the core can be used to run the test. Should I also point people at
/sys/devices/system/cpu/cpu#/topology/thread_siblings_list? It seems
easy for users to get a list of cores with a script like:
$ cores=$(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | sed -e 's/,.*//' | sort -n | uniq)
8) > +Description: Version of loaded IFS binary image.
In what format?
Added "(hexadecimal)". Also added code (and Docs) to print "none" if the load
of the scan file failed.
9) > +Description: echo "intel_ifs" to reload IFS image.
Huh? Why are you using a common sysfs file for this type of attribute?
Please do not do so, make it "reload" or something like that.
Ok. Added a "reload" file like microcode. (Though using driver bind/unbind
also works).
10) > +Description: IFS tunable parameter that user can modify before
> + the scan run if they wish to override default value.
And where are those parameters documented? What are valid values here?
Dropped both the "noirq" and "retry" parameters. I think they now have sane
defaults. If Jithu/Ashok have a good use case, they can send a patch to add
them back.
-Tony
Jithu Joseph (7):
x86/microcode/intel: Expose collect_cpu_info_early() for IFS
platform/x86/intel/ifs: Read IFS firmware image
platform/x86/intel/ifs: Check IFS Image sanity
platform/x86/intel/ifs: Authenticate and copy to secured memory
platform/x86/intel/ifs: Add scan test support
platform/x86/intel/ifs: Add IFS sysfs interface
platform/x86/intel/ifs: add ABI documentation for IFS
Tony Luck (4):
Documentation: In-Field Scan
platform/x86/intel/ifs: Create device for Intel IFS (In Field Scan)
platform/x86/intel/ifs: Add stub driver for In-Field Scan
trace: platform/x86/intel/ifs: Add trace point to track Intel IFS
operations
.../ABI/testing/sysfs-platform-intel-ifs | 39 ++
Documentation/x86/ifs.rst | 101 ++++++
Documentation/x86/index.rst | 1 +
MAINTAINERS | 8 +
arch/x86/include/asm/cpu.h | 18 +
arch/x86/kernel/cpu/intel.c | 32 ++
arch/x86/kernel/cpu/microcode/intel.c | 59 +---
drivers/platform/x86/intel/Kconfig | 1 +
drivers/platform/x86/intel/Makefile | 1 +
drivers/platform/x86/intel/ifs/Kconfig | 16 +
drivers/platform/x86/intel/ifs/Makefile | 5 +
drivers/platform/x86/intel/ifs/core.c | 74 ++++
drivers/platform/x86/intel/ifs/ifs.h | 103 ++++++
.../platform/x86/intel/ifs/intel_ifs_device.c | 50 +++
drivers/platform/x86/intel/ifs/load.c | 265 ++++++++++++++
drivers/platform/x86/intel/ifs/runtest.c | 333 ++++++++++++++++++
drivers/platform/x86/intel/ifs/sysfs.c | 151 ++++++++
include/trace/events/intel_ifs.h | 38 ++
18 files changed, 1243 insertions(+), 52 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-platform-intel-ifs
create mode 100644 Documentation/x86/ifs.rst
create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
create mode 100644 drivers/platform/x86/intel/ifs/Makefile
create mode 100644 drivers/platform/x86/intel/ifs/core.c
create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
create mode 100644 drivers/platform/x86/intel/ifs/intel_ifs_device.c
create mode 100644 drivers/platform/x86/intel/ifs/load.c
create mode 100644 drivers/platform/x86/intel/ifs/runtest.c
create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
create mode 100644 include/trace/events/intel_ifs.h
base-commit: b2d229d4ddb17db541098b83524d901257e93845
--
2.35.1
Powered by blists - more mailing lists