Message-ID: <ZqlUh4kL0wK5BnnY@debian>
Date: Tue, 30 Jul 2024 14:00:55 -0700
From: Fan Ni <nifan.cxl@...il.com>
To: shiju.jose@...wei.com
Cc: linux-edac@...r.kernel.org, linux-cxl@...r.kernel.org,
	linux-acpi@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, bp@...en8.de, tony.luck@...el.com,
	rafael@...nel.org, lenb@...nel.org, mchehab@...nel.org,
	dan.j.williams@...el.com, dave@...olabs.net,
	jonathan.cameron@...wei.com, dave.jiang@...el.com,
	alison.schofield@...el.com, vishal.l.verma@...el.com,
	ira.weiny@...el.com, david@...hat.com, Vilas.Sridharan@....com,
	leo.duran@....com, Yazen.Ghannam@....com, rientjes@...gle.com,
	jiaqiyan@...gle.com, Jon.Grimm@....com, dave.hansen@...ux.intel.com,
	naoya.horiguchi@....com, james.morse@....com, jthoughton@...gle.com,
	somasundaram.a@....com, erdemaktas@...gle.com, pgonda@...gle.com,
	duenwen@...gle.com, mike.malvestuto@...el.com, gthelen@...gle.com,
	wschwartz@...erecomputing.com, dferguson@...erecomputing.com,
	wbs@...amperecomputing.com, nifan.cxl@...il.com,
	tanxiaofei@...wei.com, prime.zeng@...ilicon.com,
	roberto.sassu@...wei.com, kangkang.shen@...urewei.com,
	wanghuiqiang@...wei.com, linuxarm@...wei.com
Subject: Re: [PATCH v10 00/11] EDAC: Scrub: introduce generic EDAC RAS
 control feature driver + CXL/ACPI-RAS2 drivers

On Fri, Jul 26, 2024 at 05:05:44PM +0100, shiju.jose@...wei.com wrote:
> From: Shiju Jose <shiju.jose@...wei.com>
> 
Hi Shiju,
FYI. Not sure if I missed something, but I hit a build failure when
compiling the patch set.

I applied this patch set on top of the fixes branch (tag:
cxl-fixes-6.10-rc7, origin/fixes) and compiling fails with the following
errors.

fan@DT ~/c/linux-fixes (fixes)> make -j 16
mkdir -p /home/fan/cxl/linux-fixes/tools/objtool && make O=/home/fan/cxl/linux-fixes subdir=tools/objtool --no-print-directory -C objtool
  INSTALL libsubcmd_headers
  CALL    scripts/checksyscalls.sh
  CC      drivers/ras/acpi_ras2.o
drivers/ras/acpi_ras2.c: In function ‘ras2_update_patrol_scrub_params_cache’:
drivers/ras/acpi_ras2.c:64:37: error: implicit declaration of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
   64 |         ras2_ctx->min_scrub_cycle = FIELD_GET(RAS2_PATROL_SCRUB_MIN_SCHRS_OUT_MASK,
      |                                     ^~~~~~~~~
drivers/ras/acpi_ras2.c: In function ‘ras2_hw_scrub_set_enabled_bg’:
drivers/ras/acpi_ras2.c:216:50: error: implicit declaration of function ‘FIELD_PREP’ [-Werror=implicit-function-declaration]
  216 |                 ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_SCHRS_IN_MASK,
      |                                                  ^~~~~~~~~~
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:244: drivers/ras/acpi_ras2.o] Error 1
make[3]: *** [scripts/Makefile.build:485: drivers/ras] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:485: drivers] Error 2
make[1]: *** [/home/fan/cxl/linux-fixes/Makefile:1934: .] Error 2
make: *** [Makefile:240: __sub-make] Error 2


It seems we need to include <linux/bitfield.h>. The following change fixes
the build for me:

fan@DT ~/c/linux-fixes (fixes)> git diff
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
index b2381d5b37d0..b4d229e6451f 100644
--- a/drivers/ras/acpi_ras2.c
+++ b/drivers/ras/acpi_ras2.c
@@ -11,6 +11,7 @@
 #include <linux/edac_ras_feature.h>
 #include <linux/platform_device.h>
 #include <acpi/ras2_acpi.h>
+#include <linux/bitfield.h>
 
 #define RAS2_DEV_NUM_RAS_FEATURES      1
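
For context, FIELD_GET() and FIELD_PREP() are the kernel bitfield helpers
declared in <linux/bitfield.h>. Below is a rough userspace sketch of what
they do. It is not the kernel implementation, and the DEMO_* names and bit
positions are made up for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's GENMASK_ULL(): a contiguous mask
 * covering bits h..l (inclusive) of a 64-bit value.
 */
#define DEMO_GENMASK(h, l) \
	(((~UINT64_C(0)) << (l)) & ((~UINT64_C(0)) >> (63 - (h))))

/*
 * Hypothetical register layout, for illustration only (an assumption,
 * not the layout used by the RAS2 driver):
 *   bits 15:8  - requested scrub rate
 *   bits 23:16 - minimum scrub rate
 */
#define DEMO_SCHRS_IN_MASK       DEMO_GENMASK(15, 8)
#define DEMO_MIN_SCHRS_OUT_MASK  DEMO_GENMASK(23, 16)

/* The shift for a mask is the index of its lowest set bit. */
static inline unsigned int demo_mask_shift(uint64_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Like FIELD_GET(): extract the field selected by mask from reg. */
static inline uint64_t demo_field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> demo_mask_shift(mask);
}

/* Like FIELD_PREP(): shift val into position and clip it to mask. */
static inline uint64_t demo_field_prep(uint64_t mask, uint64_t val)
{
	return (val << demo_mask_shift(mask)) & mask;
}
```

With this made-up layout, demo_field_prep(DEMO_SCHRS_IN_MASK, 12) places 12
in bits 15:8, and demo_field_get() with the same mask reads it back. The
kernel macros additionally do compile-time checks on the mask, which this
sketch does not attempt.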



Fan

> Previously known as "ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers".
> 
> EDAC based Subsystem for controlling RAS Features
> =================================================
> This series proposes an EDAC based subsystem for controlling RAS features
> and exposing the features' control attributes to the userspace in sysfs.
> Some Examples:
>  - Scrub control
>  - Error Check Scrub (ECS) control
>  - ACPI RAS2 features
>  - ACPI Address Range Scrubbing (ARS)
>  - Post Package Repair (PPR) etc.
> 
> High level design is illustrated in the following diagram.
>  
>          _______________________________________________
>         |   Userspace - Rasdaemon                       |
>         |  ____________                                 |
>         | | RAS CXL    |       _____________            | 
>         | | Err Handler|----->|             |           |
>         | |____________|      | RAS Dynamic |           |
>         |  ____________       | Scrub       |           |
>         | | RAS Memory |----->| Controller  |           |
>         | | Err Handler|      |_____________|           |
>         | |____________|           |                    |
>         |__________________________|____________________|                              
>                                    |
>                                    |
>     _______________________________|______________________________
>    |   Kernel EDAC based SubSystem | for RAS Features Control     |
>    | ______________________________|____________________________  |
>    || EDAC Core          Sysfs EDAC| Bus                        | |
>    ||    __________________________|_______     _____________   | |
>    ||   |/sys/bus/edac/devices/<dev>/scrub/|   | EDAC Device |  | |
>    ||   |/sys/bus/edac/devices/<dev>/ecs*/ |<->| EDAC MC     |  | |
>    ||   |/sys/bus/edac/devices/<dev>/ars/  |   | EDAC Sysfs  |  | |
>    ||   |/sys/bus/edac/devices/<dev>/ppr/  |   | EDAC Module |  | |
>    ||   |__________________________________|   |_____________|  | |
>    ||                               | EDAC Bus                  | |
>    ||               Get             |                           | |
>    ||    __________ Feature's       |             __________    | |
>    ||   |          |Descs  _________|______      |          |   | |
>    ||   |EDAC Scrub|<-----|    EDAC RAS    |---->| EDAC ARS |   | |
>    ||   |__________|      |Control Feature |     |__________|   | |
>    ||    __________       |    Driver      |      __________    | |
>    ||   |          |<-----|________________|---->|          |   | |
>    ||   |EDAC ECS  |   Register RAS | Features   | EDAC PPR |   | |
>    ||   |__________|                |            |__________|   | |
>    ||         ______________________|___________________        | |
>    ||_________|_____________|_____________|____________|________| |
>    |   _______|____    _____|______   ____|______   ___|_____     |
>    |  |            |  | CXL Mem   |  |           | |         |    |
>    |  | ACPI RAS2  |  | Driver    |  | ACPI ARS  | | PPR     |    |
>    |  | Driver     |  | Scrub,ECS |  | Driver    | | Driver  |    |
>    |  |____________|  |___________|  |___________| |_________|    |
>    |        |              |              |           |           |
>    |________|______________|______________|___________|___________|
>             |              |              |           |          
>      _______|______________|______________|___________|___________
>     |     __|______________|_ ____________|___________|_____      |
>     |    |                                                  |     |
>     |    |            Platform HW and Firmware              |     |
>     |    |__________________________________________________|     |
>     |_____________________________________________________________|                             
> 
> 1. EDAC Features components - Create feature specific descriptors.
> 2. EDAC RAS Feature driver - Gets the feature's attr descriptors from the
>    EDAC RAS feature component, registers the device's RAS features with
>    the EDAC bus and exposes the features' sysfs attributes under the
>    sysfs EDAC bus.
> 3. RAS dynamic scrub controller - Userspace sample module added in the
>    rasdaemon to start scrubbing when excess number of related errors
>    are reported in a short span of time.
> 
> The added EDAC feature specific components (e.g. EDAC scrub, EDAC ECS,
> EDAC PPR etc) do callbacks to the parent driver (e.g. CXL driver,
> ACPI RAS driver etc) for the controls rather than just letting the
> caller deal with it, for the following reasons.
> 1. Enforces a common API across multiple implementations. We could do
>    that via review, but that has not generally gone well in the long run
>    for subsystems that have done it (several have later moved to callback
>    and feature list based approaches).
> 2. Gives a path for 'intercepting' in the EDAC feature driver.
>    An example for this is that we could intercept PPR repair calls
>    and sanity check that the memory in question is offline before
>    passing back to the underlying code.  Sure we could rely on doing
>    that via some additional calls from the parent driver, but the
>    ABI will get messier.
> 3. (Speculative) we may get in kernel users of some features in the
>    long run.
> 
> More details of the common RAS features are described in the following
> sections.
> 
> Memory Scrubbing
> ================
> Increasing DRAM size and cost has made memory subsystem reliability
> an important concern. These modules are used where potentially
> corrupted data could cause expensive or fatal issues. Memory errors are
> one of the top hardware failures that cause server and workload crashes.
> 
> Memory scrub is a feature where an ECC engine reads data from
> each memory media location, corrects with an ECC if necessary and
> writes the corrected data back to the same memory media location.
> 
> The memory DIMMs could be scrubbed at a configurable rate to detect
> uncorrected memory errors and attempt to recover from detected memory
> errors, providing the following benefits.
> - Proactively scrubbing memory DIMMs reduces the chance of a correctable
>   error becoming uncorrectable.
> - Once detected, uncorrected errors caught in unallocated memory pages are
>   isolated and prevented from being allocated to an application or the OS.
> - The probability of software/hardware products encountering memory
>   errors is reduced.
> Some details of background can be found in Reference [5].
> 
> There are 2 types of memory scrubbing,
> 1. Background (patrol) scrubbing of the RAM whilst the RAM is otherwise
>    idle.
> 2. On-demand scrubbing for a specific address range/region of memory.
> 
> There are several types of interfaces to HW memory scrubbers
> identified such as ACPI NVDIMM ARS(Address Range Scrub), CXL memory
> device patrol scrub, CXL DDR5 ECS, ACPI RAS2 memory scrubbing.
> 
> The scrub control varies between different memory scrubbers. To allow
> for standard userspace tooling there is a need to present these controls
> with a standard ABI.
> 
> Introduce a generic memory EDAC scrub control which allows the user to
> control the underlying scrubbers in the system via a generic sysfs scrub
> control interface.
> 
> Use case of common scrub control feature
> ========================================
> 1. There are several types of interfaces to HW memory scrubbers identified
>    such as ACPI NVDIMM ARS(Address Range Scrub), CXL memory device patrol
>    scrub, CXL DDR5 ECS, ACPI RAS2 memory scrubbing features and software
>    based memory scrubber(discussed in the community Reference [5]).
>    Also some scrubbers support controlling (background) patrol scrubbing
>    (ACPI RAS2, CXL) and/or on-demand scrubbing(ACPI RAS2, ACPI ARS).
>    However the scrub controls vary between memory scrubbers. Thus there
>    is a requirement for standard generic sysfs scrub controls exposed
>    to the userspace for the seamless control of the HW/SW scrubbers in
>    the system by admin/scripts/tools etc.
> 2. Scrub controls in user space allow the user to disable the scrubbing
>    in case disabling of the background patrol scrubbing or changing the
>    scrub rate are needed for other purposes such as performance-aware
>    operations which requires the background operations to be turned off
>    or reduced.
> 3. Allows on-demand scrubbing of a specific address range, if
>    supported by the scrubber.
> 4. User space tools can scrub the memory DIMMs regularly at a
>    configurable scrub rate using the sysfs scrub controls discussed, which
>    helps
>    - to detect uncorrectable memory errors early, before the user accesses
>      the memory, which helps to recover from the detected memory errors.
>    - to reduce the chance of a correctable error becoming uncorrectable.
> 5. Policy control for hotplugged memory. There is not necessarily a system
>    wide bios or similar in the loop to control the scrub settings on a CXL
>    device that wasn't there at boot. What that setting should be is a policy
>    decision as we are trading off reliability vs performance - hence it should
>    be in control of userspace. As such, 'an' interface is needed. Seems more
>    sensible to try and unify it with other similar interfaces than spin
>    yet another one.
> 
> The draft version of userspace code for dynamic scrub control, based
> on frequency of memory errors reported to the userspace, is added in
> rasdaemon and enabled, tested for CXL device based patrol scrubbing feature
> and ACPI RAS2 based scrubbing feature.
> 
> https://github.com/shijujose4/rasdaemon/tree/scrub_control_6_june_2024
> 
> Comparison of scrubbing features
> ================================
>  ................................................................
>  .              .   ACPI    . CXL patrol.  CXL ECS  .  ARS      .
>  .  Name        .   RAS2    . scrub     .           .           .
>  ................................................................
>  .              .           .           .           .           .
>  . On-demand    . Supported . No        . No        . Supported .
>  . Scrubbing    .           .           .           .           .
>  .              .           .           .           .           .  
>  ................................................................
>  .              .           .           .           .           .
>  . Background   . Supported . Supported . Supported . No        .
>  . scrubbing    .           .           .           .           .
>  .              .           .           .           .           .
>  ................................................................
>  .              .           .           .           .           .
>  . Mode of      . Scrub ctrl. per device. per memory.  Unknown  .
>  . scrubbing    . per NUMA  .           . media     .           .
>  .              . domain.   .           .           .           .
>  ................................................................
>  .              .           .           .           .           . 
>  . Query scrub  . Supported . Supported . Supported . Supported .       
>  . capabilities .           .           .           .           .
>  .              .           .           .           .           .
>  ................................................................
>  .              .           .           .           .           . 
>  . Setting      . Supported . No        . No        . Supported .       
>  . address range.           .           .           .           .
>  .              .           .           .           .           .
>  ................................................................
>  .              .           .           .           .           . 
>  . Setting      . Supported . Supported . No        . No        .       
>  . scrub rate   .           .           .           .           .
>  .              .           .           .           .           .
>  ................................................................
>  .              .           .           .           .           . 
>  . Unit for     . Not       . in hours  . No        . No        .       
>  . scrub rate   . Defined   .           .           .           .
>  .              .           .           .           .           .
>  ................................................................
>  .              . Supported .           .           .           .
>  . Scrub        . on-demand . No        . No        . Supported .
>  . status/      . scrubbing .           .           .           .
>  . Completion   . only      .           .           .           .
>  ................................................................
>  . UC error     .           .CXL general.CXL general. ACPI UCE  .
>  . reporting    . Exception .media/DRAM .media/DRAM . notify and.
>  .              .           .event/media.event/media. query     .
>  .              .           .scan?      .scan?      . ARS status.
>  ................................................................
>  .              .           .           .           .           .      
>  . Clear UC     .  No       . No        .  No       . Supported .
>  . error        .           .           .           .           .
>  .              .           .           .           .           .  
>  ................................................................
>  .              .           .           .           .           .
>  . Translate    . No        . No        . No        . Supported .
>  . *(1)SPA to   .           .           .           .           .
>  . *(2)DPA      .           .           .           .           .  
>  ................................................................
>  .              .           .           .           .           .
>  . Error inject . No        . Can inject. No        . Supported .
>  .              .           . poison for.           .           .
>  .              .           . CXL       .           .           .  
>  ................................................................
> *(1) - SPA - System Physical Address. See section 9.19.7.8
>        Function Index 5 - Translate SPA of ACPI spec r6.5.  
> *(2) - DPA - Device Physical Address. See section 9.19.7.8
>        Function Index 5 - Translate SPA of ACPI spec r6.5.  
> 
> CXL Scrubbing features
> ======================
> Add support for controlling the CXL patrol scrubber and the ACPI RAS2 HW
> based memory patrol scrubber, and register them with the EDAC scrub to
> expose the scrub controls to the userspace tool.
> 
> CXL spec r3.1 section 8.2.9.9.11.1 describes the memory device patrol scrub
> control feature. The device patrol scrub proactively locates and makes
> corrections to errors in a regular cycle. The patrol scrub control allows
> the requester to configure the patrol scrubber's input configurations.
> 
> The patrol scrub control allows the requester to specify the number of
> hours in which the patrol scrub cycles must be completed, provided that
> the requested number is not less than the minimum number of hours for the
> patrol scrub cycle that the device is capable of. In addition, the patrol
> scrub controls allow the host to disable and enable the feature in case
> disabling of the feature is needed for other purposes such as
> performance-aware operations which require the background operations to be
> turned off.
> 
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts.
> 
> The DDR5 device contains a number of memory media FRUs per device. The
> DDR5 ECS feature and thus the ECS control driver supports configuring
> the ECS parameters per FRU.
> 
> ACPI RAS2 Hardware-based Memory Scrubbing
> =========================================
> ACPI spec 6.5 section 5.2.21 ACPI RAS2 describes the ACPI RAS2 table, which
> provides interfaces for platform RAS features and supports independent
> RAS controls and capabilities for a given RAS feature for multiple
> instances of the same component in a given system.
> Memory RAS features apply to RAS capabilities, controls and operations
> that are specific to memory. RAS2 PCC sub-spaces for memory-specific RAS
> features have a Feature Type of 0x00 (Memory).
> 
> The platform can use the hardware-based memory scrubbing feature to expose
> controls and capabilities associated with hardware-based memory scrub
> engines. The RAS2 memory scrubbing feature supports the following as per the spec,
>  - Independent memory scrubbing controls for each NUMA domain, identified
>    using its proximity domain.
>    Note: However AmpereComputing has single entry repeated as they have
>          centralized controls.
>  - Provision for background (patrol) scrubbing of the entire memory system,
>    as well as on-demand scrubbing for a specific region of memory.
> 
> ACPI Address Range Scrubbing(ARS)
> =================================
> ARS allows the platform to communicate memory errors to system software.
> This capability allows system software to prevent accesses to addresses
> with uncorrectable errors in memory. ARS functions manage all NVDIMMs
> present in the system. Only one scrub can be in progress system wide
> at any given time.
> Following functions are supported as per the specification.
> 1. Query ARS Capabilities for a given address range, indicates platform
>    supports the ACPI NVDIMM Root Device Unconsumed Error Notification.
> 2. Start ARS triggers an Address Range Scrub for the given memory range.
>    Address scrubbing can be done for volatile memory, persistent memory,
>    or both.
> 3. Query ARS Status command allows software to get the status of ARS,  
>    including the progress of ARS and ARS error record.
> 4. Clear Uncorrectable Error.
> 5. Translate SPA
> 6. ARS Error Inject etc.
> Note: Support for ARS is not added in this series, to reduce the
> lines of code for review; it could be added after the initial code is merged.
> We'd like feedback on whether this is of interest to the ARS community.
> 
> Series adds,
> 1. Generic EDAC RAS feature driver, EDAC scrub driver, EDAC ECS driver
>    supports memory scrub control, ECS control and other RAS features
>    in the system.
> 2. Support for CXL feature mailbox commands, which is used by
>    CXL device scrubbing features. 
> 3. CXL scrub driver supporting patrol scrub control (device and
>    region based).
> 4. CXL ECS driver supporting ECS control feature.
> 5. ACPI RAS2 driver adds OS interface for RAS2 communication through
>    PCC mailbox, extracts the ACPI RAS2 feature table (RAS2) and
>    creates a platform device for the RAS memory features, which binds
>    to the memory ACPI RAS2 driver.
> 6. Memory ACPI RAS2 driver gets the PCC subspace for communicating
>    with an ACPI compliant platform that supports ACPI RAS2. It adds
>    callback functions and registers with EDAC scrub to allow the user to
>    control the HW patrol scrubbers exposed to the kernel via the
>    ACPI RAS2 table.
> 
> The CXL specific scrub and ECS features are supported in
> https://gitlab.com/qemu-project/qemu.git
> 
> Open Questions based on feedback from the community:
> 1. Leo: Standardize unit for scrub rate, for example ACPI RAS2 does not define
>    unit for the scrub rate. RAS2 clarification needed. 
> 2. Jonathan: Any need for discoverability of capability to scan different regions,
>    such as global PA space to the userspace. Left as future extension.
> 3. Jiaqi:
>    - STOP_PATROL_SCRUBBER from RAS2 must be blocked and, must not be exposed to
>      OS/userspace. Stopping patrol scrubber is unacceptable for platform where
>      OEM has enabled patrol scrubber, because the patrol scrubber is a key part
>      of logging and is repurposed for other RAS actions.
>    If the OEM does not want to expose this control, they should lock it down so the
>    interface is not exposed to the OS. These features are optional after all.
>    - "Requested Address Range"/"Actual Address Range" (region to scrub) is a
>       similarly bad thing to expose in RAS2.
>    If the OEM does not want to expose this, they should lock it down so the
>    interface is not exposed to the OS. These features are optional after all.
> 4. Borislav: 
>    - How the scrub control exposed to the userspace will be used?
>      POC added in rasdaemon with dynamic scrub control for CXL memory media
>      errors and memory errors reported to the userspace.
>      https://github.com/shijujose4/rasdaemon/tree/scrub_control_6_june_2024
>    - Is the scrub interface sufficient for the use cases?
>    - Who is going to use scrub controls tools/admin/scripts?
>      1) Rasdaemon for dynamic control
>      2) Udev script for more static 'defaults' on hotplug etc.
> 
> References:
> 1. ACPI spec r6.5 section 5.2.21 ACPI RAS2.
> 2. ACPI spec r6.5 section 9.19.7.2 ARS.
> 3. CXL spec  r3.1 8.2.9.9.11.1 Device patrol scrub control feature
> 4. CXL spec  r3.1 8.2.9.9.11.2 DDR5 ECS feature
> 5. Background information about kernel support for memory scan, memory
>    error detection and ACPI RASF.
>    https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
> 6. Discussions on RASF:
>    https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r 
> 
> Changes
> =======
> v9 -> v10:
> 1. Feedback from Mauro Carvalho Chehab:
>    - Changes suggested in EDAC RAS feature driver.
>      use uppercase for enums, if else to switch-case, documentation for
>      static scrub and ecs init functions etc.
>    - Changes suggested in EDAC scrub.
>      unit of scrub cycle hour to seconds.
>      attribute node cycle_in_hours_avaiable to min_cycle_duration and 
>      max_cycle_duration.
>      attribute node cycle_in_hours to current_cycle_duration.
>      Use base 0 for kstrtou64() and kstrtol() functions.
>      etc.
>    - Changes suggested in EDAC ECS.
>      uppercase for enums
>      add ABI documentation. etc
>         
> 2. Feedback from Fan:
>    - Changes suggested in EDAC RAS feature driver.
>      use uppercase for enums, change if...else to switch-case. 
>      some optimization in edac_ras_dev_register() function
>      add missing goto free_ctx
>    - Changes suggested in the code for feature commands.  
>    - CXL driver scrub and ECS code
>      use uppercase for enums, fix typo, use enum type for mode
>      fix long lines etc.
>        
> v8 -> v9:
> 1. Feedback from Borislav:
>    - Add scrub control driver to the EDAC on feedback from Borislav.
>    - Changed DEVICE_ATTR_..() static.
>    - Changed the write permissions for scrub control sysfs files as
>      root-only.
> 2. Feedback from Fan:
>    - Optimized cxl_get_feature() function by using min() and removed
>      feat_out_min_size.
>    - Removed unreached return from cxl_set_feature() function.
>    - Changed the term  "rate" to "cycle_in_hours" in all the
>      scrub control code.
>    - Allow cxl_mem_probe() continue if cxl_mem_patrol_scrub_init() fail,
>      with just a debug warning.
>       
> 3. Feedback from Jonathan:
>    - Removed patch __free() based cleanup function for acpi_put_table.
>      and added fix in the acpi ras2 driver.
> 
> 4. Feedback from Dan Williams:
>    - Allow cxl_mem_probe() continue if cxl_mem_patrol_scrub_init() fail,
>      with just a debug warning.
>    - Add support for CXL region based scrub control.
> 
> 5. Feedback from Daniel Ferguson on RAS2 drivers:
>     In the ACPI RAS2 driver,
>   - Incorporated the changes given for clearing error reported.
>   - Incorporated the changes given for check the Set RAS Capability
>     status and return an appropriate error.
>     In the RAS2 memory driver,
>   - Added more checks for start/stop bg and on-demand scrubbing
>     so that addr range in cache do not get cleared and restrict
>     permitted operations during scrubbing.
> 
> v7 -> v8:
> 1. Add more detailed cover letter and add info for basic analysis
>    of ACPI ARS for comment from Dan Williams.
> 2. Changed file name etc from ras2 to acpi_ras2 in memory ACPI RAS2
>    driver for comment from Boris.
> 3. Add documents for usage for comment from Jonathan.
> 4. Changed logic in memory/acpi_ras2.c for enable background
>    scrubbing to allow setting the scrub rate.
> 5. Merged memory/acpi_ras2_common.c with memory/acpi_ras2.c and
>    obsolete code, suggested by Jonathan.
> 6. Initial optimizations and cleanup especially in the memory/acpi_ras2.
> 7. Removed CXL ECS support for time being. 
> 8. Removed support for region based scrub control from the scrub
>    subsystem, which was needed for the CXL ECS, can be added later
>    if required.
> 9. Fixed the format of a few comments and a definition in CXL feature
>    code for the feedback from Fan.
> 10. Jonathan did several optimizations, interface changes and
>     cleanups all over the code.
> 11. Fixes for feedback from Daniel Ferguson (AmpereComputing)
>     for RAS2.
> 12. Workaround for a RAS2 case of only one actual controller as
>     reported by Daniel Ferguson (AmpereComputing) in their hardware.
> 13. Feedback from Yazen, move the common scrub and ras2 changes
>     under /drivers/ras/.
> 14. Drop patch ACPICA: ACPI 6.5: Add support for RAS2 table because
>     Rafael queued the patch.
>     https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=9726d821f88e284ecd998b76ae5f2174721cd9dc
>  
> v6 -> v7:
> 1. Main changes for comments from Jonathan, Thanks.
> 1.1. CXL
>  - Changes to deal with small mailbox and supporting multipart
>    feature data transfers.
>  - Provide more specific parameters to mbox supported/get/set features
>    interface functions.
>  - kvmalloc -> kmalloc in CXL scrub mem allocation for feature commands.
>  - Changed the way using __free(kfree)
>  - Removed readback and verify for setting CXL scrub patrol and ECS
>    parameters. Could be added later if needed.
>  - In is_visible() callback functions for scrub control sysfs attrs
>    changed to writeback the default attribute mode value instead of
>    setting per attrs.
>  - Add documentation for sysfs interfaces for CXL ECS scrub control. 
> 1.2. RAS2
>  - In rasf common code, rename rasf to ras2 because RASF seems obsolete.
>  - Replace pr_* with dev_* log function calls from ACPI RAS2 and
>    memory RAS2 drivers.
>  - Removed including unnecessary .h file from memory RAS2 driver.
>  - In is_visible() callback functions for scrub control sysfs attrs
>    changed to writeback the default attribute mode value instead of
>    setting per attribute.
> 
> 2. Changes for comments from Fan, Thanks.
>  - Add debug message if cxl patrol scrub and ecs init function
>    calls fail.
> 3. Updated cover letter for feedback from Dan Williams. 
>    
> v5 -> v6:
> 1. Changes for comments from Davidlohr, Thanks.
>  - Update CXL feature code based on spec 3.1.
>  - attrb -> attr
>  - Use enums with default counting.  
> 2. Rebased to the latest kernel.
> 
> v4 -> v5:
> 1. Following are the main changes made based on the feedback from Dan Williams on v4.
> 1.1. In the scrub subsystem the common scrub control attributes are statically defined
>      instead of dynamically created.
> 1.2. Add scrub subsystem support for externally defined attribute groups.
>      Add CXL ECS driver code to define an ECS specific attribute group and
>      pass it to the scrub subsystem.
> 1.3. Move cxl_mem_ecs_init() to cxl/core/region.c so that the CXL region_id
>      is used in the registration with the scrub subsystem.
> 1.4. Add previously posted RASF common and RAS2 patches to this scrub series.
>
> 2. Add support for the 'enable_background_scrub' attribute
>    for RAS2, on request from Bill Schwartz (wschwartz@...erecomputing.com).
> 
> v3 -> v4:
> 1. Fixes for the warnings/errors reported by kernel test robot.
> 2. Add support for reading the 'enable' attribute of CXL patrol scrub.
> 
> Changes
> v2 -> v3:
> 1. Changes for comments from Davidlohr, Thanks.
>  - Updated cxl scrub kconfig
>  - removed usage of the flag is_support_feature from
>    the function cxl_mem_get_supported_feature_entry().
>  - corrected spelling error.
>  - removed unnecessary debug message.
>  - removed export feature commands to the userspace.
> 2. Possible fix for the warnings/errors reported by kernel
>    test robot.
> 3. Add documentation for the common scrub configure attributes.
> 
> v1 -> v2:
> 1. Changes for comments from Dave Jiang, Thanks.
>  - Split patches.
>  - reversed xmas tree declarations.
>  - declared flags as enums.
>  - removed a few unnecessary variable initializations.
>  - replaced PTR_ERR_OR_ZERO() with IS_ERR() and PTR_ERR().
>  - add auto clean declarations.
>  - replaced while loop with for loop.
>  - Removed allocation from cxl_get_supported_features() and
>    cxl_get_feature() and make change to take allocated memory
>    pointer from the caller.
>  - replaced if/else with switch case.
>  - replaced sprintf() with sysfs_emit() in 2 places.
>  - replaced goto label with return in a few functions.
> 2. removed unused code for supported attributes from ecs.
> 3. Included the following common patch for the scrub configure driver
>    in this series.
>    "memory: scrub: Add scrub driver supports configuring memory scrubbers
>     in the system"
> 
> 
> Jonathan Cameron (1):
>   platform: Add __free() based cleanup function for platform_device_put
> 
> Shiju Jose (10):
>   EDAC: Add generic EDAC RAS control feature driver
>   EDAC: Add EDAC scrub control driver
>   EDAC: Add EDAC ECS control driver
>   cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command
>   cxl/mbox: Add GET_FEATURE mailbox command
>   cxl/mbox: Add SET_FEATURE mailbox command
>   cxl/memscrub: Add CXL memory device patrol scrub control feature
>   cxl/memscrub: Add CXL memory device ECS control feature
>   ACPI:RAS2: Add ACPI RAS2 driver
>   ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver
> 
>  Documentation/ABI/testing/sysfs-edac-ecs   |  85 ++
>  Documentation/ABI/testing/sysfs-edac-scrub |  76 ++
>  Documentation/scrub/edac-scrub.rst         | 115 +++
>  drivers/acpi/Kconfig                       |  10 +
>  drivers/acpi/Makefile                      |   1 +
>  drivers/acpi/ras2.c                        | 391 +++++++++
>  drivers/cxl/Kconfig                        |  19 +
>  drivers/cxl/core/Makefile                  |   1 +
>  drivers/cxl/core/mbox.c                    | 139 ++++
>  drivers/cxl/core/memscrub.c                | 873 +++++++++++++++++++++
>  drivers/cxl/core/region.c                  |   6 +
>  drivers/cxl/cxlmem.h                       | 129 +++
>  drivers/cxl/mem.c                          |   4 +
>  drivers/edac/Makefile                      |   1 +
>  drivers/edac/edac_ecs.c                    | 396 ++++++++++
>  drivers/edac/edac_ras_feature.c            | 187 +++++
>  drivers/edac/edac_scrub.c                  | 333 ++++++++
>  drivers/ras/Kconfig                        |  10 +
>  drivers/ras/Makefile                       |   1 +
>  drivers/ras/acpi_ras2.c                    | 415 ++++++++++
>  include/acpi/ras2_acpi.h                   |  59 ++
>  include/linux/edac_ras_feature.h           | 132 ++++
>  include/linux/platform_device.h            |   1 +
>  23 files changed, 3384 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-edac-ecs
>  create mode 100644 Documentation/ABI/testing/sysfs-edac-scrub
>  create mode 100644 Documentation/scrub/edac-scrub.rst
>  create mode 100755 drivers/acpi/ras2.c
>  create mode 100644 drivers/cxl/core/memscrub.c
>  create mode 100755 drivers/edac/edac_ecs.c
>  create mode 100755 drivers/edac/edac_ras_feature.c
>  create mode 100755 drivers/edac/edac_scrub.c
>  create mode 100644 drivers/ras/acpi_ras2.c
>  create mode 100644 include/acpi/ras2_acpi.h
>  create mode 100755 include/linux/edac_ras_feature.h
> 
> -- 
> 2.34.1
> 
