lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250127100617.00005e77@huawei.com>
Date: Mon, 27 Jan 2025 10:06:17 +0000
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: <shiju.jose@...wei.com>, <linux-edac@...r.kernel.org>,
	<linux-cxl@...r.kernel.org>, <linux-acpi@...r.kernel.org>,
	<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>, <bp@...en8.de>,
	<tony.luck@...el.com>, <rafael@...nel.org>, <lenb@...nel.org>,
	<mchehab@...nel.org>, <dave@...olabs.net>, <dave.jiang@...el.com>,
	<alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
	<ira.weiny@...el.com>, <david@...hat.com>, <Vilas.Sridharan@....com>,
	<leo.duran@....com>, <Yazen.Ghannam@....com>, <rientjes@...gle.com>,
	<jiaqiyan@...gle.com>, <Jon.Grimm@....com>, <dave.hansen@...ux.intel.com>,
	<naoya.horiguchi@....com>, <james.morse@....com>, <jthoughton@...gle.com>,
	<somasundaram.a@....com>, <erdemaktas@...gle.com>, <pgonda@...gle.com>,
	<duenwen@...gle.com>, <gthelen@...gle.com>, <wschwartz@...erecomputing.com>,
	<dferguson@...erecomputing.com>, <wbs@...amperecomputing.com>,
	<nifan.cxl@...il.com>, <tanxiaofei@...wei.com>, <prime.zeng@...ilicon.com>,
	<roberto.sassu@...wei.com>, <kangkang.shen@...urewei.com>,
	<wanghuiqiang@...wei.com>, <linuxarm@...wei.com>
Subject: Re: [PATCH v18 15/19] cxl/memfeature: Add CXL memory device patrol
 scrub control feature

On Fri, 24 Jan 2025 12:38:43 -0800
Dan Williams <dan.j.williams@...el.com> wrote:

> shiju.jose@ wrote:
> > From: Shiju Jose <shiju.jose@...wei.com>
> > 
> > CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> > feature. The device patrol scrub proactively locates and makes corrections
> > to errors in regular cycle.
> > 
> > Allow specifying the number of hours within which the patrol scrub must be
> > completed, subject to minimum and maximum limits reported by the device.
> > Also allow disabling scrub allowing trade-off error rates against
> > performance.
> > 
> > Add support for patrol scrub control on CXL memory devices.
> > Register with the EDAC device driver, which retrieves the scrub attribute
> > descriptors from EDAC scrub and exposes the sysfs scrub control attributes
> > to userspace. For example, scrub control for the CXL memory device
> > "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
> > 
> > Additionally, add support for region-based CXL memory patrol scrub control.
> > CXL memory regions may be interleaved across one or more CXL memory
> > devices. For example, region-based scrub control for "cxl_region1" is
> > exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
> > 
> > Reviewed-by: Dave Jiang <dave.jiang@...el.com>
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> > Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
Hi Dan,

A few specific replies in line. I've left the detail stuff to Shiju
to address.  Definitely a few things in there I'd missed!

Thanks,

Jonathan


> > ---
> >  Documentation/edac/scrub.rst  |  66 ++++++
> >  drivers/cxl/Kconfig           |  17 ++
> >  drivers/cxl/core/Makefile     |   1 +
> >  drivers/cxl/core/memfeature.c | 392 ++++++++++++++++++++++++++++++++++
> >  drivers/cxl/core/region.c     |   6 +
> >  drivers/cxl/cxlmem.h          |   7 +
> >  drivers/cxl/mem.c             |   5 +
> >  include/cxl/features.h        |  16 ++
> >  8 files changed, 510 insertions(+)
> >  create mode 100644 drivers/cxl/core/memfeature.c
> > diff --git a/Documentation/edac/scrub.rst b/Documentation/edac/scrub.rst
> > index f86645c7f0af..80e986c57885 100644
> > --- a/Documentation/edac/scrub.rst
> > +++ b/Documentation/edac/scrub.rst
> > @@ -325,3 +325,69 @@ root@...alhost:~# cat /sys/bus/edac/devices/acpi_ras_mem0/scrub0/current_cycle_d
> >  10800
> >  
> >  root@...alhost:~# echo 0 > /sys/bus/edac/devices/acpi_ras_mem0/scrub0/enable_background
> > +
> > +2. CXL memory device patrol scrubber
> > +
> > +2.1 Device based scrubbing
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/min_cycle_duration
> > +
> > +3600
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/max_cycle_duration
> > +
> > +918000
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> > +
> > +43200
> > +
> > +root@...alhost:~# echo 54000 > /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> > +
> > +54000
> > +
> > +root@...alhost:~# echo 1 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> > +
> > +1
> > +
> > +root@...alhost:~# echo 0 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> > +
> > +0
> > +
> > +2.2. Region based scrubbing
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/min_cycle_duration
> > +
> > +3600
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/max_cycle_duration
> > +
> > +918000
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> > +
> > +43200
> > +
> > +root@...alhost:~# echo 54000 > /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> > +
> > +54000
> > +
> > +root@...alhost:~# echo 1 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> > +
> > +1
> > +
> > +root@...alhost:~# echo 0 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> > +
> > +root@...alhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background  
> 
> What is this content-free blob of cat and echo statements? Please write actual
> documentation with theory of operation, clarification of assumptions,
> rationale for defaults, guidance on changing defaults... 

Note this is a continuation of existing documentation, but sure some inline comments
talking more about it would be fine.  The rational and top level discussion is
meant to be described in patch 2 as it is not CXL specific.

Defaults are a device thing, there are no software driven defaults.

So I'd suggest the above just adds a few comments along the lines of
what the actions of each block does. 
Something like:

Check current parameters and program the scrubbing for a region to repeat
every X seconds (% of day)


> 
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 0bc6a2cb8474..6078f02e883b 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -154,4 +154,21 @@ config CXL_FEATURES
> >  
> >  	  If unsure say 'y'.
> >  
> > +config CXL_RAS_FEATURES
> > +	tristate "CXL: Memory RAS features"
> > +	depends on CXL_PCI  
> 
> What is the build dependency on CXL_PCI? This enabling does not call back into
> symbols provided by cxl_pci.ko does it?
> 
> > +	depends on CXL_MEM  
> 
> Similar comment, and this also goes away if all of this just moves into
> the new cxl_features driver.

I'm not sure moving to be a child of cxl_features makes sense. Can
probably do it but it's making the spiders web of connections even harder
to relate to the underlying hardware. In my mental model, this stuff
takes services from 'features' and 'mailbox' parts of the CXL driver.

Take repair.   Less than half of each of those drivers is feature related
(a few 'what can I do' type aspects). The control plane goes via
maintenance commands.

Obviously we can get to those by adding additional infrastructure to the
features driver, but that seems likely to be ugly and where do we stop?
It won't scale to likely future cases where the feature part is a tiny
tweak on some much larger chunk of infrastructure (which is mostly what
spec defined features seem to be for). I don't think we want to support
the complexity of device built-in test in the 3.2 spec necessarily
(haven't really thought about it yet!) but it is an example of what would
be an EDAC feature but has no dependence on features.

We could register the patrol scrub stuff from features, and the rest
separately but that seems even more confusing.

So to me, nesting under features is an ugly solution but I'm not that
attached to current approach.

So in my view this stuff should be dependent on CXL_FEATURES but
not a child of it.


(lots skipped - I'll leave the detailed stuff to Shiju!)
> > +	else
> > +		snprintf(cxl_dev_name, sizeof(cxl_dev_name),
> > +			 "%s_%s", "cxl", dev_name(&cxlmd->dev));  
> 
> Can a "cxl" directory be created so that the raw name can be used?

I'd like feedback from Boris on that.  It is a mess to instantiate
devices in subdirectories under a bus (that's kind of the big issue
with the EDAC usage of the device model already).

I'd say no it can't.

> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ