Message-ID: <3b2d4275d1d24dbeacee0f192ac4d69b@huawei.com>
Date: Thu, 9 Jan 2025 11:00:43 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: Borislav Petkov <bp@...en8.de>
CC: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
	"linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "tony.luck@...el.com" <tony.luck@...el.com>,
	"rafael@...nel.org" <rafael@...nel.org>, "lenb@...nel.org" <lenb@...nel.org>,
	"mchehab@...nel.org" <mchehab@...nel.org>, "dan.j.williams@...el.com"
	<dan.j.williams@...el.com>, "dave@...olabs.net" <dave@...olabs.net>,
	"Jonathan Cameron" <jonathan.cameron@...wei.com>, "dave.jiang@...el.com"
	<dave.jiang@...el.com>, "alison.schofield@...el.com"
	<alison.schofield@...el.com>, "vishal.l.verma@...el.com"
	<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
	"david@...hat.com" <david@...hat.com>, "Vilas.Sridharan@....com"
	<Vilas.Sridharan@....com>, "leo.duran@....com" <leo.duran@....com>,
	"Yazen.Ghannam@....com" <Yazen.Ghannam@....com>, "rientjes@...gle.com"
	<rientjes@...gle.com>, "jiaqiyan@...gle.com" <jiaqiyan@...gle.com>,
	"Jon.Grimm@....com" <Jon.Grimm@....com>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "naoya.horiguchi@....com"
	<naoya.horiguchi@....com>, "james.morse@....com" <james.morse@....com>,
	"jthoughton@...gle.com" <jthoughton@...gle.com>, "somasundaram.a@....com"
	<somasundaram.a@....com>, "erdemaktas@...gle.com" <erdemaktas@...gle.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "duenwen@...gle.com"
	<duenwen@...gle.com>, "gthelen@...gle.com" <gthelen@...gle.com>,
	"wschwartz@...erecomputing.com" <wschwartz@...erecomputing.com>,
	"dferguson@...erecomputing.com" <dferguson@...erecomputing.com>,
	"wbs@...amperecomputing.com" <wbs@...amperecomputing.com>,
	"nifan.cxl@...il.com" <nifan.cxl@...il.com>, tanxiaofei
	<tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>, "Roberto
 Sassu" <roberto.sassu@...wei.com>, "kangkang.shen@...urewei.com"
	<kangkang.shen@...urewei.com>, wanghuiqiang <wanghuiqiang@...wei.com>,
	Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH v18 04/19] EDAC: Add memory repair control feature

>-----Original Message-----
>From: Borislav Petkov <bp@...en8.de>
>Sent: 09 January 2025 09:19
>To: Shiju Jose <shiju.jose@...wei.com>
>Cc: linux-edac@...r.kernel.org; linux-cxl@...r.kernel.org; linux-
>acpi@...r.kernel.org; linux-mm@...ck.org; linux-kernel@...r.kernel.org;
>tony.luck@...el.com; rafael@...nel.org; lenb@...nel.org;
>mchehab@...nel.org; dan.j.williams@...el.com; dave@...olabs.net; Jonathan
>Cameron <jonathan.cameron@...wei.com>; dave.jiang@...el.com;
>alison.schofield@...el.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>david@...hat.com; Vilas.Sridharan@....com; leo.duran@....com;
>Yazen.Ghannam@....com; rientjes@...gle.com; jiaqiyan@...gle.com;
>Jon.Grimm@....com; dave.hansen@...ux.intel.com;
>naoya.horiguchi@....com; james.morse@....com; jthoughton@...gle.com;
>somasundaram.a@....com; erdemaktas@...gle.com; pgonda@...gle.com;
>duenwen@...gle.com; gthelen@...gle.com;
>wschwartz@...erecomputing.com; dferguson@...erecomputing.com;
>wbs@...amperecomputing.com; nifan.cxl@...il.com; tanxiaofei
><tanxiaofei@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Roberto
>Sassu <roberto.sassu@...wei.com>; kangkang.shen@...urewei.com;
>wanghuiqiang <wanghuiqiang@...wei.com>; Linuxarm
><linuxarm@...wei.com>
>Subject: Re: [PATCH v18 04/19] EDAC: Add memory repair control feature
>
>On Mon, Jan 06, 2025 at 12:10:00PM +0000, shiju.jose@...wei.com wrote:
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_hpa
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_dpa
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_nibble_mask
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_bank_group
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_bank
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_rank
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_row
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_column
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_channel
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_sub_channel
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_hpa
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_dpa
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_nibble_mask
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_bank_group
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_bank
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_rank
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_row
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_column
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_channel
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_sub_channel
>
>So this is new. I don't remember seeing that when I looked at your patches the
>last time.
>
>Looks like you have all those attributes and now you've decided to add a min and
>max for each one, in addition. And UI-wise it is a madness as there are gazillion
>single-value files now.
>

Thanks for the feedback.

The min_ and max_ attributes were added in response to your feedback on
V15, to expose the supported ranges of the control attributes to the user.
However, these min_ and max_ attributes are 'RO' rather than 'RW' as the
doc currently states, which needs to be fixed in the doc.
The relevant discussion is in the following links:
https://lore.kernel.org/lkml/20241114133249.GEZzX8ATNyc_Xw1L52@fat_crate.local/
https://lore.kernel.org/lkml/fa5d6bdd08104cf1a09c4960a0f9bc46@huawei.com/
https://lore.kernel.org/lkml/20241119123657.GCZzyGaZIExvUHPLKL@fat_crate.local/

>"Attributes should be ASCII text files, preferably with only one value per file. It is
>noted that it may not be efficient to contain only one value per file, so it is
>socially acceptable to express an array of values of the same type."
>
>So you don't need those - you can simply express each attribute as a range:
>
>echo "1:2" > /sys/bus/edac/devices/<dev-name>/mem_repairX/bank
>
>or if you wanna scrub only one bank:

After internal discussion, we think this is the source of the confusion.
This is not scrub, where a range would indeed make sense; it is repair.
We are not aware of a failure mechanism where a set of memory banks
would fail together but the whole of the next level up in the memory
topology would not.

In theory we might get a poor device design that reports errors at a coarse
level but can only repair at a fine level, in which case a range might be
appropriate. We are not sure that makes sense in practice, and with a range
interface we would get a mess, such as running out of repair resources
halfway through a list with no visibility of what has been repaired.
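
To illustrate the concern, here is a toy Python model, not kernel code.
RepairDevice, repair_range() and the spare count are all hypothetical
names and numbers made up for this sketch; it only assumes that a range
write would be applied as a sequence of single repairs drawing on a
limited pool of repair resources.

```python
class RepairDevice:
    """Toy model of a device with a fixed pool of spare repair resources."""

    def __init__(self, spares):
        self.spares = spares      # hypothetical number of spare resources
        self.repaired = []        # banks actually repaired so far

    def repair(self, bank):
        """Repair a single bank; raise if no spare resource is left."""
        if self.spares == 0:
            raise RuntimeError(f"no repair resources left for bank {bank}")
        self.spares -= 1
        self.repaired.append(bank)


def repair_range(dev, first, last):
    """Hypothetical range write ("first:last"): one status for many repairs."""
    try:
        for bank in range(first, last + 1):
            dev.repair(bank)
    except RuntimeError:
        return False  # the caller learns only that the write failed
    return True


dev = RepairDevice(spares=2)
ok = repair_range(dev, 1, 4)
# ok is False, yet banks 1 and 2 were repaired and both spares are gone.
# The single failure status does not tell the caller which banks were done.
```

With one value per write, each repair succeeds or fails on its own, so
userspace always knows exactly what has been repaired.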

However, given that the repair flow is driven by userspace receiving error
records that will only contain possible values to repair, we think these
bounds on what can be repaired are a nice-to-have rather than a necessity.
So we propose not adding max_ and min_ for now and seeing how the use
cases evolve.
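
As a rough sketch of that flow, the Python below copies values from an
error record into single-value attribute files. The attribute names are
the control attributes quoted above without the min_/max_ prefix; the
record layout, the decimal value format and the write_repair_attrs()
helper are assumptions made for illustration, and the demo writes to a
temporary directory rather than a real mem_repairX directory.

```python
import os
import tempfile

# Control attribute names, taken from the quoted list minus min_/max_.
ATTRS = ("hpa", "dpa", "nibble_mask", "bank_group", "bank",
         "rank", "row", "column", "channel", "sub_channel")


def write_repair_attrs(repair_dir, record):
    """Write each field of the error record to its own attribute file.

    repair_dir stands in for a mem_repairX directory; record is a
    hypothetical dict of values taken from an error record.
    Returns the names of the attributes that were written.
    """
    written = []
    for name in ATTRS:
        if name in record:
            with open(os.path.join(repair_dir, name), "w") as f:
                f.write(f"{record[name]}\n")  # one value per file
            written.append(name)
    return written


# Demo against a temporary directory instead of sysfs.
with tempfile.TemporaryDirectory() as d:
    record = {"bank_group": 1, "bank": 2, "row": 0x1A2B}  # made-up values
    written = write_repair_attrs(d, record)
    with open(os.path.join(d, "bank")) as f:
        bank_text = f.read()
```

Since every value written comes straight from an error record the device
produced, userspace never has to guess a valid value, which is why the
min_/max_ bounds look optional to us.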
>
>echo "1:1" > /sys/bus/edac/devices/<dev-name>/mem_repairX/bank
>
>What is the use case of that thing?
>
>Someone might find it useful so let's add it preemptively?
>
>Pfff.
>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju
