lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250114153944.7b525a04@foz.lan>
Date: Tue, 14 Jan 2025 15:39:44 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Borislav Petkov <bp@...en8.de>, Shiju Jose <shiju.jose@...wei.com>,
 "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
 "linux-cxl@...r.kernel.org" <linux-cxl@...r.kernel.org>,
 "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
 "linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>, "tony.luck@...el.com"
 <tony.luck@...el.com>, "rafael@...nel.org" <rafael@...nel.org>,
 "lenb@...nel.org" <lenb@...nel.org>, "mchehab@...nel.org"
 <mchehab@...nel.org>, "dan.j.williams@...el.com"
 <dan.j.williams@...el.com>, "dave@...olabs.net" <dave@...olabs.net>,
 "dave.jiang@...el.com" <dave.jiang@...el.com>, "alison.schofield@...el.com"
 <alison.schofield@...el.com>, "vishal.l.verma@...el.com"
 <vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
 "david@...hat.com" <david@...hat.com>, "Vilas.Sridharan@....com"
 <Vilas.Sridharan@....com>, "leo.duran@....com" <leo.duran@....com>,
 "Yazen.Ghannam@....com" <Yazen.Ghannam@....com>, "rientjes@...gle.com"
 <rientjes@...gle.com>, "jiaqiyan@...gle.com" <jiaqiyan@...gle.com>,
 "Jon.Grimm@....com" <Jon.Grimm@....com>, "dave.hansen@...ux.intel.com"
 <dave.hansen@...ux.intel.com>, "naoya.horiguchi@....com"
 <naoya.horiguchi@....com>, "james.morse@....com" <james.morse@....com>,
 "jthoughton@...gle.com" <jthoughton@...gle.com>, "somasundaram.a@....com"
 <somasundaram.a@....com>, "erdemaktas@...gle.com" <erdemaktas@...gle.com>,
 "pgonda@...gle.com" <pgonda@...gle.com>, "duenwen@...gle.com"
 <duenwen@...gle.com>, "gthelen@...gle.com" <gthelen@...gle.com>,
 "wschwartz@...erecomputing.com" <wschwartz@...erecomputing.com>,
 "dferguson@...erecomputing.com" <dferguson@...erecomputing.com>,
 "wbs@...amperecomputing.com" <wbs@...amperecomputing.com>,
 "nifan.cxl@...il.com" <nifan.cxl@...il.com>, tanxiaofei
 <tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>, "Roberto
 Sassu" <roberto.sassu@...wei.com>, "kangkang.shen@...urewei.com"
 <kangkang.shen@...urewei.com>, wanghuiqiang <wanghuiqiang@...wei.com>,
 Linuxarm <linuxarm@...wei.com>
Subject: Re: [PATCH v18 04/19] EDAC: Add memory repair control feature

Em Tue, 14 Jan 2025 13:05:37 +0000
Jonathan Cameron <Jonathan.Cameron@...wei.com> escreveu:

> On Tue, 14 Jan 2025 13:38:31 +0100
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> 
> > Em Thu, 9 Jan 2025 14:24:33 +0000
> > Jonathan Cameron <Jonathan.Cameron@...wei.com> escreveu:
> >   
> > > On Thu, 9 Jan 2025 13:32:22 +0100
> > > Borislav Petkov <bp@...en8.de> wrote:
> > > 
> > > Hi Boris,
> > >     
> > > > On Thu, Jan 09, 2025 at 11:00:43AM +0000, Shiju Jose wrote:      
> > > > > The min_ and max_ attributes of the control attributes are added  for your
> > > > > feedback on V15 to expose supported ranges of these control attributes to the user, 
> > > > > in the following links.          
> > > > 
> > > > Sure, but you can make that differently:
> > > > 
> > > > cat /sys/bus/edac/devices/<dev-name>/mem_repairX/bank
> > > > [x:y]
> > > > 
> > > > which is the allowed range.      
> > > 
> > > To my thinking that would fail the test of being an intuitive interface.
> > > To issue a repair command requires that multiple attributes be configured
> > > before triggering the actual repair.
> > > 
> > > Think of it as setting the coordinates of the repair in a high dimensional
> > > space.
> > > 
> > > In the extreme case of fine grained repair (Cacheline), to identify the
> > > relevant subunit of memory (obtained from the error record that we are
> > > basing the decision to repair on) we need to specify all of:
> > > 
> > > Channel, sub-channel, rank,  bank group, row, column and nibble mask.
> > > For coarser granularity repair only specify a subset of these applies and
> > > only the relevant controls are exposed to userspace.
> > > 
> > > They are broken out as specific attributes to enable each to be set before
> > > triggering the action with a write to the repair attribute.
> > > 
> > > There are several possible alternatives:
> > > 
> > > Option 1
> > > 
> > > "A:B:C:D:E:F:G:H:I:J" opaque single write to trigger the repair where
> > > each number is providing one of those coordinates and where a readback
> > > let's us known what each number is.
> > > 
> > > That single attribute interface is very hard to extend in an intuitive way.
> > > 
> > > History tell us more levels will be introduced in the middle, not just
> > > at the finest granularity, making such an interface hard to extend in
> > > a backwards compatible way.
> > > 
> > > Another alternative of a key value list would make for a nasty sysfs
> > > interface.
> > > 
> > > Option 2 
> > > There are sysfs interfaces that use a selection type presentation.
> > > 
> > > Write: "C", Read: "A, B, [C], D" but that only works well for discrete sets
> > > of options and is a pain to parse if read back is necessary.    
> > 
> > Writing it as:
> > 
> > 	a b [c] d
> > 
> > or even:
> > 	a, b, [c], d
> > 
> > doesn't make it hard to be parse on userspace. Adding a comma makes
> > Kernel code a little bigger, as it needs an extra check at the loop
> > to check if the line is empty or not:
> > 
> > 	if (*tmp != '\0')
> > 		*tmp += snprintf(", ")
> > 
> > Btwm we have an implementation like that on kernelspace/userspace for
> > the RC API:
> > 
> > - Kernelspace:
> >   https://github.com/torvalds/linux/blob/master/drivers/media/rc/rc-main.c#L1125
> >   6 lines of code + a const table with names/values, if we use the same example
> >   for EDAC:
> > 
> > 	const char *name[] = { "foo", "bar" };
> > 
> > 	for (i = 0; i < ARRAY_SIZE(names); i++) {
> > 		if (enabled & names[i].type)
> > 			tmp += sprintf(tmp, "[%s] ", names[i].name);
> > 		else if (allowed & proto_names[i].type)
> > 			tmp += sprintf(tmp, "%s ", names[i].name);
> > 	}
> > 
> > 
> > - Userspace:
> >   https://git.linuxtv.org/v4l-utils.git/tree/utils/keytable/keytable.c#n197
> >   5 lines of code + a const table, if we use the same example
> >   for ras-daemon:
> > 
> > 		const char *name[] = { 
> > 			[EDAC_FOO] = "[foo]",
> > 			[EDAC_BAR] = "[bar]",
> > 		};
> > 
> > 		for (p = strtok(arg, " ,"); p; p = strtok(NULL, " ,"))
> > 			for (i = 0; i < ARRAY_SIZE(name); i++)
> > 				if (!strcasecmp(p, name[i])
> > 					return i;
> > 		return -1;
> > 
> > 	(strtok handles both space and commas at the above example)
> > 
> > IMO, this is a lot better, as the alternative would be to have separate
> > sysfs nodes to describe what values are valid for a given edac devnode.
> > 
> > See, userspace needs to know what values are valid for a given
> > device and support for it may vary depending on the Kernel and
> > device version. So, we need to have the information about what values
> > are valid stored on some sysfs devnode, to allow backward compatibility.  
> 
> These aren't selectors from a discrete list so the question is more
> whether a syntax of
> <min> value <max> 
> is intuitive or not.  I'm not aware of precedence for this one.

From my side, I prefer having 3 separate sysfs nodes, as this is a
very common practice. Doing it on a different way sounds an API violation,
but if someone insists on dropping min/max, this can be argued at
https://lore.kernel.org/linux-api/.

On a very quick search:

	$ ./scripts/get_abi.pl search "\bmin.*max"

I can't see any place using min and max at the same devnode.

	$ ./scripts/get_abi.pl search "\b(min|max)"|grep /sys/ |wc -l
	234

So, it sounds to me that merging those into a single devnode is an
API violation.

> 
> There was another branch of the thread where Boris mentioned this as an
> option. It isn't bad to deal with and an easy change to the code,
> but I have an open question on what choice we make for representing
> unknown min / max.  For separate files the absence of the file
> indicates we don't have any information.
> 
> 
> >   
> > > 
> > > So in conclusion, I think the proposed multiple sysfs attribute style
> > > with them reading back the most recent value written is the least bad
> > > solution to a complex control interface.
> > >     
> > > > 
> > > > echo ... 
> > > > 
> > > > then writes in the bank.
> > > >       
> > > > > ... so we would propose we do not add max_ and min_ for now and see how the
> > > > > use cases evolve.        
> > > > 
> > > > Yes, you should apply that same methodology to the rest of the new features
> > > > you're adding: only add functionality for the stuff that is actually being
> > > > used now. You can always extend it later.
> > > > 
> > > > Changing an already user-visible API is a whole different story and a lot lot
> > > > harder, even impossible.
> > > > 
> > > > So I'd suggest you prune the EDAC patches from all the hypothetical usage and
> > > > then send only what remains so that I can try to queue them.      
> > > 
> > > Sure. In this case the addition of min/max was perhaps a wrong response to
> > > your request for a way to those ranges rather than just rejecting a write
> > > of something out of range as earlier version did.
> > > 
> > > We can revisit in future if range discovery becomes necessary.  Personally
> > > I don't think it is given we are only taking these actions in response error
> > > records that give us precisely what to write and hence are always in range.    
> > 
> > For RO devnodes, there's no need for ranges, but those are likely needed for
> > RW, as otherwise userspace may try to write invalid requests and/or have
> > backward-compatibility issues.  
> 
> Given these parameters are only meaningfully written with values coming
> ultimately from error records, userspace should never consider writing
> something that is out of range except during testing.
> 
> I don't mind presenting the range where known (in CXL case it is not
> discoverable for most of them) but I wouldn't expect tooling to ever
> read it as known correct values to write come from the error records.
> Checking those values against provided limits seems an unnecessary step
> given an invalid parameter that slips through will be rejected by the
> hardware anyway.

I'm fine starting without min/max if there's no current usecase, provided
that:

1. when needed, we add min/max as separate devnodes;
2. there won't be any backward issues when min/max gets added.

Regards,
Mauro

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ