Message-ID: <20240226102944.000070a3@Huawei.com>
Date: Mon, 26 Feb 2024 10:29:44 +0000
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: Shiju Jose <shiju.jose@...wei.com>, "linux-cxl@...r.kernel.org"
<linux-cxl@...r.kernel.org>, "linux-acpi@...r.kernel.org"
<linux-acpi@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"dave@...olabs.net" <dave@...olabs.net>, "dave.jiang@...el.com"
<dave.jiang@...el.com>, "alison.schofield@...el.com"
<alison.schofield@...el.com>, "vishal.l.verma@...el.com"
<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"david@...hat.com" <david@...hat.com>, "Vilas.Sridharan@....com"
<Vilas.Sridharan@....com>, "leo.duran@....com" <leo.duran@....com>,
"Yazen.Ghannam@....com" <Yazen.Ghannam@....com>, "rientjes@...gle.com"
<rientjes@...gle.com>, "jiaqiyan@...gle.com" <jiaqiyan@...gle.com>,
"tony.luck@...el.com" <tony.luck@...el.com>, "Jon.Grimm@....com"
<Jon.Grimm@....com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "rafael@...nel.org" <rafael@...nel.org>,
"lenb@...nel.org" <lenb@...nel.org>, "naoya.horiguchi@....com"
<naoya.horiguchi@....com>, "james.morse@....com" <james.morse@....com>,
"jthoughton@...gle.com" <jthoughton@...gle.com>, "somasundaram.a@....com"
<somasundaram.a@....com>, "erdemaktas@...gle.com" <erdemaktas@...gle.com>,
"pgonda@...gle.com" <pgonda@...gle.com>, "duenwen@...gle.com"
<duenwen@...gle.com>, "mike.malvestuto@...el.com"
<mike.malvestuto@...el.com>, "gthelen@...gle.com" <gthelen@...gle.com>,
"wschwartz@...erecomputing.com" <wschwartz@...erecomputing.com>,
"dferguson@...erecomputing.com" <dferguson@...erecomputing.com>, tanxiaofei
<tanxiaofei@...wei.com>, "Zengtao (B)" <prime.zeng@...ilicon.com>,
"kangkang.shen@...urewei.com" <kangkang.shen@...urewei.com>, wanghuiqiang
<wanghuiqiang@...wei.com>, Linuxarm <linuxarm@...wei.com>
Subject: Re: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
CXL device patrol scrub control and DDR5 ECS control features
On Fri, 23 Feb 2024 11:42:24 -0800
Dan Williams <dan.j.williams@...el.com> wrote:
> Shiju Jose wrote:
> > Hi Dan,
> >
> > Thanks for the feedback.
> >
> > Please find reply inline.
> >
> > >-----Original Message-----
> > >From: Dan Williams <dan.j.williams@...el.com>
> > >Sent: 22 February 2024 00:21
> > >To: Shiju Jose <shiju.jose@...wei.com>; linux-cxl@...r.kernel.org; linux-
> > >acpi@...r.kernel.org; linux-mm@...ck.org; dan.j.williams@...el.com;
> > >dave@...olabs.net; Jonathan Cameron <jonathan.cameron@...wei.com>;
> > >dave.jiang@...el.com; alison.schofield@...el.com; vishal.l.verma@...elcom;
> > >ira.weiny@...el.com
> > >Cc: linux-edac@...r.kernel.org; linux-kernel@...r.kernel.org;
> > >david@...hat.com; Vilas.Sridharan@....com; leo.duran@....com;
> > >Yazen.Ghannam@....com; rientjes@...gle.com; jiaqiyan@...gle.com;
> > >tony.luck@...el.com; Jon.Grimm@....com; dave.hansen@...ux.intel.com;
> > >rafael@...nel.org; lenb@...nel.org; naoya.horiguchi@....com;
> > >james.morse@....com; jthoughton@...gle.com; somasundaram.a@....com;
> > >erdemaktas@...gle.com; pgonda@...gle.com; duenwen@...gle.com;
> > >mike.malvestuto@...el.com; gthelen@...gle.com;
> > >wschwartz@...erecomputing.com; dferguson@...erecomputing.com;
> > >tanxiaofei <tanxiaofei@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>;
> > >kangkang.shen@...urewei.com; wanghuiqiang <wanghuiqiang@...wei.com>;
> > >Linuxarm <linuxarm@...wei.com>; Shiju Jose <shiju.jose@...wei.com>
> > >Subject: RE: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
> > >CXL device patrol scrub control and DDR5 ECS control features
> > >
> > >shiju.jose@ wrote:
> > >> From: Shiju Jose <shiju.jose@...wei.com>
> > >>
> > >> 1. Add support for CXL feature mailbox commands.
> > >> 2. Add CXL device scrub driver supporting patrol scrub control and ECS
> > >> control features.
> > >> 3. Add scrub subsystem driver that supports configuring memory scrubs
> > >> in the system.
> > >> 4. Register CXL device patrol scrub and ECS with scrub subsystem.
> > >> 5. Add common library for RASF and RAS2 PCC interfaces.
> > >> 6. Add driver for ACPI RAS2 feature table (RAS2).
> > >> 7. Add memory RAS2 driver and register with scrub subsystem.
> > >
> > >I stepped away from this patch set to focus on the changes that landed for v6.8
> > >and the follow-on regression fixups. Now that v6.8 CXL work has quieted down
> > >and I circle back to this set for v6.9 I find the lack of story in this cover letter to
> > >be unsettling. As a reviewer I should not have to put together the story on why
> > >Linux should care about this feature and independently build up the
> > >maintenance-burden vs benefit tradeoff analysis.
> > I will add more details to the cover letter.
> >
> > >
> > >Maybe it is self evident to others, but for me there is little in these changelogs
> > >besides "mechanism exists, enable it". There are plenty of platform or device
> > >mechanisms that get specified that Linux does not enable for one reason or
> > >another.
> > >
> > >The cover letter needs to answer why it matters, and what are the tradeoffs.
> > >Mind you, in my submissions I do not always get this right in the cover letter [1],
> > >but hopefully at least one of the patches tells the story [2].
> > >
> > >In other words, imagine you are writing the pull request to Linus or someone
> > >else with limited time who needs to make a risk decision on a pull request with a
> > >diffstat of:
> > >
> > > 23 files changed, 3083 insertions(+)
> > >
> > >...where the easiest decision is to just decline. As is, these changelogs are not
> > >close to tipping the scale to "accept".
> > >
> > >[sidebar: how did this manage to implement a new subsystem with 2 consumers
> > >(CXL + ACPI), without modifying a single existing line? Zero deletions? That is
> > >either an indication that Linux perfectly anticipated this future use case
> > >(unlikely), or more work needs to be done to digest and integrate these concepts
> > >into existing code paths]
> > >
> > >One of the first questions for me is why CXL and RAS2 as the first consumers and
> > >not NVDIMM-ARS and/or RASF Patrol Scrub? Part of the maintenance burden
> > We don't personally care about NVDIMMS but would welcome drivers from others.
>
> Upstream would also welcome consideration of maintenance burden
> reduction before piling on, at least include *some* consideration of the
> implications vs this response that comes off as "that's somebody else's
> problem".
We can do analysis of whether the interfaces are suitable etc but
have no access to test hardware or emulation. I guess I can hack something
together easily enough. Today ndctl has some support. Interestingly the model
is different from typical volatile scrubbing as it's all on demand. That
could easily be wrapped up in a software scrub scheduler, but we'd need
input from you and other Intel people on how this is actually used.
The use model is a lot less obvious than for autonomous scrubbers - I assume
because the persistence means you need to do this rarely if at all (though
ARS does support scrubbing volatile memory on NVDIMMs).
So initial conclusion is it would need a few more controls or it needs
some software handling of scan scheduling to map it to the interface type
that is common to CXL and RAS2 scrub controls.
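To make the "software handling of scan scheduling" idea concrete, here is a
rough userspace sketch of wrapping an on-demand scan primitive so that it
looks like a scrubber with a cycle time. Everything in it is hypothetical:
scan_fn, struct scrub_sched, the sched_* helpers and fake_scan are names made
up for illustration, not existing ndctl, ARS, CXL or RAS2 interfaces, and the
real ARS use model may well need something different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-demand primitive: scan [base, base + len) once. */
typedef int (*scan_fn)(uint64_t base, uint64_t len);

struct scrub_sched {
	uint64_t base;      /* start of the range to cover */
	uint64_t size;      /* length of the range in bytes */
	uint64_t chunk;     /* bytes requested per on-demand scan */
	uint64_t next;      /* offset of the next chunk to scan */
	uint32_t cycle_sec; /* target time for one pass over the range */
	scan_fn scan;       /* backend that performs one scan */
};

static void sched_init(struct scrub_sched *s, uint64_t base, uint64_t size,
		       uint64_t chunk, uint32_t cycle_sec, scan_fn scan)
{
	assert(size > 0 && chunk > 0 && cycle_sec > 0 && scan);
	s->base = base;
	s->size = size;
	s->chunk = chunk;
	s->next = 0;
	s->cycle_sec = cycle_sec;
	s->scan = scan;
}

/* Number of on-demand scans needed for one full pass. */
static uint64_t sched_nr_chunks(const struct scrub_sched *s)
{
	return (s->size + s->chunk - 1) / s->chunk;
}

/* Delay between scans so that one pass takes roughly cycle_sec. */
static uint32_t sched_interval_sec(const struct scrub_sched *s)
{
	uint64_t iv = s->cycle_sec / sched_nr_chunks(s);

	return iv ? (uint32_t)iv : 1;
}

/*
 * Issue one on-demand scan; meant to be called once per interval.
 * On failure the same chunk is retried on the next call.
 */
static int sched_tick(struct scrub_sched *s)
{
	uint64_t len = s->size - s->next;
	int ret;

	if (len > s->chunk)
		len = s->chunk;
	ret = s->scan(s->base + s->next, len);
	if (ret)
		return ret;
	s->next += len;
	if (s->next >= s->size)
		s->next = 0; /* pass complete, start over */
	return 0;
}

/* Stand-in backend that only counts the bytes it was asked to scan. */
static uint64_t fake_scanned_bytes;

static int fake_scan(uint64_t base, uint64_t len)
{
	(void)base;
	fake_scanned_bytes += len;
	return 0;
}
```

The point is only that the cycle time becomes the single user visible knob,
which is roughly the shape of the autonomous scrub controls, while the
chunking and pacing stay hidden in software.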
Intent of the comment was to keep scope somewhat confined, and to
invite others to get involved, not to rule out doing some lightweight
analysis of whether this feature would work for another potential user
which we weren't even aware of until you mentioned it (thanks!).
>
> > Regarding RASF patrol scrub no one cared about it as it's useless and
> > any new implementation should be RAS2.
>
> The assertion that "RASF patrol scrub no one cared about it as it's
> useless and any new implementation should be RAS2" needs evidence.
>
> For example, what platforms are going to ship with RAS2 support, what
> are the implications of Linux not having RAS2 scrub support in a month,
> or in a year? There are parts of the ACPI spec that have never been
> implemented; what is the evidence that RAS2 is not going to suffer the
> same fate as RASF?
From discussions with various firmware folk we have a chicken and egg
situation on RAS2. They will stick to their custom solutions unless there is
plausible support in Linux for it - so right now it's a question mark
on roadmaps. Trying to get rid of that question mark is why Shiju and I
started looking at this in the first place. To get rid of that question
mark we don't necessarily need to have it upstream, but we do need
to be able to make the argument that there will be a solution ready
soon after they release the BIOS image. (Some distros will take years
to catch up though).
If anyone else can speak up on this point please do. Discussions and
feedback communicated to Shiju and me off list aren't going to
convince people :(
Negatives are perhaps easier to give than positives, given this is seen as
a potential feature for future platforms and so may be confidential.
> There are parts of the CXL specification that have
> never been implemented in mass market products.
Obviously I can't talk about who was involved in defining this feature,
but I have strong confidence it will get implemented,
for reasons I can point at on a public list.
a) There will be scrubbing on devices.
b) It will need control (evidence for this is the BIOS controls mentioned below
for equivalent main memory).
c) Hotplug means that control must be done by OS driver (or via very fiddly
pre hotplug hacks that I think we can all agree should not be necessary
and aren't even an option on all platforms).
d) No one likes custom solutions.
This isn't a fancy feature with a high level of complexity, which helps.
Today there is the option for main memory of leaving it to BIOS parameters.
A quick google gave me some examples (to make sure they are public):
Dell: PowerEdge R640 BIOS and UEFI Reference Guide
- Memory patrol scrub - Sets the memory patrol scrub frequency.
HP UEFI System Utilities for HPE ProLiant Gen 11 Servers
- Enabling or disabling patrol scrub
Spec list of flags for Lenovo systems (tells you that turning patrol scrub
off is a good idea ;)
Huawei Kunpeng 920 RAS config menu.
- Active Scrub, Active Scrub interval etc.
>
> > Previous discussions in the community about RASF and scrub can be found here.
> > https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
> > and some old ones,
> > https://patchwork.kernel.org/project/linux-arm-kernel/patch/CS1PR84MB0038718F49DBC0FF03919E1184390@CS1PR84MB0038.NAMPRD84.PROD.OUTLOOK.COM/
> >
>
> Do not make people hunt for old discussions, if there are useful points
> in that discussion that make the case for the patch set include those in
> the next submission, don't make people hunt for the latest state of the
> story.
Sure, more of an essay needed along with links given we are talking
about the views of others.
Quick summary from a reread of the linked threads.
AMD had not implemented RASF/RAS2 yet. They were looking at it last year, but
were worried about the inflexibility of the RAS2 spec as it stands today.
They were looking at some spec changes to improve this, plus other functions
to be added to RAS2.
I agree with it being limited, but think extending with backwards
compatibility isn't a problem (and ACPI spec rules in theory guarantee
it won't break). I'm keen on working with current version
so that we can ensure the ABI design for CXL encompasses it.
Intel folk were cc'd but did not say anything on that thread, but Tony Luck
did comment in Jiaqi Yan's software scrubbing discussion linked below.
He observed that a hardware implementation can be complex if doing range
based scrubbing due to interleave etc. RAS2 and CXL both sidestep this
somewhat by making it someone else's problem. In RAS2 the firmware gets
to program multiple scrubbers to cover the range requested. In CXL
for now this leaves the problem for userspace, but we can definitely
consider a region interface if it makes sense.
I'd also like to see inputs from a wider range of systems folk + other
CPU companies. How easy this is to implement is heavily dependent on
what entity in your system is responsible for this sort of runtime
service and that varies a lot.
>
> > https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
>
> Yes, now that is a useful changelog, thank you for highlighting it,
> please follow its example.
It's not a changelog as such but an RFC in text only form.
However, there is indeed lots of good info in there.
Jonathan