[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200923162510.GB1684790@yaz-nikka.amd.com>
Date: Wed, 23 Sep 2020 11:25:10 -0500
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
tony.luck@...el.com, x86@...nel.org,
Smita.KoralahalliChannabasappa@....com
Subject: Re: [PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes
during address translation
On Wed, Sep 23, 2020 at 10:20:39AM +0200, Borislav Petkov wrote:
> On Thu, Sep 03, 2020 at 08:01:44PM +0000, Yazen Ghannam wrote:
> > From: Muralidhara M K <muralidhara.mk@....com>
> >
> > Add support for new memory interleaving modes used in current AMD systems.
> >
> > Check if the system is using a current Data Fabric version or a legacy
> > version as some bit and register definitions have changed.
> >
> > Tested on AMD reference platforms with the following memory interleaving
> > options.
> >
> > Naples
> > - None
> > - Channel
> > - Die
> > - Socket
> >
> > Rome (NPS = Nodes per Socket)
> > - None
> > - NPS0
> > - NPS1
> > - NPS2
> > - NPS4
> >
> > The fixes tag refers to the commit that allows amd64_edac_mod to load on
> > Rome systems.
>
> Err, why? This is adding new stuff to an address translation function.
> How does that fix amd64_edac loading on Rome?
>
> > The module may report an incorrect system addresses on
> > Rome systems depending on the interleaving option used.
>
> That doesn't stop it from loading, sorry.
>
Okay, no problem.
> Now, before you guys do any new features, I'd like you to split this
> humongous function umc_normaddr_to_sysaddr() logically into separate
> helpers and each helper does exactly one thing and one thing only.
>
> Then use a verb in its name: umc_translate_normaddr_to_sysaddr() or so.
>
Okay, will do.
> Also, Yazen, remind me again pls why isn't this function in
> drivers/edac/amd64_edac.c, where it is needed?
>
> If the reason is not valid anymore, let's move it there before splitting
> so that it doesn't bloat the core code.
>
I don't remember the original reason, and I was recently asked about
this code living in a module. I did some looking after this ask, and I
found that we should be using this translation to get a proper value for
the memory error notifiers to use. So I think we still need to use this
function some way with the core code even if the EDAC interface isn't
used.
I think this set can be split up.
1) Set with patches 1-3 fixed up to use cpu_die_id.
2) Set with the address translation updates.
a) Move umc_normaddr_to_sysaddr() into a new module under EDAC.
b) Hook the new module into amd64_edac.c where it's used today.
c) Refactor the code as you suggested above.
d) Add the new features.
3) New set that sets up a proper notifier for the address translation.
a) Unhook the new module from amd64_edac.c.
b) Register a notifer that runs before any notifiers that operate on
memory errors.
c) Find a way to pass the translated address through the chain
without losing the original value.
What do you think?
Thanks,
Yazen
Powered by blists - more mailing lists