Message-ID: <20241211154637.GA1923270@yaz-khff2.amd.com>
Date: Wed, 11 Dec 2024 10:46:37 -0500
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: Avadhut Naik <avadhut.naik@....com>, linux-edac@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] EDAC/amd64: Fix possible module load failure on some
UMC usage combinations
On Wed, Dec 11, 2024 at 12:07:29PM +0100, Borislav Petkov wrote:
> On Tue, Dec 10, 2024 at 09:20:00PM +0000, Avadhut Naik wrote:
> > Starting with Zen4, AMD SoCs have 12 Unified Memory Controllers (UMCs) per
> > socket.
> >
> > When the amd64_edac module is loaded, these UMCs are traversed to
> > determine whether their SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30])
> > bits are set, and the results are recorded in the umc_en_mask and
> > ecc_en_mask bitmasks, respectively.
> >
> > However, these variables are currently u8, so they can only hold bits for
> > UMC0 - UMC7. As a result, if only the last 4 UMCs (UMC8 - UMC11) of the
> > system are in use, umc_ecc_enabled() will return false. Consequently, the
> > module may fail to load on these systems.
> >
> > Fixes: e2be5955a886 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh")
> > Signed-off-by: Avadhut Naik <avadhut.naik@....com>
> > Cc: stable@...r.kernel.org
> > ---
> > Changes in v2:
> > 1. Change data type of variables from u16 to int. (Boris)
> > 2. Modify commit message per feedback. (Boris)
> > 3. Add Fixes: and CC:stable tags. (Boris)
> > ---
> > drivers/edac/amd64_edac.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index ddfbdb66b794..b1c034214a8d 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -3362,7 +3362,7 @@ static bool dct_ecc_enabled(struct amd64_pvt *pvt)
> >  
> >  static bool umc_ecc_enabled(struct amd64_pvt *pvt)
> >  {
> > -	u8 umc_en_mask = 0, ecc_en_mask = 0;
> > +	int umc_en_mask = 0, ecc_en_mask = 0;
> >  	u16 nid = pvt->mc_node_id;
> >  	struct amd64_umc *umc;
> >  	u8 ecc_en = 0, i;
>
> Hmm, looking at that whole function, it looks kinda clumsy to me. If the point
> is to check whether at least one UMC is enabled, why aren't we doing simply
> that instead of those silly masks?
>
> Yazen? Did you think about checking anything else here, in addition?
>
I think we used the masks because, back then, we only read registers as
needed. See:

196b79fcc8ed ("EDAC, amd64: Extend ecc_enabled() to Fam17h")

Now we cache all the registers at init time. So yeah, I agree that this
can be simplified.

> Because if not, this can be written as simply as:
>
> static bool umc_ecc_enabled(struct amd64_pvt *pvt)
> {
> 	u16 nid = pvt->mc_node_id;
> 	struct amd64_umc *umc;
> 	bool ecc_en = false;
> 	int i;
>
> 	/* Check whether at least one UMC is enabled: */
> 	for_each_umc(i) {
> 		umc = &pvt->umc[i];
>
> 		if (umc->sdp_ctrl & UMC_SDP_INIT &&
> 		    umc->umc_cap_hi & UMC_ECC_ENABLED) {
> 			ecc_en = true;
> 			break;
> 		}
> 	}
>
> 	edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? "enabled" : "disabled"));
>
> 	return ecc_en;
> }
>
Looks good overall. We can even remove the "nid" variable and just use
"pvt->mc_node_id" directly in the debug message. This is another remnant
from when this function did register accesses.
Thanks,
Yazen