Message-ID: <20241211110729.GAZ1lycaGYmjgNDGv9@fat_crate.local>
Date: Wed, 11 Dec 2024 12:07:29 +0100
From: Borislav Petkov <bp@...en8.de>
To: Avadhut Naik <avadhut.naik@....com>, yazen.ghannam@....com
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] EDAC/amd64: Fix possible module load failure on some
UMC usage combinations
On Tue, Dec 10, 2024 at 09:20:00PM +0000, Avadhut Naik wrote:
> Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per
> socket.
>
> When the amd64_edac module is being loaded, these UMCs are traversed to
> determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30])
> bits set and create masks in umc_en_mask and ecc_en_mask respectively.
>
> However, the current data type of these variables is u8. As a result, if
> only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized,
> umc_ecc_enabled() will return false. Consequently, the module may fail to
> load on these systems.
>
> Fixes: e2be5955a886 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh")
> Signed-off-by: Avadhut Naik <avadhut.naik@....com>
> Cc: stable@...r.kernel.org
> ---
> Changes in v2:
> 1. Change data type of variables from u16 to int. (Boris)
> 2. Modify commit message per feedback. (Boris)
> 3. Add Fixes: and CC:stable tags. (Boris)
> ---
> drivers/edac/amd64_edac.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index ddfbdb66b794..b1c034214a8d 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3362,7 +3362,7 @@ static bool dct_ecc_enabled(struct amd64_pvt *pvt)
>
>  static bool umc_ecc_enabled(struct amd64_pvt *pvt)
>  {
> -	u8 umc_en_mask = 0, ecc_en_mask = 0;
> +	int umc_en_mask = 0, ecc_en_mask = 0;
>  	u16 nid = pvt->mc_node_id;
>  	struct amd64_umc *umc;
>  	u8 ecc_en = 0, i;
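
For reference, the truncation described in the commit message can be reproduced in userspace. This is only a rough sketch and not the driver code: NUM_UMCS, build_mask_u8() and build_mask_int() are made-up names, and it assumes the masks are built by setting one bit per UMC index.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_UMCS 12	/* UMCs per socket starting with Zen4, per the commit message */

/* Hypothetical model of the old code: one bit per populated UMC, kept in a u8. */
static uint8_t build_mask_u8(const bool populated[NUM_UMCS])
{
	uint8_t mask = 0;
	int i;

	for (i = 0; i < NUM_UMCS; i++) {
		if (populated[i])
			/* Bits 8..11 do not fit into 8 bits and are dropped here. */
			mask = (uint8_t)(mask | (1u << i));
	}

	return mask;
}

/* Same loop with an int mask, as in the v2 patch: all 12 bits survive. */
static int build_mask_int(const bool populated[NUM_UMCS])
{
	int mask = 0;
	int i;

	for (i = 0; i < NUM_UMCS; i++) {
		if (populated[i])
			mask |= 1 << i;
	}

	return mask;
}
```

With only UMC8 - UMC11 populated, the u8 variant ends up with an empty mask while the int variant keeps bits 8 to 11, which is the case the patch is about.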

Hmm, looking at that whole function, it looks kinda clumsy to me. If the point
is to check whether at least one UMC is enabled, why aren't we simply doing
that instead of those silly masks?

Yazen? Did you think about checking anything else here, in addition?

Because if not, this can be written as simply as:

static bool umc_ecc_enabled(struct amd64_pvt *pvt)
{
	u16 nid = pvt->mc_node_id;
	struct amd64_umc *umc;
	bool ecc_en = false;
	int i;

	/* Check whether at least one UMC is enabled: */
	for_each_umc(i) {
		umc = &pvt->umc[i];

		if (umc->sdp_ctrl & UMC_SDP_INIT &&
		    umc->umc_cap_hi & UMC_ECC_ENABLED) {
			ecc_en = true;
			break;
		}
	}

	edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? "enabled" : "disabled"));

	return ecc_en;
}

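
If someone wants to poke at that logic outside of the kernel, a standalone mock along these lines should do. It is an untested-on-hardware sketch under a few assumptions: struct mock_umc and mock_umc_ecc_enabled() are hypothetical stand-ins, not driver names, and the two bit definitions are simply taken from the bit positions named in the commit message (SdpCtrl[31], UmcCapHi[30]).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as named in the commit message. */
#define UMC_SDP_INIT	(UINT32_C(1) << 31)	/* SdpCtrl[31] */
#define UMC_ECC_ENABLED	(UINT32_C(1) << 30)	/* UmcCapHi[30] */

/* Hypothetical stand-in for struct amd64_umc, reduced to the two registers used. */
struct mock_umc {
	uint32_t sdp_ctrl;
	uint32_t umc_cap_hi;
};

/* Return true as soon as one UMC has both SdpInit and EccEnabled set. */
static bool mock_umc_ecc_enabled(const struct mock_umc *umc, size_t num_umcs)
{
	size_t i;

	for (i = 0; i < num_umcs; i++) {
		if ((umc[i].sdp_ctrl & UMC_SDP_INIT) &&
		    (umc[i].umc_cap_hi & UMC_ECC_ENABLED))
			return true;
	}

	return false;
}
```

Since no mask is built at all, the result does not depend on the UMC index fitting into any particular integer width.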
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette