Message-ID: <d8c5bc59-5516-4a22-8c74-a266cbb9c59d@amd.com>
Date: Wed, 28 May 2025 14:38:43 -0500
From: "Naik, Avadhut" <avadnaik@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
Žilvinas Žaltiena <zilvinas@...rix.lt>,
Yazen Ghannam <yazen.ghannam@....com>, Avadhut Naik <avadhut.naik@....com>
Subject: Re: [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs
Hi,
On 5/28/2025 04:22, Borislav Petkov wrote:
> On Tue, May 13, 2025 at 07:20:11PM +0000, Avadhut Naik wrote:
>> Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD's
>> modern Zen-based SOCs has an Address Mask and a Secondary Address Mask
>> register associated with it. The amd64_edac module logs DIMM sizes on a
>> per-UMC per-CS granularity during init using these two registers.
>>
>> Currently, the module primarily considers only the Address Mask register
>> for computing DIMM sizes. The Secondary Address Mask register is only
>> considered for odd CS. Additionally, if it has been considered, the
>> Address Mask register is ignored altogether for that CS. For
>> power-of-two DIMMs, this is not an issue since only the Address Mask
>
> What are power-of-two DIMMs?
>
> The number of DIMMs on the system is a 2^x?
>
> Their ranks are a power of two?
>
> Their combined size is not power of two?
>
> One can only guess...
>
By power-of-two DIMMs, I mean DIMMs whose total size is a power of two.
Example: 16 GB, 32 GB or 64 GB DIMMs.
Will mention that explicitly in the commit message.
>> register is used.
>>
>> For non-power-of-two DIMMs, however, the Secondary Address Mask register
>> is used in conjunction with the Address Mask register. However, since the
>> module only considers either of the two registers for a CS, the size
>> computed by the module is incorrect.
>
> Yah, it must be something about the size...
>
>> The Secondary Address Mask register
>> is not considered for even CS, and the Address Mask register is not
>> considered for odd CS.
>>
>> Introduce a new helper function so that both Address Mask and Secondary
>> Address Mask registers are considered, when valid, for computing DIMM
>> sizes. Furthermore, also rename some variables for greater clarity.
>
> So it is non-power-of-two sized DIMMs?
>
> IOW, DIMMs whose size is not a power of two?
>
Yes, non-power-of-two DIMMs are DIMMs whose total size is not a power of
two. Example: 24 GB, 48 GB or 96 GB DIMMs.
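To illustrate why both registers matter, here is a rough standalone sketch
(not the driver code). It assumes the "Register [31:1] = Address [39:9]"
layout quoted in the patch, ignores interleaving (cs_mode) entirely, and
uses made-up mask values. The helper names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t u32;

/*
 * Hypothetical helper: size in kB described by one mask register.
 * Mirrors the "(mask >> 2) + 1" step from the patch; a zero mask is
 * treated as "register not in use" and contributes nothing.
 * Interleaving (cs_mode) is deliberately ignored in this sketch.
 */
static u32 sketch_cs_size_kb(u32 mask)
{
	if (!mask)
		return 0;

	return (mask >> 2) + 1;
}

/*
 * Hypothetical helper: total size in MB when both the Address Mask
 * and the Secondary Address Mask are taken into account.
 */
static u32 sketch_total_size_mb(u32 addr_mask, u32 addr_mask_sec)
{
	u32 size_kb = sketch_cs_size_kb(addr_mask) +
		      sketch_cs_size_kb(addr_mask_sec);

	return size_kb >> 10;
}
```

With the made-up values 0x03FFFFFE (a 16 GB region) and 0x01FFFFFE (an 8 GB
region), sketch_total_size_mb() gives 24576 MB, i.e. 24 GB. Looking at only
one of the two masks would report 16 GB or 8 GB instead.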
>> Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
>> Reported-by: Žilvinas Žaltiena <zilvinas@...rix.lt>
>> Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
>> Signed-off-by: Avadhut Naik <avadhut.naik@....com>
>> Tested-by: Žilvinas Žaltiena <zilvinas@...rix.lt>
>> Reviewed-by: Yazen Ghannam <yazen.ghannam@....com>
>> Cc: stable@...r.kernel.org
>
> All that changelog stuff...
>
>> ```
>> Changes in v2:
>> 1. Avoid unnecessary variable initialization.
>> 2. Modify commit message to accurately reflect the changes.
>> 3. Move check for non-zero Address Mask register into the new helper.
>>
>> Changes in v3:
>> 1. Add the missing Closes tag and rearrange tags per tip tree handbook.
>> 3. Slightly modify commit message to properly reflect the SOCs that may
>> encounter this issue.
>> 4. Rebase on top of edac-for-next.
>>
>> Changes in v4:
>> 1. Rebase on top of edac-for-next.
>>
>> Links:
>> v1: https://lore.kernel.org/all/20250327210718.1640762-1-avadhut.naik@amd.com/
>> v2: https://lore.kernel.org/all/20250415213150.755255-1-avadhut.naik@amd.com/
>> v3: https://lore.kernel.org/all/20250416222552.1686475-1-avadhut.naik@amd.com/
>> ---
>
> <--- ... goes here, under the --- line so that patch handling tools can ignore
> it.
>
This is an OOPS!
Thanks for catching this! Will fix it.
>> drivers/edac/amd64_edac.c | 57 ++++++++++++++++++++++++---------------
>> 1 file changed, 36 insertions(+), 21 deletions(-)
>
> ...
>
>> +static int __addr_mask_to_cs_size(u32 addr_mask, u32 addr_mask_sec,
>> + unsigned int cs_mode, int csrow_nr, int dimm)
>> +{
>> + int size;
>>
>> edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
>> - edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
>> - edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
>> + edac_dbg(1, " Primary AddrMask: 0x%x\n", addr_mask);
>>
>> /* Register [31:1] = Address [39:9]. Size is in kBs here. */
>> - size = (addr_mask_deinterleaved >> 2) + 1;
>> + size = calculate_cs_size(addr_mask, cs_mode);
>> +
>> + edac_dbg(1, " Secondary AddrMask: 0x%x\n", addr_mask_sec);
>> + size += calculate_cs_size(addr_mask_sec, cs_mode);
>>
>> /* Return size in MBs. */
>> return size >> 10;
>> @@ -1270,7 +1284,7 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
>> unsigned int cs_mode, int csrow_nr)
>> {
>> int cs_mask_nr = csrow_nr;
>> - u32 addr_mask_orig;
>> + u32 addr_mask = 0, addr_mask_sec = 0;
>> int dimm, size = 0;
>
> The EDAC tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
>
> struct long_struct_name *descriptive_name;
> unsigned long foo, bar;
> unsigned int tmp;
> int ret;
>
> The above is faster to parse than the reverse ordering::
>
> int ret;
> unsigned int tmp;
> unsigned long foo, bar;
> struct long_struct_name *descriptive_name;
>
> And even more so than random ordering::
>
> unsigned long foo, bar;
> int ret;
> struct long_struct_name *descriptive_name;
> unsigned int tmp;
>
Will change them to reverse fir tree order.
Thank you for the feedback!
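For reference, a sketch of how the three declarations quoted above might
look in reverse fir tree order (longest line first). This is only an
illustration and not taken from a later revision of the patch; the function
name and body are hypothetical stand-ins so that the snippet compiles on its
own.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-in; only the order of the declarations matters. */
static int decl_order_sketch(int csrow_nr)
{
	u32 addr_mask = 0, addr_mask_sec = 0;
	int cs_mask_nr = csrow_nr;
	int dimm, size = 0;

	/* Two chip selects per DIMM, so the DIMM index is csrow_nr / 2. */
	dimm = cs_mask_nr >> 1;

	/* Both masks are zero in this sketch, so size stays 0. */
	size += (int)(addr_mask + addr_mask_sec);

	return dimm + size;
}
```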
--
Thanks,
Avadhut Naik