lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8e7e84c2-e035-49f2-b277-7639919213b7@amd.com>
Date: Wed, 16 Apr 2025 12:26:53 -0500
From: "Naik, Avadhut" <avadnaik@....com>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: linux-edac@...r.kernel.org, bp@...en8.de, linux-kernel@...r.kernel.org,
 Žilvinas Žaltiena <zilvinas@...rix.lt>,
 Avadhut Naik <avadhut.naik@....com>
Subject: [PATCH v2] EDAC/amd64: Fix size calculation for Non-Power-of-Two
 DIMMs



On 4/16/2025 09:10, Yazen Ghannam wrote:
> On Tue, Apr 15, 2025 at 09:25:58PM +0000, Avadhut Naik wrote:
>> Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD EPYC
> 
> This affects Zen-based systems in general, not just the EPYC line.
> 
Will change this to something like:

Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD's
modern Zen-based SOCs has an Address Mask and a Secondary Address Mask
register associated with it.

>> SOCs has an Address Mask and a Secondary Address Mask register associated
>> with it. The amd64_edac module logs DIMM sizes on a per-UMC per-CS
>> granularity during init using these two registers.
>>
>> Currently, the module primarily considers only the Address Mask register
>> for computing DIMM sizes. The Secondary Address Mask register is only
>> considered for odd CS. Additionally, if it has been considered, the
>> Address Mask register is ignored altogether for that CS. For
>> power-of-two DIMMs, this is not an issue since only the Address Mask
>> register is used.
>>
>> For non-power-of-two DIMMs, however, the Secondary Address Mask register
>> is used in conjunction with the Address Mask register. However, since the
>> module only considers either of the two registers for a CS, the size
>> computed by the module is incorrect. The Secondary Address Mask register
>> is not considered for even CS, and the Address Mask register is not
>> considered for odd CS.
>>
>> Introduce a new helper function so that both Address Mask and Secondary
>> Address Mask registers are considered, when valid, for computing DIMM
>> sizes. Furthermore, also rename some variables for greater clarity.
>>
>> Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
>> Reported-by: Žilvinas Žaltiena <zilvinas@...rix.lt>
> 
> Checkpatch says this should also have a 'Closes' tag. I never thought
> about this before, but it is mentioned in 'Documentation'. 
> 
> Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
> 
Will add this.

>> Tested-by: Žilvinas Žaltiena <zilvinas@...rix.lt>
> 
> Minor nit: TIP tree handbook say 'Tested-by' goes after your SoB. I
> should check my submissions too.
> 
Had noticed that in some recent commits Tested-by immediately follows
Reported-by. And, SoB follows Tested-by.
Nonetheless, thanks for pointing this out. Will follow the tip handbook.

>> Signed-off-by: Avadhut Naik <avadhut.naik@....com>
>> Cc: stable@...r.kernel.org
>> ```
>> Changes in v2:
>> 1. Avoid unnecessary variable initialization.
>> 2. Modify commit message to accurately reflect the changes.
>> 3. Move check for non-zero Address Mask register into the new helper.
>>
>> Links:
>> v1: https://lore.kernel.org/linux-edac/9a33b6ff-9ce8-4abd-8629-3c9f6a546514@amd.com/T/#mc0b1101055f12ccb06e5a251d16f186597ed4133
> 
> Use a 'permalink' rather than this threaded link. Search for
> 'permalink' on the web page.
> 
> Better option is to use this format:
>   https://lore.kernel.org/<Message-ID>
> 
> As shown in 'Documentation/process/submitting-patches.rst'
> 
> Ex:
>   https://lore.kernel.org/20250327210718.1640762-1-avadhut.naik@amd.com
> 
> Side note: I've been adding 'r/' between lore and Message-ID out of
> habit. But it seems that this is no longer needed.
> 
Will use this henceforth. Thanks!

>> ---
>>  drivers/edac/amd64_edac.c | 57 ++++++++++++++++++++++++---------------
>>  1 file changed, 36 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
>> index 58b1482a0fbb..91d22e63bdb1 100644
>> --- a/drivers/edac/amd64_edac.c
>> +++ b/drivers/edac/amd64_edac.c
>> @@ -1209,7 +1209,9 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
>>  	if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
>>  		cs_mode |= CS_ODD_PRIMARY;
>>  
>> -	/* Asymmetric dual-rank DIMM support. */
>> +	if (csrow_sec_enabled(2 * dimm, ctrl, pvt))
>> +		cs_mode |= CS_EVEN_SECONDARY;
>> +
>>  	if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
>>  		cs_mode |= CS_ODD_SECONDARY;
>>  
>> @@ -1230,12 +1232,13 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
>>  	return cs_mode;
>>  }
>>  
>> -static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
>> -				  int csrow_nr, int dimm)
>> +static int calculate_cs_size(u32 mask, unsigned int cs_mode)
>>  {
>> -	u32 msb, weight, num_zero_bits;
>> -	u32 addr_mask_deinterleaved;
>> -	int size = 0;
>> +	int msb, weight, num_zero_bits;
>> +	u32 deinterleaved_mask;
>> +
>> +	if (!mask)
>> +		return 0;
>>  
>>  	/*
>>  	 * The number of zero bits in the mask is equal to the number of bits
>> @@ -1248,19 +1251,30 @@ static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
>>  	 * without swapping with the most significant bit. This can be handled
>>  	 * by keeping the MSB where it is and ignoring the single zero bit.
>>  	 */
>> -	msb = fls(addr_mask_orig) - 1;
>> -	weight = hweight_long(addr_mask_orig);
>> +	msb = fls(mask) - 1;
>> +	weight = hweight_long(mask);
>>  	num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
>>  
>>  	/* Take the number of zero bits off from the top of the mask. */
>> -	addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
>> +	deinterleaved_mask = GENMASK(msb - num_zero_bits, 1);
>> +	edac_dbg(1, "  Deinterleaved AddrMask: 0x%x\n", deinterleaved_mask);
>> +
>> +	return (deinterleaved_mask >> 2) + 1;
>> +}
>> +
>> +static int __addr_mask_to_cs_size(u32 addr_mask, u32 addr_mask_sec,
>> +				  unsigned int cs_mode, int csrow_nr, int dimm)
>> +{
>> +	int size;
>>  
>>  	edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
>> -	edac_dbg(1, "  Original AddrMask: 0x%x\n", addr_mask_orig);
>> -	edac_dbg(1, "  Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
>> +	edac_dbg(1, "  Primary AddrMask: 0x%x\n", addr_mask);
>>  
>>  	/* Register [31:1] = Address [39:9]. Size is in kBs here. */
>> -	size = (addr_mask_deinterleaved >> 2) + 1;
>> +	size = calculate_cs_size(addr_mask, cs_mode);
>> +
>> +	edac_dbg(1, "  Secondary AddrMask: 0x%x\n", addr_mask_sec);
>> +	size += calculate_cs_size(addr_mask_sec, cs_mode);
>>  
>>  	/* Return size in MBs. */
>>  	return size >> 10;
>> @@ -1270,7 +1284,7 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
>>  				    unsigned int cs_mode, int csrow_nr)
>>  {
>>  	int cs_mask_nr = csrow_nr;
>> -	u32 addr_mask_orig;
>> +	u32 addr_mask = 0, addr_mask_sec = 0;
>>  	int dimm, size = 0;
>>  
>>  	/* No Chip Selects are enabled. */
>> @@ -1308,13 +1322,13 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
>>  	if (!pvt->flags.zn_regs_v2)
>>  		cs_mask_nr >>= 1;
>>  
>> -	/* Asymmetric dual-rank DIMM support. */
>> -	if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
>> -		addr_mask_orig = pvt->csels[umc].csmasks_sec[cs_mask_nr];
>> -	else
>> -		addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr];
>> +	if (cs_mode & (CS_EVEN_PRIMARY | CS_ODD_PRIMARY))
>> +		addr_mask = pvt->csels[umc].csmasks[cs_mask_nr];
>> +
>> +	if (cs_mode & (CS_EVEN_SECONDARY | CS_ODD_SECONDARY))
>> +		addr_mask_sec = pvt->csels[umc].csmasks_sec[cs_mask_nr];
>>  
>> -	return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
>> +	return __addr_mask_to_cs_size(addr_mask, addr_mask_sec, cs_mode, csrow_nr, dimm);
>>  }
>>  
>>  static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
>> @@ -3512,9 +3526,10 @@ static void gpu_get_err_info(struct mce *m, struct err_info *err)
>>  static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
>>  				    unsigned int cs_mode, int csrow_nr)
>>  {
>> -	u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
>> +	u32 addr_mask		= pvt->csels[umc].csmasks[csrow_nr];
>> +	u32 addr_mask_sec	= pvt->csels[umc].csmasks_sec[csrow_nr];
>>  
>> -	return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
>> +	return __addr_mask_to_cs_size(addr_mask, addr_mask_sec, cs_mode, csrow_nr, csrow_nr >> 1);
>>  }
>>  
>>  static void gpu_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
>>
>> base-commit: b4d2bada09b17fcd68a0f00811ca7f900ec988e6
> 
> This is a tip branch/commit. However, you should use a 'ras' tree branch,
> since this patch is totally within EDAC.
> 
> Repo:   https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
> Branch: edac-for-next
> 
Had initially encountered this on tip kernel and stuck with it.
Will rebase though.

> Besides the process notes, the patch looks good to me.
> 
> Reviewed-by: Yazen Ghannam <yazen.ghannam@....com>
> 
> Thanks,
> Yazen

-- 
Thanks,
Avadhut Naik


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ