lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 10 Jan 2019 08:04:49 +0000
From:   S@...r.kernel.org
To:     Borislav Petkov <bp@...en8.de>, "S, Shirish" <Shirish.S@....com>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        "maintainer : X86 ARCHITECTURE" <x86@...nel.org>,
        Tony Luck <tony.luck@...el.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        "open list : X86 ARCHITECTURE" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] x86/mce/amd: Ensure quirks are applied in resume path
 as well

Hi Borislav,

On 1/10/2019 3:38 AM, Borislav Petkov wrote:
> On Mon, Jan 07, 2019 at 10:37:24AM +0000, S, Shirish wrote:
>> This patch adds threshold quirk applicable for family 15
> Same issue with "This patch" here.
Agree, have avoided the phrase in both patches that originate from this 
discussion.
>
>> in resume path as well, since mce_amd_feature_init()
>> does not have quirks applied when originating from mce_syscore_resume(),
>> resulting in the below message at every successful resume:
>>
>> "[Firmware Bug]: cpu 0, invalid threshold interrupt offset ..."
>>
>> Signed-off-by: Shirish S <shirish.s@....com>
>> ---
>>   arch/x86/kernel/cpu/mce/amd.c | 34 ++++++++++++++++++++++++++++++++++
>>   1 file changed, 34 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 89298c8..27cbf66 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -545,6 +545,34 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
>>   	return offset;
>>   }
>>   
>> +void disable_err_thresholding(struct cpuinfo_x86 *c)
>> +{
>> +	int i;
>> +	u64 hwcr;
>> +	bool need_toggle;
>> +	u32 msrs[] = {
>> +		0x00000413, /* MC4_MISC0 */
>> +		0xc0000408, /* MC4_MISC1 */
>> +	};
>> +
>> +	if (c->x86_model >= 0x10 && c->x86_model <= 0x7f) {
> You can save yourself an indentation level by reversing the logic here:
>
> 	if (c->x86 != 0x15)
> 		return;
>
> Also, I'm wondering if you simply can't do
>
> 	if (c->x86_model < 0x10)
> 		return;
>
> The assumption being that all the models - even after 0x7f - are highly
> unlikely to get MC4_MISC thresholding supported, all of a sudden. Might
> wanna run it by HW guys first though.

Perhaps no need to check the model altogether, this is applicable to 
entire family 15, since there is no 8th Gen.

>
>> +		rdmsrl(MSR_K7_HWCR, hwcr);
>> +
>> +		/* McStatusWrEn has to be set */
>> +		need_toggle = !(hwcr & BIT(18));
>> +
>> +		if (need_toggle)
>> +			wrmsrl(MSR_K7_HWCR, hwcr | BIT(18));
>> +
>> +		/* Clear CntP bit safely */
>> +		for (i = 0; i < ARRAY_SIZE(msrs); i++)
>> +			msr_clear_bit(msrs[i], 62);
>> +
>> +		/* restore old settings */
>> +		if (need_toggle)
>> +			wrmsrl(MSR_K7_HWCR, hwcr);
>> +	}
> So you copied the same code from __mcheck_cpu_apply_quirks().
>
> No.
>
> In a first patch, you carve that CntP clearing code in a separate
> function disable_err_thresholding() like you've done before.
Agree, first patch here:  https://lkml.org/lkml/2019/1/10/70
> Then, in a second patch, you call it from mce/amd.c and you move all the
> family/model checks inside the function so that you have a sole
>
> 	disable_err_thresholding();
>
> calls where you need them.
>
> You don't have to pass in struct cpuinfo_x86 *c - you can use
> boot_cpu_data in the function.
Agree, second patch here: https://lkml.org/lkml/2019/1/10/72
>
>> +}
>>   /* cpu init entry point, called from mce.c with preempt off */
>>   void mce_amd_feature_init(struct cpuinfo_x86 *c)
>>   {
>> @@ -552,6 +580,12 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
>>   	unsigned int bank, block, cpu = smp_processor_id();
>>   	int offset = -1;
>>   
>> +	/* Disable error thresholding bank in S3 resume path as well,
> What S3 resume path? That's the CPU init path.

I have also put in the commit message, but for discussion sake, there 
are 2 code paths leading to mce_amd_feature_init() as below.
1) S5 -> S0: (boot)
secondary_startup_64 -> start_kernel -> identify_boot_cpu ->
identify_cpu ->
mcheck_cpu_init (calls  __mcheck_cpu_apply_quirks before) ->
mce_amd_feature_init

2) S3 -> S0: (resume)
syscore_resume -> mce_syscore_resume -> mce_amd_feature_init

I have summarized this sequence after capturing the call trace in both 
scenarios.

Its clear that the quirks are not applied during S3-> S0 transition, 
hence the patch.

I cannot move the quirk to mce_amd_feature_init() since amd.c is guarded 
by CONFIG_X86_MCE_AMD and the quirk should be applicable in case the 
config is off via

__mcheck_cpu_apply_quirks() at boot.

> Also, kernel comments style is:
>
> 	/*
> 	 * A sentence ending with a full-stop.
> 	 * Another sentence. ...
> 	 * More sentences. ...
> 	 */
>
Agree, followed it in the above mentioned patches.

Thanks.

Regards,

Shirish S

> Good luck!
>
> :-)
>
-- 
Regards,
Shirish S

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ