linux-kernel - Re: [PATCH v2] x86/cpu: Fix migration safety with X86_BUG_NULL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bfa45053-1dbf-1cf6-4c6e-f6ec74a50422@citrix.com>
Date:   Mon, 18 Oct 2021 21:06:07 +0100
From:   Andrew Cooper <andrew.cooper3@...rix.com>
To:     Borislav Petkov <bp@...en8.de>,
        Jane Malalane <jane.malalane@...rix.com>
CC:     LKML <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Pu Wen <puwen@...on.cn>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Yazen Ghannam <Yazen.Ghannam@....com>,
        Brijesh Singh <brijesh.singh@....com>,
        Huang Rui <ray.huang@....com>,
        Andy Lutomirski <luto@...nel.org>,
        Kim Phillips <kim.phillips@....com>, <stable@...r.kernel.org>
Subject: Re: [PATCH v2] x86/cpu: Fix migration safety with X86_BUG_NULL_SEL

On 18/10/2021 19:17, Borislav Petkov wrote:
> On Wed, Oct 13, 2021 at 03:22:30PM +0100, Jane Malalane wrote:
>> @@ -650,6 +651,27 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>>  	if (c->x86_power & BIT(14))
>>  		set_cpu_cap(c, X86_FEATURE_RAPL);
>>  
>> +	/*
>> +	 * Zen1 and earlier CPUs don't clear segment base/limits when
>> +	 * loading a NULL selector.  This has been designated
>> +	 * X86_BUG_NULL_SEG.
>> +	 *
>> +	 * Zen3 CPUs advertise Null Selector Clears Base in CPUID.
>> +	 * Zen2 CPUs also have this behaviour, but no CPUID bit.
>> +	 *
>> +	 * A hypervisor may sythesize the bit, but may also hide it
>> +	 * for migration safety, so we must not probe for model
>> +	 * specific behaviour when virtualised.
>> +	 */
>> +	if (c->extended_cpuid_level >= 0x80000021 && cpuid_eax(0x80000021) & BIT(6))
>> +		nscb = true;
>> +
>> +	if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !nscb && c->x86 == 0x17)
>> +		nscb = check_null_seg_clears_base(c);
>> +
>> +	if (!nscb)
>> +		set_cpu_bug(c, X86_BUG_NULL_SEG);
>> +
>>  #ifdef CONFIG_X86_64
>>  	set_cpu_cap(c, X86_FEATURE_SYSCALL32);
>>  #else
> Can we do something like this?
>
> It is carved out into a separate function which you can simply call from
> early_init_amd() and early_init_hygon().
>
> I guess you can put that function in arch/x86/kernel/cpu/common.c or so.
>
> Then, you should put the comments right over the code like I've done
> below so that one can follow what's going on with each particular check.
>
> I've also flipped the logic a bit and it is simpler this way.

:) Sadly...

>
> Totally untested of course.
>
> static void early_probe_null_seg_clearing_base(struct cpuinfo_x86 *c)
> {
> 	/*
> 	 * A hypervisor may sythesize the bit, but may also hide it
> 	 * for migration safety, so do not probe for model-specific
> 	 * behaviour when virtualised.
> 	 */
> 	if (cpu_has(c, X86_FEATURE_HYPERVISOR))
> 		return;
>
> 	/* Zen3 CPUs advertise Null Selector Clears Base in CPUID. */
> 	if (c->extended_cpuid_level >= 0x80000021 && cpuid_eax(0x80000021) & BIT(6))
> 		return;
>
> 	/* Zen2 CPUs also have this behaviour, but no CPUID bit. */
> 	if (c->x86 == 0x17 && check_null_seg_clears_base(c))
> 		return;

... this is 0x18 for Hygon, and ...

>
> 	/* All the remaining ones are affected */
> 	set_cpu_bug(c, X86_BUG_NULL_SEG);

... hypervisor && !ncsb still needs to set BUG_NULL_SEG, so you really
can't exit early.

> }
>
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> index 0f8885949e8c..2ca4afb97247 100644
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -1395,7 +1395,7 @@ void __init early_cpu_init(void)
>>  	early_identify_cpu(&boot_cpu_data);
>>  }
>>  
>> -static void detect_null_seg_behavior(struct cpuinfo_x86 *c)
>> +bool check_null_seg_clears_base(struct cpuinfo_x86 *c)
>>  {
>>  #ifdef CONFIG_X86_64
>>  	/*
>> @@ -1418,10 +1418,10 @@ static void detect_null_seg_behavior(struct cpuinfo_x86 *c)
>>  	wrmsrl(MSR_FS_BASE, 1);
>>  	loadsegment(fs, 0);
>>  	rdmsrl(MSR_FS_BASE, tmp);
>> -	if (tmp != 0)
>> -		set_cpu_bug(c, X86_BUG_NULL_SEG);
>>  	wrmsrl(MSR_FS_BASE, old_base);
>> +	return tmp == 0;
>>  #endif
>> +	return true;
>>  }
>>  
>>  static void generic_identify(struct cpuinfo_x86 *c)
>> @@ -1457,8 +1457,6 @@ static void generic_identify(struct cpuinfo_x86 *c)
>>  
>>  	get_model_name(c); /* Default name */
>>  
>> -	detect_null_seg_behavior(c);
>> -
>>  	/*
>>  	 * ESPFIX is a strange bug.  All real CPUs have it.  Paravirt
>>  	 * systems that run Linux at CPL > 0 may or may not have the
> So this function is called on all x86 CPUs. Are you sure others besides
> AMD and Hygon do not have the same issue?

No other CPU vendors are known to have this issue.  (And by "issue",
even this is complicated.  Back in the 32bit days, it was a plausible
perf improvement, but it backfired massively for AMD64 where there was a
possibility/expectation to use NULL segments.)

Andy only put the check in unilaterally just in case, and even that was
fine-ish until AMD went and fixed it silently in Zen2.

> IOW, I wouldn't remove that call here.
>
> But then this is the identify() phase in the boot process and you've
> moved it to early_identify() by putting it in the ->c_early_init()
> function pointer on AMD and Hygon.
>
> Is there any particular reasoning for this or can that detection remain
> in ->c_identify()?

No particular reason.  It needs resolving before init gets created, but
any time before that is fine.

> Because if this null seg behavior detection should happen on all
> CPUs - and I think it should, because, well, it has been that way
> until now - then the vendor specific identification minus what
> detect_null_seg_behavior() does should run first and then after
> ->c_identify() is done, you should do something like:
>
> 	if (!cpu_has_bug(c, X86_BUG_NULL_SEG)) {
> 		if (!check_null_seg_clears_base(c))
> 			set_cpu_bug(c, X86_BUG_NULL_SEG);
> 	}
>
> so that it still takes place on all CPUs.

This would only really work for boot cpu and setup_force_cap(), because
no CPU is going to have X86_BUG_NULL_SEG set by default, but this still
misses the point of the bugfix which is "check_null_seg_clears_base()
must not be called when cpu_has_hypervisor".

In practice, the BSP is good enough.  The behaviour predates the K8,
which was the first CPU where it became observable without
SMM/PUSHALL/etc, and quite possibly goes back to the dawn of time, and
you can't mix a Zen1 and Zen2 in a 2-socket system.

>
> I.e., you should split the detection.
>
> I hope I'm making sense ...
>
> Ah, btw, that @c parameter to detect_null_seg_behavior() is unused - you
> should remove it in a pre-patch.

It is made unused by this patch, so can't be pulled out earlier, but
should be adjusted.

~Andrew