lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zjj3Lrv2NNHLEzzk@google.com>
Date: Mon, 6 May 2024 08:28:46 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Oliver Sang <oliver.sang@...el.com>
Cc: Borislav Petkov <bp@...en8.de>, oe-lkp@...ts.linux.dev, lkp@...el.com, 
	linux-kernel@...r.kernel.org, x86@...nel.org, Ingo Molnar <mingo@...nel.org>, 
	Srikanth Aithal <sraithal@....com>
Subject: Re: [tip:x86/alternatives] [x86/alternatives] ee8962082a: WARNING:at_arch/x86/kernel/cpu/cpuid-deps.c:#do_clear_cpu_cap

On Mon, May 06, 2024, Oliver Sang wrote:
> hi, Boris,
> 
> On Sat, May 04, 2024 at 02:48:22PM +0200, Borislav Petkov wrote:
> > On Wed, May 01, 2024 at 12:33:05AM +0200, Borislav Petkov wrote:
> > > On Tue, Apr 30, 2024 at 12:51:02PM -0700, Sean Christopherson wrote:
> > > > But that would just mask the underlying problem, it wouldn't actually fix anything
> > > > other than making the WARN go away.  Unless I'm misreading the splat+code, the
> > > > issue isn't that init_ia32_feat_ctl() clears VMX late, it's that the BSP sees
> > > > VMX as fully enabled, but at least one AP sees VMX as disabled.
> > > > 
> > > > I don't see how the kernel can expect to function correctly with divergent feature
> > > > support across CPUs, i.e. the WARN is a _good_ thing in this case, because it
> > > > alerts the user that their system is messed up, e.g. has a bad BIOS or something.
> > > 
> > > Yes, and yes.
> > > 
> > > There are two issues. Clearing feature flags after alternatives have
> > > been applied should not happen, and this particular issue with that box.
> > > 
> > > Lemme cook up something in the coming days for the former.
> > 
> > Two simple patches as a reply to this.
> > 
> > Oliver, can you run them on your box pls?
> 
> we confirmed after applying them upon ee8962082a, the WARNING which was reported
> in our original report cannot be reproduced any longer.

I am so confused.  With both patches applied, simulating VMX being disabled by
BIOS, which is a _legal_ configuration, yields:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/cpuid-deps.c:118 do_clear_cpu_cap+0xf6/0x120
 Modules linked in:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.9.0-rc3-28ed6849f6ae-rev/boris-vm #389
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 RIP: 0010:do_clear_cpu_cap+0xf6/0x120
 Call Trace:
  <TASK>
  init_ia32_feat_ctl+0x133/0x420
  init_intel+0x11/0x330
  identify_cpu+0x242/0x670
  identify_secondary_cpu+0xe/0x40
  smp_store_cpu_info+0x48/0x60
  start_secondary+0x73/0x120
  common_startup_64+0x13b/0x141
  </TASK>
 ---[ end trace 0000000000000000 ]---

That's completely expected (by me at least), because as I said in the original
thread, secondary CPUs are identified after alternatives are applied, and when VMX
is disabled by BIOS, the feature flag will be initially set based on CPUID, and
then cleared by init_ia32_feat_ctl().  I.e. patch 1 is wrong on multiple levels.

And without _either_ patch applied, no WARN occurs, which is again expected (by
me), because init_ia32_feat_ctl() runs _before_ alternative_instructions() on the
boot CPU, i.e. alternatives_patched will be false when do_clear_cpu_cap() is
called by the boot CPU, and the boot_cpu_has(feature) that guards the WARN will
be false when do_clear_cpu_cap() is called by secondary CPUs.

The only way the WARN could have fired without this series is if VMX is enabled
in BIOS on the boot CPU, but disabled by BIOS on one more secondary CPUs.  And
_that_ is a bogus setup that (a) the kernel absolutely should WARN about, and
(b) _still_ occurs with one or both patches applied.

So I don't see how this series could possibly have fixed the issue Oliver
encountered, nor do I see any value in moving init_ia32_feat_ctl() into
early_init_intel().

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ