Message-ID: <29e54919-edff-41ba-a3d0-d400e36fa6b9@intel.com>
Date: Wed, 8 May 2024 12:11:02 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Chang S. Bae" <chang.seok.bae@...el.com>, linux-kernel@...r.kernel.org
Cc: x86@...nel.org, platform-driver-x86@...r.kernel.org, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
 hdegoede@...hat.com, ilpo.jarvinen@...ux.intel.com, tony.luck@...el.com,
 ashok.raj@...el.com, jithu.joseph@...el.com
Subject: Re: [PATCH v2 1/2] x86/fpu: Extend kernel_fpu_begin_mask() to
 initialize AMX state

On 5/8/24 11:03, Chang S. Bae wrote:
> On 5/8/2024 7:40 AM, Dave Hansen wrote:
>> On 5/7/24 16:53, Chang S. Bae wrote:
>>
>>> However, due to resource constraints in storage, AMX state is excluded
>>> from the scope of state recovery. Consequently, AMX state must be in its
>>> initialized state for the IFS test to run.
>>
>> This doesn't mention how this issue got introduced.  Are we all bad at
>> reading the SDM? :)
> 
> Ah, I'd rather zap out this SDM sentence.

My point is that this is fixing a bug.  Where did that bug come from?
What got screwed up here?

Hint: I don't think us software folks screwed up here.  More likely,
the folks who built the two hardware features (AMX and IFS) forgot to
talk to each other, or someone forgot to document the AMX clobbering
aspect of the architecture.

>>> When AMX workloads are running, an active user AMX state remains even
>>> after a context switch, optimizing to reduce the state reload cost. In
>>> such cases, the test cannot proceed if it is scheduled.
>>
>> This is a bit out of the blue.  What does scheduling have to do with IFS?
..
> So, the CPU stopper threads for <cpu#> and its sibling to execute
> doscan() are queued up with the highest priority.
..

But this is the IFS implementation *today*.  The explanation depends on
IFS being implemented with something that context switches.  It also
depends on folks expecting context switches to always switch FPU state.

I'd just say:

	The kernel generally runs with live user FPU state, including
	AMX. That state can prevent IFS tests from running.

That's _much_ simpler and more generic, and it also fully explains the
situation.  It also isn't dependent on the IFS stop_cpus_run()
implementation of today, which could totally change tomorrow.

The underlying rule has zero to do with scheduling or context switching
optimizations.
