lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r1agtd11.ffs@tglx>
Date:   Mon, 13 Dec 2021 22:23:22 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Yang Zhong <yang.zhong@...el.com>, x86@...nel.org,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com
Subject: Re: [PATCH 10/19] kvm: x86: Emulate WRMSR of guest IA32_XFD

Paolo,

On Mon, Dec 13 2021 at 20:45, Thomas Gleixner wrote:
> On Mon, Dec 13 2021 at 16:06, Paolo Bonzini wrote:
>> That said, I think xfd_update_state should not have an argument. 
>> current->thread.fpu.fpstate->xfd is the only fpstate that should be 
>> synced with the xfd_state per-CPU variable.
>
> I'm looking into this right now. The whole restore versus runtime thing
> needs to be handled differently.

We need to look at different things here:

   1) XFD MSR write emulation

   2) XFD MSR synchronization when write emulation is disabled

   3) Guest restore

#1 and #2 are in the context of vcpu_run() and

   vcpu->arch.guest_fpu.fpstate == current->thread.fpu.fpstate

while #3 has:

   vcpu->arch.guest_fpu.fpstate != current->thread.fpu.fpstate


#2 is only updating fpstate->xfd and the per CPU shadow.

So the state synchronization wants to be something like this:

void fpu_sync_guest_xfd_state(void)
{
	struct fpstate *fps = current->thread.fpu.fpstate;

	lockdep_assert_irqs_disabled();
	if (fpu_state_size_dynamic()) {
		rdmsrl(MSR_IA32_XFD, fps->xfd);
		__this_cpu_write(xfd_state, fps->xfd);
	}
}
EXPORT_SYMBOL_GPL(fpu_sync_guest_xfd_state);

No wrmsrl() because the MSR is already up do date. The important part is
that fpstate->xfd and the shadow state are updated so that after
reenabling preemption the context switch FPU logic works correctly.


#1 and #3 can trigger a reallocation of guest_fpu.fpstate and
can fail. But this is also true for XSETBV emulation and XCR0 restore.

For #1 modifying fps->xfd in the KVM code before calling into the FPU
code is just _wrong_ because if the guest removes the XFD restriction
then it must be ensured that the buffer is sized correctly _before_ this
is updated.

For #3 it's not really important, but I still try to wrap my head around
the whole picture vs. XCR0.

There are two options:

  1) Require strict ordering of XFD and XCR0 update to avoid pointless
     buffer expansion, i.e. XFD before XCR0.

     Because if XCR0 is updated while guest_fpu->fpstate.xfd is still in
     init state (0) and XCR0 contains extended features, then the buffer
     would be expanded because XFD does not mask the extended features
     out. When XFD is restored with a non-zero value, it's too late
     already.

  2) Ignore buffer expansion up to the point where XSTATE restore happens
     and evaluate guest XCR0 and guest_fpu->fpstate.xfd there.

I'm leaning towards #1 because that means we have exactly _ONE_ place
where we need to deal with buffer expansion. If Qemu gets the ordering
wrong it wastes memory per vCPU, *shrug*.

Thanks,

        tglx




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ