lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 12 Apr 2022 13:36:20 +0000
From:   Jon Kohler <jon@...anix.com>
To:     Dave Hansen <dave.hansen@...el.com>
CC:     Jon Kohler <jon@...anix.com>, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Tony Luck <tony.luck@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...e.de>,
        Neelima Krishnan <neelima.krishnan@...el.com>,
        "kvm @ vger . kernel . org" <kvm@...r.kernel.org>
Subject: Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on



> On Apr 11, 2022, at 7:45 PM, Dave Hansen <dave.hansen@...el.com> wrote:
> 
> On 4/11/22 12:35, Jon Kohler wrote:
>> Also, while I’ve got you, I’d also like to send out a patch to simply
>> force abort all transactions even when tsx=on, and just be done with
>> TSX. Now that we’ve had the patch that introduced this functionality
>> I’m patching for roughly a year, combined with the microcode going
>> out, it seems like TSX’s numbered days have come to an end. 
> 
> Could you elaborate a little more here?  Why would we ever want to force
> abort transactions that don't need to be aborted for some reason?

Sure, I'm talking specifically about when users of tsx=on (or
CONFIG_X86_INTEL_TSX_MODE_ON) on X86_BUG_TAA CPU SKUs. In this situation,
TSX features are enabled, as are TAA mitigations. Using our own use case
as an example, we only do this because of legacy live migration reasons.

This is fine on Skylake (because we're signed up for MDS mitigation anyhow)
and fine on Ice Lake because TAA_NO=1; however this is wicked painful on
Cascade Lake, because MDS_NO=1 and TAA_NO=0, so we're still signed up for
TAA mitigation by default. On CLX, this hits us on host syscalls as well as
vmexits with the mds clear on every one :(

So tsx=on is this oddball for us, because if we switch to auto, we'll break
live migration for some of our customers (but TAA overhead is gone), but
if we leave tsx=on, we keep the feature enabled (but no one likely uses it)
and still have to pay the TAA tax even if a customer doesn't use it.

So my theory here is to extend the logical effort of the microcode driven
automatic disablement as well as the tsx=auto automatic disablement and
have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
the CPU features enumerated to maintain live migration.

This would still leave TSX totally good on Ice Lake / non-buggy systems.

If it would help, I'm working up an RFC patch, and we could discuss there?

In the mean time, I did send out a v2 patch for this series addressing your
comments.

Thanks again,
Jon

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ