Message-ID: <28C45B75-7FE3-4C79-9A29-F929AF9BC5A8@nutanix.com>
Date:   Tue, 12 Apr 2022 16:08:32 +0000
From:   Jon Kohler <jon@...anix.com>
To:     Dave Hansen <dave.hansen@...el.com>
CC:     Jon Kohler <jon@...anix.com>, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Tony Luck <tony.luck@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...e.de>,
        Neelima Krishnan <neelima.krishnan@...el.com>,
        "kvm @ vger . kernel . org" <kvm@...r.kernel.org>
Subject: Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on



> On Apr 12, 2022, at 11:54 AM, Dave Hansen <dave.hansen@...el.com> wrote:
> 
> On 4/12/22 06:36, Jon Kohler wrote:
>> So my theory here is to extend the logical effort of the microcode driven
>> automatic disablement as well as the tsx=auto automatic disablement and
>> have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
>> the CPU features enumerated to maintain live migration.
>> 
>> This would still leave TSX totally good on Ice Lake / non-buggy systems.
>> 
>> If it would help, I'm working up an RFC patch, and we could discuss there?
> 
> Sure.  But, it sounds like you really want a new tsx=something rather
> than to muck with tsx=on behavior.  Surely someone else will come along
> and complain that we broke their TSX setup if we change its behavior.

Good point, there will always be a squeaky wheel. I’ll work that into the RFC:
I’ll do something like tsx=compat and see how it shapes up.

To be fair, though, the commit I’m patching with this series would itself
break setups once they move to 5.14+ and apply the microcode update, but you
certainly have a good point.

> 
> Maybe you should just pay the one-time cost and move your whole fleet
> over to tsx=off if you truly believe nobody is using it.
> 

Trust me, I’d love to do that. However, we have thousands of hosts across
thousands of unique customers, and they aren't managed as a centralized
service (customers manage them directly), so doing that would require each
individual customer to organize a full power cycle for all of their VMs prior
to an upgrade to tsx=off hosts.

That said, we are marching in that direction: we're shipping a control plane
update that will mask HLE and RTM after power cycles, but that requires
customers to apply that control plane update and then power cycle everything.
So we've begun the feature deprecation now, but it will take years to fully
bleed off without making customers micromanage full power cycles.
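
For anyone who wants to see whether a given guest still has HLE and RTM
enumerated (that is, whether it has picked up the masking after a power
cycle), here is a rough sketch. It assumes a Linux guest where /proc/cpuinfo
lists "hle" and "rtm" on the "flags" line and "taa" on the "bugs" line; the
function name tsx_status is made up for illustration and is not part of any
existing tool.

```python
def tsx_status(cpuinfo_text):
    """Report TSX-related entries found in /proc/cpuinfo-style text.

    Assumption: the "flags" line carries "hle"/"rtm" and the "bugs"
    line carries "taa", as on x86 Linux. Hypothetical helper.
    """
    flags, bugs = set(), set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPUs
        key = key.strip()
        if key == "flags":
            flags.update(value.split())
        elif key == "bugs":
            bugs.update(value.split())
    return {
        "hle": "hle" in flags,
        "rtm": "rtm" in flags,
        "taa": "taa" in bugs,
    }
```

Inside a guest this would be called on the contents of /proc/cpuinfo. It only
reports what is enumerated to the guest; it says nothing about whether
transactions are being force-aborted underneath.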
