Date:   Thu, 17 Oct 2019 10:23:12 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Xiaoyao Li <xiaoyao.li@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Radim Krcmar <rkrcmar@...hat.com>,
        Ashok Raj <ashok.raj@...el.com>,
        Tony Luck <tony.luck@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Sai Praneeth Prakhya <sai.praneeth.prakhya@...el.com>,
        Ravi V Shankar <ravi.v.shankar@...el.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, kvm@...r.kernel.org
Subject: Re: [RFD] x86/split_lock: Request to Intel

On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> The more I look at this trainwreck, the less interested I am in merging any
> of this at all.
> 
> The fact that it took Intel more than a year to figure out that the MSR is
> per core and not per thread is yet another proof that this industry just
> works by pure chance.
> 
> There is a simple way out of this misery:
> 
>   Intel issues a microcode update which does:
> 
>     1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
>        AND logic, i.e. when one thread disables AC it's automatically
>        disabled on the core.
> 
>        Alternatively it suppresses the #AC when the current thread has it
>        disabled.
> 
>     2) Provide a separate bit which indicates that the AC enable logic is
>        actually AND-based or that #AC is suppressed when the current thread
>        has it disabled.
> 
>     Which way I don't really care as long as it makes sense.

The #AC bit doesn't use OR-logic; it's straight up shared, i.e. writes on
one CPU are immediately visible on its sibling CPU.  That doesn't magically
solve the problem, but I don't think we need IPIs to coordinate between
siblings, e.g. wouldn't something like this work?  The per-cpu variables
are pointers shared by siblings, so both threads of a core serialize on the
same lock and reference count.

void split_lock_disable(void)
{
	/* Lock and count are shared by SMT siblings; see wiring below. */
	spinlock_t *ac_lock = this_cpu_read(split_lock_ac_lock);
	int *disabled = this_cpu_read(split_lock_ac_disabled);
	u64 test_ctrl;

	spin_lock(ac_lock);
	/* First disabler on the core clears the bit in the shared MSR. */
	if (++*disabled == 1) {
		rdmsrl(MSR_TEST_CTRL, test_ctrl);
		wrmsrl(MSR_TEST_CTRL,
		       test_ctrl & ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
	}
	spin_unlock(ac_lock);
}

void split_lock_enable(void)
{
	spinlock_t *ac_lock = this_cpu_read(split_lock_ac_lock);
	int *disabled = this_cpu_read(split_lock_ac_disabled);
	u64 test_ctrl;

	spin_lock(ac_lock);
	/* Last enabler on the core sets the bit again. */
	if (--*disabled == 0) {
		rdmsrl(MSR_TEST_CTRL, test_ctrl);
		wrmsrl(MSR_TEST_CTRL,
		       test_ctrl | MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
	}
	spin_unlock(ac_lock);
}
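
For completeness, here's a minimal sketch (my addition; the storage and
helper names are invented) of how those per-cpu pointers could be wired up
so that both siblings end up referencing the same lock and count:

/* Needs <linux/percpu.h>, <linux/spinlock.h>, <linux/topology.h>. */

/* Per-CPU pointers used above; siblings point at the same targets. */
static DEFINE_PER_CPU(spinlock_t *, split_lock_ac_lock);
static DEFINE_PER_CPU(int *, split_lock_ac_disabled);

/* Backing storage; only the first CPU in each SMT mask's copy is used. */
static DEFINE_PER_CPU(spinlock_t, ac_lock_storage);
static DEFINE_PER_CPU(int, ac_disabled_storage);

/* Run from a CPU online callback, after the sibling mask is populated. */
static void split_lock_init_cpu(unsigned int cpu)
{
	unsigned int first = cpumask_first(topology_sibling_cpumask(cpu));

	if (cpu == first)
		spin_lock_init(&per_cpu(ac_lock_storage, cpu));

	per_cpu(split_lock_ac_lock, cpu) = &per_cpu(ac_lock_storage, first);
	per_cpu(split_lock_ac_disabled, cpu) =
		&per_cpu(ac_disabled_storage, first);
}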


To avoid the spin_lock and WRMSR latency on every VM-Enter and VM-Exit,
actions (3A) and (4A) from your matrix (copied below) could be changed to
only do split_lock_disable() if the guest actually generates an #AC, and
then do split_lock_enable() on the next VM-Exit; a sketch of that lazy
toggling follows the matrix.  Assuming even legacy guests are somewhat sane
and rarely do split locks, lazily disabling the control would eliminate
most of the overhead and would also reduce the time the sibling CPU runs in
the host without #AC protection.


N | #AC       | #AC enabled | SMT | Ctrl    | Guest | Action
R | available | on host     |     | exposed | #AC   |
--|-----------|-------------|-----|---------|-------|---------------------
  |           |             |     |         |       |
0 | N         |     x       |  x  |   N     |   x   | None
  |           |             |     |         |       |
1 | Y         |     N       |  x  |   N     |   x   | None
  |           |             |     |         |       |
2 | Y         |     Y       |  x  |   Y     |   Y   | Forward to guest
  |           |             |     |         |       |
3 | Y         |     Y       |  N  |   Y     |   N   | A) Store in vCPU and
  |           |             |     |         |       |    toggle on VMENTER/EXIT
  |           |             |     |         |       |
  |           |             |     |         |       | B) SIGBUS or KVM exit code
  |           |             |     |         |       |
4 | Y         |     Y       |  Y  |   Y     |   N   | A) Disable globally on
  |           |             |     |         |       |    host. Store in vCPU/guest
  |           |             |     |         |       |    state and possibly re-enable
  |           |             |     |         |       |    when the guest goes away.
  |           |             |     |         |       | 
  |           |             |     |         |       | B) SIGBUS or KVM exit code
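
As a rough sketch of that lazy approach for rows 3A/4A (my illustration;
struct vcpu_stub and the hook names are made up, not real KVM interfaces):

struct vcpu_stub {
	bool split_lock_disabled;
};

/* #AC intercept: a guest without the control exposed hit a split lock. */
static void handle_guest_split_lock_ac(struct vcpu_stub *vcpu)
{
	split_lock_disable();	/* drop protection on this core only */
	vcpu->split_lock_disabled = true;
	/* ... re-enter the guest so the faulting instruction can retire ... */
}

/* Next VM-Exit: restore host protection as soon as possible. */
static void restore_split_lock_on_exit(struct vcpu_stub *vcpu)
{
	if (vcpu->split_lock_disabled) {
		split_lock_enable();
		vcpu->split_lock_disabled = false;
	}
}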


> If that's not going to happen, then we just bury the whole thing and put it
> on hold until a sane implementation of that functionality surfaces in
> silicon some day in the not so foreseeable future.
> 
> Seriously, this only makes sense when it's enabled by default and not
> rendered useless by VIRT. Otherwise we never get any reports and none of
> the issues are going to be fixed.
> 
> Thanks,
> 
> 	tglx
