linux-kernel - Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock

Message-ID: <c3ff2fb3-4380-fb07-1fa3-15896a09e748@intel.com>
Date:   Wed, 16 Oct 2019 19:23:21 +0800
From:   Xiaoyao Li <xiaoyao.li@...el.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Sean Christopherson <sean.j.christopherson@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Radim Krcmar <rkrcmar@...hat.com>,
        Ashok Raj <ashok.raj@...el.com>,
        Tony Luck <tony.luck@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Sai Praneeth Prakhya <sai.praneeth.prakhya@...el.com>,
        Ravi V Shankar <ravi.v.shankar@...el.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, kvm@...r.kernel.org
Subject: Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock

On 10/16/2019 6:16 PM, Paolo Bonzini wrote:
> On 16/10/19 11:47, Thomas Gleixner wrote:
>> On Wed, 16 Oct 2019, Paolo Bonzini wrote:
>>> Just never advertise split-lock
>>> detection to guests.  If the host has enabled split-lock detection,
>>> trap #AC and forward it to the host handler---which would disable
>>> split lock detection globally and reenter the guest.
>>
>> Which completely defeats the purpose.
> 
> Yes it does.  But Sean's proposal, as I understand it, leads to the
> guest receiving #AC when it wasn't expecting one.  So for an old guest,
> as soon as the guest kernel happens to do a split lock, it gets an
> unexpected #AC and crashes and burns.  And then, after much googling and
> gnashing of teeth, people proceed to disable split lock detection.
> 
> (Old guests are the common case: you're a cloud provider and your
> customers run old stuff; it's a workstation and you want to play that
> game that requires an old version of Windows; etc.).
> 
> To save them the googling and gnashing of teeth, I guess we can do a
> pr_warn_ratelimited on the first split lock encountered by a guest.  (It
> has to be ratelimited because userspace could create an arbitrary amount
> of guests to spam the kernel logs).  But the end result is the same,
> split lock detection is disabled by the user.
> 
> The first alternative I thought of was:
> 
> - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
>    actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
>    guest can do WRMSR and handle its own #AC faults, but KVM doesn't
>    change the value in hardware.
> 
> - trap #AC if the guest encounters a split lock while detection is
>    disabled, and then disable split-lock detection in the host.
> 
> But I discarded it because it still doesn't do anything for malicious
> guests, which can trigger #AC as they prefer.  And it makes things
> _worse_ for sane guests, because they think split-lock detection is
> enabled but they become vulnerable as soon as there is only one
> malicious guest on the same machine.
> 
> In all of these cases, the common final result is that split-lock
> detection is disabled on the host.  So might as well go with the
> simplest one and not pretend to virtualize something that (without core
> scheduling) is obviously not virtualizable.

Right, the core-scope nature of MSR_TEST_CTRL makes it hard, if not
impossible, to virtualize.

- Keeping old guests alive requires disabling split-lock detection in
the host (hardware).
- Defending against malicious guests requires enabling split-lock
detection in the host (hardware).

We cannot achieve both at the same time.

In my opinion, letting KVM disable split-lock detection in the host is
not acceptable, because it just opens the door for malicious guests to
attack. I think we can use Sean's proposal, as described below.

KVM always traps #AC, and only advertises split-lock detection to the
guest when the global variable split_lock_detection_enabled in the host
is true.

- If the guest has enabled #AC (CPL3 alignment check or split-lock
detection), KVM injects the #AC back into the guest, since the guest is
supposed to be capable of handling it.
- If the guest has not enabled #AC, KVM reports the #AC to userspace
(like other unexpected exceptions), and we can print a hint in the
kernel log, or let userspace (e.g., QEMU) tell the user that the guest
was killed because of a split lock in the guest.
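
A minimal sketch of the two cases above, written as standalone C rather
than actual KVM code. All names here (struct guest_ac_state,
handle_guest_ac(), advertise_split_lock_to_guest(), the enum values) are
hypothetical and only illustrate the decision; the real state would come
from the vCPU and from the guest's emulated MSR_TEST_CTRL.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a trapped guest #AC (illustrative names). */
enum ac_action {
    AC_INJECT_INTO_GUEST,   /* guest opted in, so it gets the #AC back */
    AC_EXIT_TO_USERSPACE,   /* unexpected #AC: report it, e.g. to QEMU */
};

/* Hypothetical snapshot of the guest state that matters here. */
struct guest_ac_state {
    bool cr0_am;            /* CR0.AM is set */
    bool eflags_ac;         /* EFLAGS.AC is set */
    int  cpl;               /* current privilege level, 0..3 */
    bool split_lock_detect; /* guest enabled detection in its emulated MSR */
};

/* Legacy alignment checking is active only at CPL3 with CR0.AM and
 * EFLAGS.AC both set. */
static bool guest_alignment_check_enabled(const struct guest_ac_state *g)
{
    return g->cpl == 3 && g->cr0_am && g->eflags_ac;
}

/* Advertise the feature only while the host itself has detection on. */
static bool advertise_split_lock_to_guest(bool host_detection_enabled)
{
    return host_detection_enabled;
}

/* KVM always traps #AC; this decides what to do with it afterwards. */
static enum ac_action handle_guest_ac(const struct guest_ac_state *g)
{
    if (guest_alignment_check_enabled(g) || g->split_lock_detect)
        return AC_INJECT_INTO_GUEST;

    /* Guest never asked for #AC: treat it as an unexpected exception. */
    return AC_EXIT_TO_USERSPACE;
}
```

With this shape, an old guest that never enabled either form of #AC
always ends up in the AC_EXIT_TO_USERSPACE path when it takes a split
lock, which is where the hint to the user would be printed.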

This way, malicious guests always get killed by userspace, and old sane
guests do not survive either if they cause a split lock. If we do want
old sane guests to work, we have to disable split-lock detection in the
host (through a boot parameter or debugfs), just as we would to run an
old userspace binary that generates split locks.

But there is an issue: we advertise split-lock detection to the guest
based on split_lock_detection_enabled being true in the host, and that
value can turn false dynamically when a split lock happens in the host
kernel. This makes the guest's capability change at run time, and I
don't know if there is a better way to inform the guest. Maybe we need
a PV interface?

> Thanks,
> 
> Paolo
> 
>> 1) Sane guest
>>
>> Guest kernel has #AC handler and you basically prevent it from
>> detecting malicious user space and killing it. You also prevent #AC
>> detection in the guest kernel which limits debugability.
>>
>> 2) Malicious guest
>>
>> Trigger #AC to disable the host detection and then carry out the DoS
>> attack.
> 
>