linux-kernel - Re: [RESEND RFC 0/2] Paravirtualized Control Register pinning

Message-Id: <3A97EC6A-B8B4-44E7-89FA-71D3407CB3D7@oracle.com>
Date:   Wed, 25 Dec 2019 15:05:23 +0200
From:   Liran Alon <liran.alon@...cle.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     John Andersen <john.s.andersen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        "Christopherson, Sean J" <sean.j.christopherson@...el.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        LKML <linux-kernel@...r.kernel.org>,
        kvm list <kvm@...r.kernel.org>
Subject: Re: [RESEND RFC 0/2] Paravirtualized Control Register pinning



> On 25 Dec 2019, at 4:04, Andy Lutomirski <luto@...nel.org> wrote:
> 
> On Mon, Dec 23, 2019 at 6:31 AM Liran Alon <liran.alon@...cle.com> wrote:
>> 
>> 
>> 
>>> On 20 Dec 2019, at 21:26, John Andersen <john.s.andersen@...el.com> wrote:
>>> 
>>> Paravirtualized Control Register pinning is a strengthened version of
>>> existing protections on the Write Protect, Supervisor Mode Execution /
>>> Access Protection, and User-Mode Instruction Prevention bits. The
>>> existing protections prevent native_write_cr*() functions from writing
>>> values which disable those bits. This patchset prevents any guest
>>> writes to control registers from disabling pinned bits, not just writes
>>> from native_write_cr*(). This stops attackers within the guest from
>>> using ROP to disable protection bits.
>>> 
>>> https://web.archive.org/web/20171029060939/http://www.blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/
>>> 
>>> The protection is implemented by adding MSRs to KVM which contain the
>>> bits that are allowed to be pinned, and the bits which are pinned. The
>>> guest or userspace can enable bit pinning by reading MSRs to check
>>> which bits are allowed to be pinned, and then writing MSRs to set which
>>> bits they want pinned.
>>> 
>>> Other hypervisors such as HyperV have implemented similar protections
>>> for Control Registers and MSRs; which security researchers have found
>>> effective.
>>> 
>>> https://www.abatchy.com/2018/01/kernel-exploitation-4
>>> 
>> 
>> I think it’s important to mention how Hyper-V implements this protection, as it uses a very different architecture.
>> 
>> Hyper-V implements a set of PV APIs named VSM (Virtual Secure Mode) that allow a guest (partition) to separate itself into multiple security domains called VTLs (Virtual Trust Levels).
>> The VSM API exposes an interface for higher VTLs to control the execution of lower VTLs. In theory, VSM supports up to 16 VTLs, but Windows VBS (Virtualization Based Security), which is
>> the only current technology that utilises VSM, uses only 2 VTLs: VTL0 for most OS execution (Normal-Mode) and VTL1 for secure OS execution (Secure-Mode).
>> 
>> A higher VTL controls the execution of a lower VTL through the following VSM mechanisms:
>> 1) Memory Access Protections: Allows a higher VTL to restrict memory access to physical pages, either making them inaccessible or limiting them to certain permissions.
>> 2) Secure Intercepts: Allows a higher VTL to request that the hypervisor intercept certain events in lower VTLs for handling by the higher VTL. This includes access to system registers (e.g. CRs & MSRs).
>> 
>> VBS uses the above-mentioned mechanisms as follows:
>> a) Credential Guard: Prevents pass-the-hash attacks. Done by having a VTL1 trustlet encrypt credentials with an encryption key stored in memory accessible only to VTL1.
>> b) HVCI (Hypervisor-protected Code Integrity): Prevents execution of unsigned code. Done by marking all EPT entries NX until a VTL1 service verifies the signature. Once verified, the EPT entries are marked RO+X.
>> (HVCI also supports efficiently enforcing code-signing only on Ring0 code by utilising the Intel MBEC or AMD GMET CPU features, which allow setting the NX bit on EPT entries based on guest CPL.)
>> c) KDP (Kernel Data Protection): Marks certain pages as read-only in the VTL0 EPT after initialisation.
>> d) kCFG (Kernel Control-Flow Guard): VTL1 protects the bitmap specifying valid indirect branch targets by marking it read-only in the VTL0 EPT.
>> e) HyperGuard: VTL1 uses the “Secure Intercepts” mechanism to prevent VTL0 from modifying important system registers, including CR0 & CR4 as done by this patch.
>>    HyperGuard also implements a mechanism named NPIEP (Non-Privileged Instruction Execution Prevention) that prevents VTL0 Ring3 from executing SIDT/SGDT/SLDT/STR to leak Ring0 addresses.
>> 
>> To sum up, in Hyper-V the hypervisor exposes a relatively thin API that allows a guest to partition itself into multiple security domains (enforced by virtualization).
>> Using this framework, it’s possible to implement multiple OS-level protection mechanisms. Pinning certain registers to specific values, as done by this patch, is only one of them.
>> 
>> Therefore, as I also tried to say at the recent KVM Forum, I think KVM should consider exposing a VSM-like API to guests to allow various guest OSes,
>> including Linux, to implement VBS-like features. To decide what this API should look like, we need to have a broader discussion with Linux
>> security maintainers and KVM maintainers on which security features we would like to implement using such an API and what their architecture should be.
>> Then, we can implement this API in KVM and start to gradually introduce more security features in Linux which utilise this API.
> 
> How about having KVM implement the VSM API directly?

The Hyper-V VSM API is tightly coupled to the rest of the Hyper-V PV interface. Therefore, KVM could only implement the VSM API as-is
as part of its Hyper-V PV interface emulation. Because we don’t wish to expose the Hyper-V PV interface by default
to all KVM guests, KVM should have its own variant providing similar capabilities.

In addition, in my opinion there are some bad design choices in the VSM API that I haven’t mentioned in my previous message, and which a KVM
VSM-like API may want to handle differently. For example, the VSM API by design assumes that a given VTL
is superior to, and controls every aspect of, all VTLs lower than it. In Windows VBS, this has caused securekernel (Sk) running in VTL1 to
become part of the TCB, significantly enlarging it. Contrast this with, for example, QubesOS, where the OS is split into security domains that
each have well-defined capabilities, but none has the full capabilities that VTL1 has in VBS. QubesOS therefore keeps only the hypervisor
in the TCB, as it should.
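
To illustrate the difference, here is a minimal sketch in C contrasting the two authorization models. All names are hypothetical and exist only for this illustration; none of them is part of the Hyper-V or KVM interfaces. In the VSM-style model a higher VTL implicitly controls every lower one, while in a capability-style model a domain may only perform the operations it was explicitly granted over a specific target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DOMAINS 16 /* VSM supports up to 16 VTLs in theory */

/* VSM-style rule: a higher VTL may control any lower VTL, with no finer grain. */
static bool vtl_may_control(unsigned actor_vtl, unsigned target_vtl)
{
    assert(actor_vtl < MAX_DOMAINS && target_vtl < MAX_DOMAINS);
    return actor_vtl > target_vtl;
}

/* Hypothetical capability bits, loosely named after the two VSM mechanisms. */
enum domain_cap {
    CAP_MEM_PROTECT   = 1u << 0, /* restrict the target's memory permissions */
    CAP_REG_INTERCEPT = 1u << 1, /* intercept the target's CR/MSR accesses */
};

/* Capability-style rule: each domain holds an explicit grant per target. */
struct domain {
    uint32_t caps[MAX_DOMAINS]; /* caps[t] = capabilities held over domain t */
};

/* True only if the actor was granted every bit in 'cap' over 'target'. */
static bool domain_may(const struct domain *actor, unsigned target, uint32_t cap)
{
    assert(actor && target < MAX_DOMAINS);
    return (actor->caps[target] & cap) == cap;
}
```

With this model, a register-guarding domain could be granted only CAP_REG_INTERCEPT over domain 0 and nothing else, so compromising it would not hand out memory-protection control the way compromising VTL1 does.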

Having said that, I am already working on a patch series to enhance the KVM Hyper-V PV interface implementation to also include VSM.
As I mentioned at the recent KVM Forum, I wish to do so to allow a modern Windows OS with VBS support running on top of KVM
to not need to run Hyper-V inside the KVM guest (i.e. to avoid a nested-virtualization workload). When Windows detects that it’s already
running as a guest on top of Hyper-V with VSM support, it uses the underlying hypervisor’s VSM API to implement VBS, without loading
Hyper-V inside the guest. This improves the performance & semantics of Windows VBS guests on top of KVM.

Note, though, that my work on implementing VSM in the KVM Hyper-V PV interface implementation isn’t related to the discussion here,
which is: how should Linux be modified to take advantage of a VSM-like API to implement security mitigation features such as those
I described above that Windows VBS implements on top of such an API? Deciding on the design of those features will also guide
what the KVM PV VSM-like API we implement should be.
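
As one concrete example of such a feature, the register pinning described in the cover letter quoted above boils down to roughly the following check. This is only a sketch under my own assumptions, not the patchset's code: the struct and function names are hypothetical, pins are assumed to be sticky once set, and only the CR4 bit positions are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define CR4_UMIP (1ULL << 11)
#define CR4_SMEP (1ULL << 20)
#define CR4_SMAP (1ULL << 21)

/* Hypothetical per-vCPU state backing the "allowed" and "pinned" MSRs. */
struct cr_pin_state {
    uint64_t allowed; /* bits the hypervisor permits the guest to pin */
    uint64_t pinned;  /* bits the guest has asked to keep set */
};

/*
 * Models a guest write to the "pinned" MSR. The request is rejected if it
 * names a bit outside the allowed mask. Assumption: pins only accumulate,
 * so a later write cannot unpin a bit.
 */
static bool cr_pin_request(struct cr_pin_state *s, uint64_t want)
{
    assert(s);
    if (want & ~s->allowed)
        return false;
    s->pinned |= want;
    return true;
}

/*
 * Models the hypervisor intercepting a guest CR write: the write is
 * permitted only if every pinned bit is still set in the new value.
 */
static bool cr_write_permitted(const struct cr_pin_state *s, uint64_t new_val)
{
    assert(s);
    return (new_val & s->pinned) == s->pinned;
}
```

Because the check runs in the hypervisor on every intercepted CR write, it applies to any guest code path, including a ROP chain that jumps directly to a MOV-to-CR4 instruction.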

-Liran