Date:   Tue, 26 Jul 2022 12:27:05 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Andrei Vagin <avagin@...gle.com>,
        Sean Christopherson <seanjc@...gle.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Wanpeng Li <wanpengli@...cent.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Jianfeng Tan <henry.tjf@...fin.com>,
        Adin Scannell <ascannell@...gle.com>,
        Konstantin Bogomolov <bogomolov@...gle.com>,
        Etienne Perot <eperot@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 0/5] KVM/x86: add a new hypercall to execute host system calls

On 7/26/22 10:33, Andrei Vagin wrote:
> We can think about restricting the list of system calls that this hypercall can
> execute. In the user-space changes for gVisor, we have a list of system calls
> that are not executed via this hypercall. For example, sigprocmask is never
> executed via this hypercall, because the kvm vcpu has its own signal mask.
> Another example is the ioctl syscall, because it can be one of the kvm ioctls.
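
A minimal sketch of what such a deny list presumably looks like on the
Sentry side (made-up names, and C rather than gVisor's actual Go, just to
show the shape of the check):

#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Syscalls that must never be forwarded to the host via the hypercall,
 * because they touch state owned by the Sentry or the vcpu. */
static const long denied_syscalls[] = {
	SYS_rt_sigprocmask,	/* the kvm vcpu has its own signal mask */
	SYS_ioctl,		/* could be one of the kvm ioctls       */
};

static bool syscall_allowed(long nr)
{
	for (size_t i = 0; i < sizeof(denied_syscalls) / sizeof(denied_syscalls[0]); i++)
		if (denied_syscalls[i] == nr)
			return false;
	return true;	/* ok to forward via the hypercall */
}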

The main issue I have is that the system call addresses are not translated.

On one hand, I understand why it's done like this; it's pretty much 
impossible to do it without duplicating half of the sentry in the host 
kernel.  And the KVM API you're adding is certainly sensible.

On the other hand, this makes the hypercall even more specialized, as it 
depends on the guest's memslot layout, and not self-sufficient, in the 
sense that the sandbox isn't secure without prior copying and validation 
of the arguments in guest ring0.
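
Concretely, "translated" would mean a memslot walk along these lines
(made-up structures, nothing from the patch; shown for a guest physical
address, a guest-virtual one would additionally need the guest page
tables):

#include <stddef.h>
#include <stdint.h>

struct memslot {
	uint64_t gpa_base;	/* guest physical base of the slot */
	uint64_t size;		/* slot size in bytes              */
	void	*hva_base;	/* host mapping of the slot        */
};

/* Translate a guest physical address into a host pointer, or NULL if it
 * falls outside every slot.  The result depends entirely on the guest's
 * memslot layout, which is exactly the dependency mentioned above. */
static void *gpa_to_hva(const struct memslot *slots, size_t n, uint64_t gpa)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t off = gpa - slots[i].gpa_base;

		if (gpa >= slots[i].gpa_base && off < slots[i].size)
			return (char *)slots[i].hva_base + off;
	}
	return NULL;
}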

> == Host Ring3/Guest ring0 mixed mode ==
> 
> This is how the gVisor KVM platform works right now. We don’t have a separate
> hypervisor; the Sentry performs the hypervisor's functions itself. The Sentry
> creates a KVM virtual machine instance, sets it up, and handles VMEXITs. As a
> result, the Sentry runs in the host ring3 and the guest ring0 and can
> transparently switch between these two contexts. In this scheme, the sentry
> syscall time is 3600ns for the case when a system call is made from gr0.
> 
> The benefit of this approach is that only the first system call triggers a
> vmexit; all subsequent syscalls are executed on the host natively.
> 
> But it has downsides:
> * Each sentry system call triggers a full exit to hr3.
> * Each vmenter/vmexit requires triggering a signal, which is expensive.
> * It doesn't allow supporting Confidential Computing (SEV-ES/SGX). The Sentry
>   has to be fully enclosed in a VM to support these technologies.
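
(If I read this right, the per-syscall flow in this mode is roughly the
following; simplified C pseudocode with invented helpers, not gVisor's
actual Go code.)

#include <unistd.h>	/* syscall(2) */

struct sentry_thread {
	int in_guest_ring0;	/* nonzero while running in gr0 */
};

/* Invented helper: the full vmexit + signal dance back to hr3. */
extern void exit_to_host_ring3(struct sentry_thread *t);

static long sentry_syscall(struct sentry_thread *t, long nr,
			   long a0, long a1, long a2)
{
	if (t->in_guest_ring0) {
		/* Syscall issued from gr0: a full, expensive exit to
		 * host ring3 is needed first. */
		exit_to_host_ring3(t);
		t->in_guest_ring0 = 0;
	}
	/* Now in hr3: this and all subsequent syscalls run natively
	 * until the thread re-enters guest mode. */
	return syscall(nr, a0, a1, a2);
}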
> 
> == Execute system calls from a user-space VMM ==
> 
> In this case, the Sentry is always running in the VM, and a syscall handler in
> gr0 triggers a vmexit to transfer control to the VMM (a user process running
> in hr3); the VMM executes the required system call and transfers control back
> to the Sentry. We can say that it implements the suggested hypercall in user
> space.
> 
> The sentry syscall time is 2100ns in this case.
> 
> The new hypercall does the same but without switching to the host ring 3. It
> reduces the sentry syscall time to 1000ns.
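
For my own understanding, the guest-ring0 side of the new hypercall
presumably boils down to something like this (hypothetical hypercall
number and argument layout; only the vmcall register convention of the
number in RAX and the first argument in RBX is the standard KVM one):

#define HC_HOST_SYSCALL	13	/* made-up number, for illustration only */

struct host_syscall_args {
	long nr;		/* host syscall number    */
	long args[6];		/* host syscall arguments */
};

/* Issue the hypercall from guest ring0.  Pointer arguments inside
 * args[] are guest addresses and, per the above, are used by the host
 * without translation. */
static long do_host_syscall(struct host_syscall_args *a)
{
	long ret;

	asm volatile("vmcall"
		     : "=a"(ret)
		     : "a"((long)HC_HOST_SYSCALL), "b"(a)
		     : "memory");
	return ret;
}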

Yeah, ~3000 clock cycles is what I would expect.

What does it translate to in terms of benchmarks?  For example, a simple 
netperf/UDP_RR benchmark.

Paolo
