linux-kernel - Re: x86/sgx: uapi change proposal

Message-ID: <20190103150256.GA17015@linux.intel.com>
Date:   Thu, 3 Jan 2019 17:02:56 +0200
From:   Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>
To:     Sean Christopherson <sean.j.christopherson@...el.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Jethro Beekman <jethro@...tanix.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "x86@...nel.org" <x86@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
        Josh Triplett <josh@...htriplett.org>,
        Haitao Huang <haitao.huang@...ux.intel.com>,
        "Dr . Greg Wettstein" <greg@...ellic.com>
Subject: Re: x86/sgx: uapi change proposal

On Wed, Jan 02, 2019 at 12:47:52PM -0800, Sean Christopherson wrote:
> On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > > opening a new instance of /dev/sgx for each enclave?
> > > > 
> > > > I think that fits the SCM_RIGHTS scenario better, i.e. you could send
> > > > the enclave to a process that does not necessarily have rights to
> > > > /dev/sgx. That gives a more robust environment to configure SGX.
> > > 
> > > Sean, is this why you wanted enclave fd and anon inode and not just use
> > > the address space of /dev/sgx? Just taking notes of all observations.
> > > I'm not sure what your rationale was (maybe it was somewhere). This was
> > > something I made up, and it is a wrong deduction. You can easily
> > > get the same benefit with an fd associated with /dev/sgx representing
> > > the enclave.
> > > 
> > > This all means that for v19 I'm going without a separate enclave fd;
> > > an fd to /dev/sgx will represent the enclave. No anon inodes will be
> > > involved.
> > 
> > Based on these observations I updated the uapi.
> > 
> > As far as I'm concerned there has to be a solution to do EPC mapping
> > with a sequence:
> > 
> > 1. Ping /dev/kvm to do something.
> > 2. KVM asks SGX core to do something.
> > 3. SGX core does something.
> > 
> > I don't care what the something exactly is, but KVM is the only sane
> > place for KVM uapi. I would be surprised if the KVM maintainers wanted
> > to sprinkle KVM uapi across random places in other subsystems.
> 
> It's not a KVM uapi.
> 
> KVM isn't a hypervisor in the traditional sense.  The "real" hypervisor
> lives in userspace, e.g. Qemu; KVM is essentially just a (very fancy)
> driver for hardware accelerators, e.g. VMX.  Qemu, for example, is fully
> capable of running an x86 VM without KVM, it's just substantially slower.
> 
> In terms of guest memory, KVM doesn't care or even know what a particular
> region of memory represents or what, if anything, is backing a region in
> the host.  There are cases when KVM is made aware of certain aspects of
> guest memory for performance or functional reasons, e.g. emulated MMIO
> and encrypted memory, but in all cases the control logic ultimately
> resides in userspace.
> 
> SGX is a weird case because ENCLS can't be emulated in software, i.e.
> exposing SGX to a VM without KVM's help would be difficult.  But, it
> wouldn't be impossible, just slow and ugly.
> 
> And so, ignoring host oversubscription for the moment, there is no hard
> requirement that SGX EPC can only be exposed to a VM through KVM.  In
> other words, allocating and exposing EPC to a VM is orthogonal to KVM
> supporting SGX.  Exposing EPC to userspace via /dev/sgx/epc would mean
> that KVM would handle it like any other guest memory region, and all EPC
> related code/logic would reside in the SGX subsystem.

I'm fine doing that if it makes sense. I just don't understand why you
cannot add ioctls to /dev/kvm for allocating the region. Why isn't that
possible? As I said to Andy earlier, adding new device files is easy as
everything related to device creation is nicely encapsulated.

> Oversubscription throws a wrench in the system because ENCLV can only
> be executed post-VMXON and EPC conflicts generate VMX VM-Exits.  But
> even then, KVM doesn't need to own the EPC uapi, e.g. it can call into
> the SGX subsystem to handle EPC conflict VM-Exits and the SGX subsystem
> can wrap ENCLV with exception fixup and forcefully reclaim EPC pages if
> ENCLV faults.

If the uapi is *only* for KVM, KVM should definitely own it. KVM calling
the SGX subsystem on a conflict is KVM using in-kernel APIs provided by
the SGX core.

> I can't be 100% certain the oversubscription scheme will be sane without
> actually writing the code, but I'd like to at least keep the option open,
> i.e. not structure /dev/sgx/ in such a way that adding e.g. /dev/sgx/epc
> is impossible or ugly.

/Jarkko
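
As a rough illustration of the SCM_RIGHTS scenario mentioned in the quoted
discussion, here is a minimal Python sketch of passing an open file
descriptor over a Unix domain socket. It makes several assumptions: a pipe
stands in for an enclave fd (no SGX device is opened), and the names
send_fd, recv_fd and demo are purely illustrative and not part of any SGX
uapi. For brevity both ends live in one process; in the scenario discussed
above the receiver would be a separate process without access to /dev/sgx.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary payload must accompany the control message.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Pass the read end of a pipe (stand-in for an enclave fd) over a socketpair."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    received = -1
    try:
        send_fd(left, r)
        received = recv_fd(right)
        # The received descriptor refers to the same open file description,
        # so data written to the pipe is readable through it.
        os.write(w, b"hello")
        data = os.read(received, 5)
        return received != r, data
    finally:
        for fd in (r, w, received):
            if fd >= 0:
                os.close(fd)
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

The receiver ends up with its own descriptor number for the same underlying
open file, which is why it needs no permission to open the original path
itself.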