[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200921114104.GB6038@linux.intel.com>
Date: Mon, 21 Sep 2020 14:41:04 +0300
From: Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>
To: Sean Christopherson <sean.j.christopherson@...el.com>
Cc: Haitao Huang <haitao.huang@...ux.intel.com>, x86@...nel.org,
linux-sgx@...r.kernel.org, linux-kernel@...r.kernel.org,
Jethro Beekman <jethro@...tanix.com>,
Chunyang Hui <sanqian.hcy@...fin.com>,
Jordan Hand <jorhand@...ux.microsoft.com>,
Nathaniel McCallum <npmccallum@...hat.com>,
Seth Moore <sethmo@...gle.com>,
Darren Kenny <darren.kenny@...cle.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
akpm@...ux-foundation.org, andriy.shevchenko@...ux.intel.com,
asapek@...gle.com, bp@...en8.de, cedric.xing@...el.com,
chenalexchen@...gle.com, conradparker@...gle.com,
cyhanish@...gle.com, dave.hansen@...el.com, haitao.huang@...el.com,
josh@...htriplett.org, kai.huang@...el.com, kai.svahn@...el.com,
kmoy@...gle.com, ludloff@...gle.com, luto@...nel.org,
nhorman@...hat.com, puiterwijk@...hat.com, rientjes@...gle.com,
tglx@...utronix.de, yaozhangx@...gle.com
Subject: Re: [PATCH v38 13/24] x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES
On Fri, Sep 18, 2020 at 05:09:19PM -0700, Sean Christopherson wrote:
> On Fri, Sep 18, 2020 at 03:39:32PM +0300, Jarkko Sakkinen wrote:
> > On Fri, Sep 18, 2020 at 03:20:39PM +0300, Jarkko Sakkinen wrote:
> > > On Thu, Sep 17, 2020 at 07:09:40PM -0700, Sean Christopherson wrote:
> > > > On Thu, Sep 17, 2020 at 01:35:10PM -0500, Haitao Huang wrote:
> > > > > On Thu, 17 Sep 2020 11:02:06 -0500, Jarkko Sakkinen
> > > > > <jarkko.sakkinen@...ux.intel.com> wrote:
> > > > > >
> > > > > > Right, I do get the OOM case but wouldn't in that case the reasonable
> > > > > > thing to do destroy the enclave that is not even running? I mean that
> > > > > > means that we are globally out of EPC.
> > > > > >
> > > > >
> > > > > I would say it could be a policy, but not the only one. If it does not make
> > > > > much difference to kernel, IMHO we should not set it in stone now.
> > > > > Debugging is also huge benefit to me.
> > > >
> > > > Agreed, an EPC cgroup is the proper way to define/enforce what happens when
> > > > there is EPC pressure. E.g. if process A is consuming 99% of the EPC, then
> > > > it doesn't make sense to unconditionally kill enclaves from process B. If
> > > > the admin wants to give process A priority, so be it, but such a decision
> > > > shouldn't be baked into the kernel.
> > > >
> > > > This series obviously doesn't provide an EPC cgroup, but that doesn't mean
> > > > we can't make decisions that will play nice with a cgroup in the future.
> > >
> > > Here's the core issue why the API "as is used to be" does not work:
> > >
> > > if (ret == -EIO) {
> > > mutex_lock(&encl->lock);
> > > sgx_encl_destroy(encl);
> > > mutex_unlock(&encl->lock);
> > > }
> > >
> > > It would be better to instead whitelist *when* the enclave is preserved.
> > >
> > > if (ret != -ENOMEM) {
> > > mutex_lock(&encl->lock);
> > > sgx_encl_destroy(encl);
> > > mutex_unlock(&encl->lock);
> > > }
> > >
> > > That is the information we *deterministically* want to know. Otherwise,
> > > we will live in ultimate chaos.
> > >
> > > Only this way can caller know when there are means to continue, and when
> > > to quit. I.e. the code is whitelisting wrong way around currently.
> >
> > Actually since the state cannot corrupt unless EADD or EEXTEND fails it
> > is fine to have the enclave alive on any other error condition. I think
>
> EADD and EEXTEND failure don't corrupt state. Killing the enclave if EEXTEND
> fails makes sense because failure at that point is either due to a kernel bug
> or loss of EPC, both of which are fatal to the enclave.
This is also true. I meant by corrupt state e.g. a kernel bug, which
causes uninitalizes pages go the free queue.
I'd rephrase this in kdoc as: "The function deinitializes enclave and
returns -EIO when EPC is lost, while entering to a new power cycle".
Documentation describes only legit behaviour, let's ignore the corrupt
part.
> EADD is a little different, e.g. it could fault due to a bad source address,
> in which case the failure is not technically fatal. But, Jarkko wanted to
> have consistent behavior for EADD and EEXTEND failures, and practically
> speaking the enclave is probably hosed anyways if EADD fails, i.e. killing the
> enclave on EADD failure isn't a sticking point (for me).
We need to figure out own return value for EADD, but I agree with this.
I would go with -EFAULT as we do when source VMA is no available. Does
this make sense to you?
/Jarkko
Powered by blists - more mailing lists