linux-kernel - Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20210323121643.e06403a1bc7819bab7c15d95@intel.com>
Date:   Tue, 23 Mar 2021 12:16:43 +1300
From:   Kai Huang <kai.huang@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Sean Christopherson <seanjc@...gle.com>, kvm@...r.kernel.org,
        x86@...nel.org, linux-sgx@...r.kernel.org,
        linux-kernel@...r.kernel.org, jarkko@...nel.org, luto@...nel.org,
        dave.hansen@...el.com, rick.p.edgecombe@...el.com,
        haitao.huang@...el.com, pbonzini@...hat.com, tglx@...utronix.de,
        mingo@...hat.com, hpa@...or.com
Subject: Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from
 sgx_free_epc_page()

On Mon, 22 Mar 2021 23:37:26 +0100 Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 11:06:43AM +1300, Kai Huang wrote:
> > This path is called by host SGX driver only, so yes this leaking is done by
> > host enclaves only.
> 
> Yes, so I was told.
> 
> > This patch is purpose is to break EREMOVE out of sgx_free_epc_page() so virtual
> > EPC code can use sgx_free_epc_page(), and handle EREMOVE logic differently.
> > This patch doesn't have intention to bring functional change.  I changed the
> > error msg because Dave said it worth to give a message saying EPC page is
> > leaked, and I thought this minor change won't break anything.
> 
> I read that already - you don't have to repeat it.
> 
> > btw, currently virtual EPC code (patch 5) handles in similar way: There's one
> > EREMOVE error is expected and virtual EPC code can handle, but for other
> > errors, it means kernel bug, and virtual EPC code gives a WARN(), and that EPC
> > page is leaked too:
> > 
> > +		WARN_ONCE(ret != SGX_CHILD_PRESENT,
> > +			  "EREMOVE (EPC page 0x%lx): unexpected error: %d\n",
> > +			  sgx_get_epc_phys_addr(epc_page), ret);
> > +		return ret;
> > 
> > So to me they are just WARN() to catch kernel bug.
> 
> You don't care about users, do you? Because when that error happens,
> they won't come crying to you to fix it.
> 
> Lemme save you some trouble: we don't take half-baked code into the
> kernel until stuff has been discussed and analyzed properly. So instead
> of trying to downplay this, try answering my questions.
> 
> Here's another one: when does EREMOVE fail?
> 
> /me goes and looks it up
> 
> "The instruction fails if the operand is not properly aligned or does
> not refer to an EPC page or the page is in use by another thread, or
> other threads are running in the enclave to which the page belongs. In
> addition the instruction fails if the operand refers to an SECS with
> associations."
> 
> And I guess those conditions will become more in the future.
> 
> Now, let's play. I'm the cloud admin and you're cloud OS customer
> support. I say:
> 
> "I got this scary error message while running enclaves on my server
> 
> "EREMOVE returned ... .  EPC page leaked.  Reboot required to retrieve leaked pages."
> 
> but I cannot reboot that machine because there are guests running on it
> and I'm getting paid for those guests and I might get sued if I do?"
> 
> Your turn, go wild.

I suppose admin can migrate those VMs, and then engineers can analyse the root
cause of such failure, and then fix it.