linux-kernel - Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc

Message-ID: <YFocgJ+OD/0rCsvg@google.com>
Date:   Tue, 23 Mar 2021 16:51:12 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Kai Huang <kai.huang@...el.com>, kvm@...r.kernel.org,
        x86@...nel.org, linux-sgx@...r.kernel.org,
        linux-kernel@...r.kernel.org, jarkko@...nel.org, luto@...nel.org,
        dave.hansen@...el.com, rick.p.edgecombe@...el.com,
        haitao.huang@...el.com, pbonzini@...hat.com, tglx@...utronix.de,
        mingo@...hat.com, hpa@...or.com
Subject: Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from
 sgx_free_epc_page()

On Tue, Mar 23, 2021, Borislav Petkov wrote:
> On Tue, Mar 23, 2021 at 04:21:47PM +0000, Sean Christopherson wrote:
> > I like the idea of pointing at the documentation.  The documentation should
> > probably emphasize that something is very, very wrong.
> 
> Yap, because no matter how we formulate the error message, it still ain't enough
> and needs a longer explanation.
> 
> > E.g. if a kernel bug triggers EREMOVE failure and isn't detected until
> > the kernel is widely deployed in a fleet, then the folks deploying the
> > kernel probably _should_ be in all out panic. For this variety of bug
> > to escape that far, it means there are huge holes in test coverage, in
> > both the kernel itself and in the infrastructure of whoever is rolling
> > out their new kernel.
> 
> You sound just like someone who works at a company with a big fleet, oh
> wait...
> 
> :-)
> 
> And yap, you big fleeted guys will more likely catch it but we do have
> all these other customers who have a handful of servers only so they
> probably won't be able to do such a wide coverage.

The size of the fleet shouldn't matter for this specific case.  This bug
requires the _host_ to be running enclaves, and obviously it also requires the
system to be running SGX-enabled guests.  In such a setup, the SGX workload
running in the host should be very well defined and understood, i.e. testing
should be a well-bounded problem to solve.

Running enclaves in both the host and guest should be uncommon in and of itself,
and for such setups, running _any_ SGX workloads in the host, let alone more
than 1 or 2 unique workloads, without ensuring guests are fully isolated is,
IMO, insane.

But yeah, what can happen, will happen.
> So I hope they'll appreciate this longer explanation about what to do
> when they hit it. And normally I wouldn't even care but we almost never
> tell people to reboot their boxes to fix sh*t - that's the other OS.
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette