linux-kernel - Re: [PATCH v2 1/2] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS

Message-Id: <D1AAFDCSFYH1.11RF8JUS2NEZS@kernel.org>
Date: Wed, 15 May 2024 17:28:43 +0300
From: "Jarkko Sakkinen" <jarkko@...nel.org>
To: "Dave Hansen" <dave.hansen@...el.com>, "Dmitrii Kuvaiskii"
 <dmitrii.kuvaiskii@...el.com>, <dave.hansen@...ux.intel.com>,
 <kai.huang@...el.com>, <haitao.huang@...ux.intel.com>,
 <reinette.chatre@...el.com>, <linux-sgx@...r.kernel.org>,
 <linux-kernel@...r.kernel.org>
Cc: <mona.vij@...el.com>, <kailun.qin@...el.com>, <stable@...r.kernel.org>,
 Marcelina Kościelnicka <mwk@...isiblethingslab.com>
Subject: Re: [PATCH v2 1/2] x86/sgx: Resolve EAUG race where losing thread
 returns SIGBUS

On Wed May 15, 2024 at 5:15 PM EEST, Dave Hansen wrote:
> On 5/15/24 06:54, Jarkko Sakkinen wrote:
> > I'd cut 90% of the description out and just make the argument of
> > the wrong error code, and done. The sequence is great for showing
> > how this could happen. The prose makes my head hurt tbh.
>
> The changelog is too long, but not fatally so.  I'd much rather have a
> super verbose description than something super sparse.
>
> Would something like this make more sense to folks?
>
> 	Imagine an mmap()'d file. Two threads touch the same address at
> 	the same time and fault. Both allocate a physical page and race
> 	to install a PTE for that page. Only one will win the race. The
> 	loser frees its page, but still continues handling the fault as
> 	a success and returns VM_FAULT_NOPAGE from the fault handler.
>
> 	The same race can happen with SGX. But there's a bug: the loser
> 	in SGX steers into a failure path. The loser EREMOVE's the
> 	winner's EPC page, then returns SIGBUS, likely killing the app.
>
> 	Fix the SGX loser's behavior. Change the return code to
> 	VM_FAULT_NOPAGE to avoid SIGBUS and call sgx_free_epc_page()
> 	which avoids EREMOVE'ing the winner's page and only frees the
> 	page that the loser allocated.

Yes!

I did read the whole thing. My comment was only about the other
maintainers in the chain who will also have to deal with this patch
eventually.

BR, Jarkko