linux-kernel - Re: [PATCH] x86/sgx: Silence softlockup detection when releasing large enclaves

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <YfFauJSPU5TNetSe@iki.fi>
Date:   Wed, 26 Jan 2022 16:29:12 +0200
From:   Jarkko Sakkinen <jarkko@...nel.org>
To:     Reinette Chatre <reinette.chatre@...el.com>
Cc:     dave.hansen@...ux.intel.com, tglx@...utronix.de, bp@...en8.de,
        luto@...nel.org, mingo@...hat.com, linux-sgx@...r.kernel.org,
        x86@...nel.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH] x86/sgx: Silence softlockup detection when releasing
 large enclaves

On Thu, Jan 20, 2022 at 08:28:36AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 1/20/2022 5:01 AM, Jarkko Sakkinen wrote:
> > On Tue, 2022-01-18 at 11:14 -0800, Reinette Chatre wrote:
> >> Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
> >> triggers the softlockup detector.
> >>
> >> Actual SGX systems have 128GB of enclave memory or more.  The
> >> "unclobbered_vdso_oversubscribed" selftest creates one enclave which
> >> consumes all of the enclave memory on the system. Tearing down such a
> >> large enclave takes around a minute, most of it in the loop where
> >> the EREMOVE instruction is applied to each individual 4k enclave
> >> page.
> >>
> >> Spending one minute in a loop triggers the softlockup detector.
> >>
> >> Add a cond_resched() to give other tasks a chance to run and placate
> >> the softlockup detector.
> >>
> >> Cc: stable@...r.kernel.org
> >> Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
> >> Reported-by: Vijay Dhanraj <vijay.dhanraj@...el.com>
> >> Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>
> >> Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>
> >> ---
> >> Softlockup message:
> >> watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [test_sgx:11502]
> >> Kernel panic - not syncing: softlockup: hung tasks
> >> <snip>
> >> sgx_encl_release+0x86/0x1c0
> >> sgx_release+0x11c/0x130
> >> __fput+0xb0/0x280
> >> ____fput+0xe/0x10
> >> task_work_run+0x6c/0xc0
> >> exit_to_user_mode_prepare+0x1eb/0x1f0
> >> syscall_exit_to_user_mode+0x1d/0x50
> >> do_syscall_64+0x46/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >>  arch/x86/kernel/cpu/sgx/encl.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> >> index 001808e3901c..ab2b79327a8a 100644
> >> --- a/arch/x86/kernel/cpu/sgx/encl.c
> >> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> >> @@ -410,6 +410,7 @@ void sgx_encl_release(struct kref *ref)
> >>                 }
> >>  
> >>                 kfree(entry);
> >> +               cond_resched();
> >>         }
> >>  
> >>         xa_destroy(&encl->page_array);
> > 
> > I'd add a comment, e.g.
> > 
> > /* Invoke scheduler to prevent soft lockups. */
> 
> I could do that. I would like to point out though that there are already
> six other usages of cond_resched() in the driver and it does indeed
> seem to be the common pattern. When adding this comment to the now
> seventh usage it would be the first comment documenting the usage of
> cond_resched() in the driver.
> 
> > 
> > Other than that makes sense.
> 
> Thank you very much for taking a look.

Well, I believe in inline comments as a record of how the code evolves. Since
one was missing here, a reminder makes sense.

/Jarkko
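
A back-of-the-envelope check of the numbers in the commit message, written as a
small C sketch. Assumptions not stated in the patch: the enclave is exactly
128 GiB, teardown takes exactly 60 seconds, and the kernel uses the default
watchdog_thresh of 10 seconds, so the soft lockup detector reports after
2 * watchdog_thresh = 20 seconds. The helper names are hypothetical; this is
not driver code.

```c
#include <stdint.h>

/* EREMOVE operates on one 4 KiB EPC page at a time. */
#define SGX_PAGE_BYTES 4096ULL

/* Hypothetical helper: EREMOVE invocations needed to tear down an enclave. */
static uint64_t eremove_calls(uint64_t enclave_bytes)
{
	return enclave_bytes / SGX_PAGE_BYTES;
}

/* Hypothetical helper: average cost per page (ns) if teardown takes total_sec. */
static double ns_per_page(uint64_t enclave_bytes, double total_sec)
{
	return total_sec * 1e9 / (double)eremove_calls(enclave_bytes);
}

/* Soft lockup threshold: reported after 2 * watchdog_thresh seconds. */
static double softlockup_sec(double watchdog_thresh_sec)
{
	return 2.0 * watchdog_thresh_sec;
}

/*
 * Hypothetical helper: pages processed before the detector fires,
 * assuming the loop never reschedules (i.e. no cond_resched()).
 */
static uint64_t pages_before_report(uint64_t enclave_bytes, double total_sec,
				    double watchdog_thresh_sec)
{
	return (uint64_t)(softlockup_sec(watchdog_thresh_sec) * 1e9 /
			  ns_per_page(enclave_bytes, total_sec));
}
```

With those inputs, eremove_calls(128ULL << 30) gives 33,554,432 pages
(2^37 / 2^12 = 2^25), about 1.79 µs per EREMOVE on average, and roughly
11.2 million pages processed by the 20 second mark. That is consistent with the
"stuck for 22s" report above: a loop that never calls cond_resched() cannot get
through a full-EPC enclave before the detector fires.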