[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <op.2nt1vls9wjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Wed, 15 May 2024 16:55:59 -0500
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: linux-kernel@...r.kernel.org, linux-sgx@...r.kernel.org,
jarkko@...nel.org, dave.hansen@...ux.intel.com, "Bojun Zhu"
<zhubojun.zbj@...group.com>
Cc: reinette.chatre@...el.com, 刘双(轩屹)
<ls123674@...group.com>
Subject: Re: [RFC PATCH v3 1/1] x86/sgx: Explicitly give up the CPU in EDMM's
ioctl() to avoid softlockup
On Wed, 15 May 2024 01:55:21 -0500, Bojun Zhu <zhubojun.zbj@...group.com>
wrote:
> EDMM's ioctl()s support batch operations, which may be
> time-consuming. Try to explicitly give up the CPU as the prefix
> operation at the every begin of "for loop" in
> sgx_enclave_{ modify_types | restrict_permissions | remove_pages}
> to give other tasks a chance to run, and avoid softlockup warning.
>
> Additionally perform pending signals check as the prefix operation,
> and introduce sgx_check_signal_and_resched(),
> which wraps all the checks.
>
> The following has been observed on Linux v6.9-rc5 with kernel
> preemptions disabled(by configuring "PREEMPT_NONE=y"), when kernel
> is requested to restrict page permissions of a large number of EPC pages.
>
> ------------[ cut here ]------------
> watchdog: BUG: soft lockup - CPU#45 stuck for 22s!
> ...
> RIP: 0010:sgx_enclave_restrict_permissions+0xba/0x1f0
> ...
> Call Trace:
> sgx_ioctl
> __x64_sys_ioctl
> x64_sys_call
> do_syscall_64
> entry_SYSCALL_64_after_hwframe
> ------------[ end trace ]------------
>
> Signed-off-by: Bojun Zhu <zhubojun.zbj@...group.com>
> ---
> arch/x86/kernel/cpu/sgx/ioctl.c | 40 +++++++++++++++++++++++----------
> 1 file changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c
> b/arch/x86/kernel/cpu/sgx/ioctl.c
> index b65ab214bdf5..6199f483143e 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -365,6 +365,20 @@ static int sgx_validate_offset_length(struct
> sgx_encl *encl,
> return 0;
> }
> +/*
> + * Check signals and invoke scheduler. Return true for a pending signal.
> + */
> +static bool sgx_check_signal_and_resched(void)
> +{
> + if (signal_pending(current))
> + return true;
> +
> + if (need_resched())
> + cond_resched();
> +
> + return false;
> +}
> +
> /**
> * sgx_ioc_enclave_add_pages() - The handler for
> %SGX_IOC_ENCLAVE_ADD_PAGES
> * @encl: an enclave pointer
> @@ -409,7 +423,7 @@ static long sgx_ioc_enclave_add_pages(struct
> sgx_encl *encl, void __user *arg)
> struct sgx_enclave_add_pages add_arg;
> struct sgx_secinfo secinfo;
> unsigned long c;
> - int ret;
> + int ret = -ERESTARTSYS;
> if (!test_bit(SGX_ENCL_CREATED, &encl->flags) ||
> test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> @@ -432,15 +446,8 @@ static long sgx_ioc_enclave_add_pages(struct
> sgx_encl *encl, void __user *arg)
> return -EINVAL;
> for (c = 0 ; c < add_arg.length; c += PAGE_SIZE) {
> - if (signal_pending(current)) {
> - if (!c)
> - ret = -ERESTARTSYS;
> -
> + if (sgx_check_signal_and_resched())
> break;
> - }
ERESTARTSYS is only appropriate if we have not EADDed any pages yet.
If we got interrupted in the middle, we should return 0. User space would
check the 'count' returned and decide to recall this ioctl() with
'offset' reset to the next page, and adjust length.
Ditto for other changes below.
Thanks
Haitao
Powered by blists - more mailing lists