[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <op.2nt3fzsnwjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Wed, 15 May 2024 17:29:49 -0500
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: linux-kernel@...r.kernel.org, linux-sgx@...r.kernel.org,
jarkko@...nel.org, dave.hansen@...ux.intel.com, "Bojun Zhu"
<zhubojun.zbj@...group.com>
Cc: reinette.chatre@...el.com, 刘双(轩屹)
<ls123674@...group.com>
Subject: Re: [RFC PATCH v3 1/1] x86/sgx: Explicitly give up the CPU in EDMM's
ioctl() to avoid softlockup
On Wed, 15 May 2024 16:55:59 -0500, Haitao Huang
<haitao.huang@...ux.intel.com> wrote:
> On Wed, 15 May 2024 01:55:21 -0500, Bojun Zhu
> <zhubojun.zbj@...group.com> wrote:
>
>> EDMM's ioctl()s support batch operations, which may be
>> time-consuming. Try to explicitly give up the CPU as the prefix
>> operation at the every begin of "for loop" in
>> sgx_enclave_{ modify_types | restrict_permissions | remove_pages}
>> to give other tasks a chance to run, and avoid softlockup warning.
>>
>> Additionally perform pending signals check as the prefix operation,
>> and introduce sgx_check_signal_and_resched(),
>> which wraps all the checks.
>>
>> The following has been observed on Linux v6.9-rc5 with kernel
>> preemptions disabled(by configuring "PREEMPT_NONE=y"), when kernel
>> is requested to restrict page permissions of a large number of EPC
>> pages.
>>
>> ------------[ cut here ]------------
>> watchdog: BUG: soft lockup - CPU#45 stuck for 22s!
>> ...
>> RIP: 0010:sgx_enclave_restrict_permissions+0xba/0x1f0
>> ...
>> Call Trace:
>> sgx_ioctl
>> __x64_sys_ioctl
>> x64_sys_call
>> do_syscall_64
>> entry_SYSCALL_64_after_hwframe
>> ------------[ end trace ]------------
>>
>> Signed-off-by: Bojun Zhu <zhubojun.zbj@...group.com>
>> ---
>> arch/x86/kernel/cpu/sgx/ioctl.c | 40 +++++++++++++++++++++++----------
>> 1 file changed, 28 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c
>> b/arch/x86/kernel/cpu/sgx/ioctl.c
>> index b65ab214bdf5..6199f483143e 100644
>> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
>> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
>> @@ -365,6 +365,20 @@ static int sgx_validate_offset_length(struct
>> sgx_encl *encl,
>> return 0;
>> }
>> +/*
>> + * Check signals and invoke scheduler. Return true for a pending
>> signal.
>> + */
>> +static bool sgx_check_signal_and_resched(void)
>> +{
>> + if (signal_pending(current))
>> + return true;
>> +
>> + if (need_resched())
>> + cond_resched();
>> +
>> + return false;
>> +}
>> +
>> /**
>> * sgx_ioc_enclave_add_pages() - The handler for
>> %SGX_IOC_ENCLAVE_ADD_PAGES
>> * @encl: an enclave pointer
>> @@ -409,7 +423,7 @@ static long sgx_ioc_enclave_add_pages(struct
>> sgx_encl *encl, void __user *arg)
>> struct sgx_enclave_add_pages add_arg;
>> struct sgx_secinfo secinfo;
>> unsigned long c;
>> - int ret;
>> + int ret = -ERESTARTSYS;
>> if (!test_bit(SGX_ENCL_CREATED, &encl->flags) ||
>> test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
>> @@ -432,15 +446,8 @@ static long sgx_ioc_enclave_add_pages(struct
>> sgx_encl *encl, void __user *arg)
>> return -EINVAL;
>> for (c = 0 ; c < add_arg.length; c += PAGE_SIZE) {
>> - if (signal_pending(current)) {
>> - if (!c)
>> - ret = -ERESTARTSYS;
>> -
>> + if (sgx_check_signal_and_resched())
>> break;
>> - }
>
> ERESTARTSYS is only appropriate if we have not EADDed any pages yet.
> If we got interrupted in the middle, we should return 0. User space
> would check the 'count' returned and decide to recall this ioctl() with
> 'offset' reset to the next page, and adjust length.
NVM, I misread it. ret will be changed to zero in subsequent iteration.
Reviewed-by: Haitao Huang <haitao.huang@...ux.intel.com>
Thanks
Haitao
Powered by blists - more mailing lists