Message-ID: <op.18o7z2biwjvjmi@hhuan26-mobl.amr.corp.intel.com>
Date: Wed, 26 Jul 2023 11:56:16 -0500
From: "Haitao Huang" <haitao.huang@...ux.intel.com>
To: "Hansen, Dave" <dave.hansen@...el.com>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "bp@...en8.de" <bp@...en8.de>,
"jarkko@...nel.org" <jarkko@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"hpa@...or.com" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Huang, Kai" <kai.huang@...el.com>
Cc: "kristen@...ux.intel.com" <kristen@...ux.intel.com>,
"Chatre, Reinette" <reinette.chatre@...el.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>,
"Christopherson,, Sean" <seanjc@...gle.com>
Subject: Re: [PATCH] x86/sgx: fix a NULL pointer
On Thu, 20 Jul 2023 19:52:22 -0500, Huang, Kai <kai.huang@...el.com> wrote:
> On Fri, 2023-07-21 at 00:32 +0000, Huang, Kai wrote:
>> On Wed, 2023-07-19 at 08:53 -0500, Haitao Huang wrote:
>> > Hi Dave and Kai
>> > On Tue, 18 Jul 2023 19:21:54 -0500, Dave Hansen <dave.hansen@...el.com> wrote:
>> >
>> > > On 7/18/23 17:14, Huang, Kai wrote:
>> > > > Also perhaps the patch title is too vague. Adding more information
>> > > > doesn't hurt I think, e.g., mentioning it is a fix for NULL pointer
>> > > > dereference in the EAUG flow.
>> > >
>> > > Yeah, let's say something like:
>> > >
>> > > x86/sgx: Resolve SECS reclaim vs. page fault race
>> > >
>> > The patch is not to resolve SECS vs #PF race though the race is a
>> > necessary condition to cause the NULL pointer. The same condition does not
>> > cause NULL pointer in the ELDU path of #PF, only in EAUG path of #PF.
>> >
>> > And the issue really is the NULL pointer not checked and fix was to reuse
>> > the same code to reload SECS in ELDU code path for EAUG code path
>> >
>> >
>> > How about this:
>> >
>> > x86/sgx: Reload reclaimed SECS for EAUG on #PF
>> >
>> > or
>> >
>> > x86/sgx: Fix a NULL pointer to SECS used for EAUG on #PF
>> >
>>
>> Perhaps you can add "EAUG" part to what Dave suggested?
>>
>> x86/sgx: Resolves SECS reclaim vs. page fault race on EAUG
>>
>> (assuming Dave is fine with this :-))
Sure, I can use this too.
> Btw, do you have a real call trace? If you have, I think you can add that to
> the changelog too because that catches people's eye immediately.
Previously I was not able to reproduce it without the SGX cgroup patches.
Now I managed to get a trace with a QEMU setup that has a small EPC (8M),
large RAM (128G) and 128 vCPUs:
[ 1682.914263] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1682.922966] #PF: supervisor read access in kernel mode
[ 1682.929115] #PF: error_code(0x0000) - not-present page
[ 1682.935264] PGD 0 P4D 0
[ 1682.938383] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1682.943620] CPU: 43 PID: 2681 Comm: test_sgx Not tainted 6.3.0-rc4sgxcet #12
[ 1682.951989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 1682.965504] RIP: 0010:sgx_encl_eaug_page+0xc7/0x210
[ 1682.971359] Code: 25 49 8b 96 98 04 00 00 48 8d 40 48 48 89 42 08 48 89 56 48 49 8d 96 98 04 00 00 48 89 56 50 49 89 86 98 04 00 00 49 8b 46 60 <8b> 10 48 c1 e2 05 488
[ 1682.993330] RSP: 0000:ffffb2e64725bc00 EFLAGS: 00010246
[ 1682.999585] RAX: 0000000000000000 RBX: ffff987e5abac428 RCX: 0000000000000000
[ 1683.008059] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff987e61aee000
[ 1683.016533] RBP: ffffb2e64725bcf0 R08: 0000000000000000 R09: ffffb2e64725bb58
[ 1683.025008] R10: 0000000000000000 R11: 00007f3f5c418fff R12: ffff987e61aee020
[ 1683.033479] R13: ffff987e505bc080 R14: ffff987e61aee000 R15: ffffb2e6420fcb20
[ 1683.041949] FS: 00007f3f5cb48740(0000) GS:ffff989cfe8c0000(0000) knlGS:0000000000000000
[ 1683.051540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1683.058478] CR2: 0000000000000000 CR3: 0000000115896002 CR4: 0000000000770ee0
[ 1683.067018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1683.075539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1683.084085] PKRU: 55555554
[ 1683.087465] Call Trace:
[ 1683.090547] <TASK>
[ 1683.093220] ? __kmem_cache_alloc_node+0x16a/0x440
[ 1683.099034] ? xa_load+0x6e/0xa0
[ 1683.103038] sgx_vma_fault+0x119/0x230
[ 1683.107630] __do_fault+0x36/0x140
[ 1683.111828] do_fault+0x12f/0x400
[ 1683.115928] __handle_mm_fault+0x728/0x1110
[ 1683.121050] handle_mm_fault+0x105/0x310
[ 1683.125850] do_user_addr_fault+0x1ee/0x750
[ 1683.130957] ? __this_cpu_preempt_check+0x13/0x20
[ 1683.136667] exc_page_fault+0x76/0x180
[ 1683.141265] asm_exc_page_fault+0x27/0x30
[ 1683.146160] RIP: 0033:0x7ffc6496beea
[ 1683.150563] Code: 43 48 8b 4d 10 48 c7 c3 28 00 00 00 48 83 3c 19 00 75 31 48 83 c3 08 48 81 fb 00 01 00 00 75 ec 48 8b 19 48 8d 0d 00 00 00 00 <0f> 01 d7 48 8b 5d 101
[ 1683.172773] RSP: 002b:00007ffc64935b68 EFLAGS: 00000202
[ 1683.179138] RAX: 0000000000000003 RBX: 00007f3800000000 RCX: 00007ffc6496beea
[ 1683.187675] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1683.196200] RBP: 00007ffc64935b70 R08: 0000000000000000 R09: 0000000000000000
[ 1683.204724] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1683.213310] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1683.221850] </TASK>
[ 1683.224636] Modules linked in: isofs intel_rapl_msr intel_rapl_common binfmt_misc kvm_intel nls_iso8859_1 kvm ppdev irqbypass input_leds parport_pc joydev parport rapi
[ 1683.291173] CR2: 0000000000000000
[ 1683.295271] ---[ end trace 0000000000000000 ]---
I'll add this to the commit as well.
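
For anyone reading the trace by hand, here is a rough Python sketch of how the
faulting instruction can be picked out of the first "Code:" line. The helper
names (split_code_line, decode_simple_mov_load) are made up for illustration,
and the decoder only handles the one simple form that shows up here (opcode
0x8b with a plain register-indirect ModRM); scripts/decodecode in the kernel
tree is the proper tool. The byte string is copied from the trace, including
the truncated last token, which the sketch simply skips.

```python
import re

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# "Code:" bytes of the kernel-mode fault, copied from the trace.
CODE = (
    "25 49 8b 96 98 04 00 00 48 8d 40 48 48 89 42 08 48 89 "
    "56 48 49 8d 96 98 04 00 00 48 89 56 50 49 89 86 98 04 00 00 49 8b 46 60 "
    "<8b> 10 48 c1 e2 05 488"
)


def split_code_line(code):
    """Split a Code: dump into (bytes before the fault, bytes from the fault on).

    The byte wrapped in <..> is the first byte of the faulting instruction.
    Tokens that are not exactly two hex digits (e.g. a truncated tail) are skipped.
    """
    before, after, seen = [], [], False
    for tok in code.split():
        marked = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if marked:
            seen = True
            after.append(int(marked.group(1), 16))
        elif re.fullmatch(r"[0-9a-f]{2}", tok):
            (after if seen else before).append(int(tok, 16))
    return before, after


def decode_simple_mov_load(insn):
    """Decode only 'mov (%r64),%r32': opcode 0x8b, ModRM mod=0, no SIB, no REX."""
    if len(insn) < 2 or insn[0] != 0x8B:
        return None
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):  # 4 means a SIB byte follows, 5 is RIP-relative
        return None
    return f"mov (%{REG64[rm]}),%{REG32[reg]}"


before, after = split_code_line(CODE)
print(len(before), decode_simple_mov_load(after))
```

The decoded instruction is a load through RAX, and RAX is 0 in the register
dump, which matches CR2 being 0.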
Thanks
Haitao