Message-ID: <4645fdb3-2836-0028-dee9-7a9321f1ebf2@molgen.mpg.de>
Date:   Fri, 26 Aug 2022 19:21:44 +0200
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     Jarkko Sakkinen <jarkko@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        linux-sgx@...r.kernel.org
Cc:     Haitao Huang <haitao.huang@...ux.intel.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] x86/sgx: Do not consider unsanitized pages an error

Dear Jarkko,


On 26.08.22 at 14:51, Paul Menzel wrote:

[…]

> On 26.08.22 at 03:41, Jarkko Sakkinen wrote:
>> In sgx_init(), if misc_register() for the provision device fails, and
>> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
>> prematurely stopped.
>>
>> This triggers WARN_ON() because sgx_dirty_page_list ends up being
>> non-empty. Ultimately this can crash the kernel, depending on the kernel
>> command line, which is not correct behavior because the SGX driver is not
>> working incorrectly.
> 
> Maybe paste the WARN_ON trace, so `git log` can be searched for the 
> trace too.
> 
>> Print simple warning instead, and improve the output by printing the
>> number of unsanitized pages.
> 
> See below, but no warning seems to be logged in my case now. (I should 
> test Linus’ current master too.)

Just for the record, the problem still exists in Linus’ master branch:

```
[    0.000000] Linux version 6.0.0-rc2 (root@...b429beb4a) (gcc (Debian 11.3.0-3) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38) #382 SMP PREEMPT_DYNAMIC Fri Aug 26 12:52:15 UTC 2022
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.0.0-rc2 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet module_blacklist=psmouse initcall_debug log_buf_len=4M cryptomgr.notests
[…]
[    0.268089] calling  sgx_init+0x0/0x409 @ 1
[    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[    0.268591] ------------[ cut here ]------------
[    0.268592] WARNING: CPU: 6 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[    0.268598] Modules linked in:
[    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
[    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX: 0000000000000000
[    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI: 00000000ffffffff
[    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09: ffff8dcd820a4180
[    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12: ffffb6c74006bce0
[    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15: 0000000000000000
[    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000) knlGS:0000000000000000
[    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4: 00000000003706e0
[    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.268622] Call Trace:
[    0.268624]  <TASK>
[    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.268634]  ? __kthread_parkme+0x36/0x90
[    0.268637]  kthread+0xe5/0x110
[    0.268639]  ? kthread_complete_and_exit+0x20/0x20
[    0.268642]  ret_from_fork+0x1f/0x30
[    0.268647]  </TASK>
[    0.268648] ---[ end trace 0000000000000000 ]---
[    0.268694] initcall sgx_init+0x0/0x409 returned -19 after 603 usecs
```

Tested-by: Paul Menzel <pmenzel@...gen.mpg.de>

>> Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
>> Reported-by: Paul Menzel <pmenzel@...gen.mpg.de>
>> Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
>> Signed-off-by: Jarkko Sakkinen <jarkko@...nel.org>
>> ---
>> Cc: Haitao Huang <haitao.huang@...ux.intel.com>
>> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
>> Cc: Reinette Chatre <reinette.chatre@...el.com>
>>
>> v4:
>> - Explain expectations for dirty_page_list in the function header, instead
>>    of an inline comment.
>> - Improve commit message to explain the conditions better.
>> - Return the number of pages left dirty to ksgxd() and print warning after
>>    the 2nd call, if there are any.
>>
>> v3:
>> - Remove WARN_ON().
>> - Tuned comments and the commit message a bit.
>>
>> v2:
>> - Replaced WARN_ON() with optional pr_info() inside
>>    __sgx_sanitize_pages().
>> - Rewrote the commit message.
>> - Added the fixes tag.
>> ---
>>   arch/x86/kernel/cpu/sgx/main.c | 19 +++++++++++++------
>>   1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
>> index 515e2a5f25bb..903100fcfce3 100644
>> --- a/arch/x86/kernel/cpu/sgx/main.c
>> +++ b/arch/x86/kernel/cpu/sgx/main.c
>> @@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
>>    * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
>>    * from the input list, and made available for the page allocator. SECS pages
>>    * prepending their children in the input list are left intact.
>> + *
>> + * Contents of the @dirty_page_list must be thread-local, i.e.
>> + * not shared by multiple threads.
>>    */
>> -static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>> +static int __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>   {
>>       struct sgx_epc_page *page;
>> +    int left_dirty = 0;
>>       LIST_HEAD(dirty);
>>       int ret;
>> -    /* dirty_page_list is thread-local, no need for a lock: */
>>       while (!list_empty(dirty_page_list)) {
>>           if (kthread_should_stop())
>> -            return;
>> +            break;
>>           page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>>           } else {
>>               /* The page is not yet clean - move to the dirty list. */
>>               list_move_tail(&page->list, &dirty);
>> +            left_dirty++;
>>           }
>>           cond_resched();
>>       }
>>       list_splice(&dirty, dirty_page_list);
>> +    return left_dirty;
>>   }
>>   static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
>> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>>   static int ksgxd(void *p)
>>   {
>> +    int left_dirty;
>> +
>>       set_freezable();
>>       /*
>> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>>        * required for SECS pages, whose child pages blocked EREMOVE.
>>        */
>>       __sgx_sanitize_pages(&sgx_dirty_page_list);
>> -    __sgx_sanitize_pages(&sgx_dirty_page_list);
>> -    /* sanity check: */
>> -    WARN_ON(!list_empty(&sgx_dirty_page_list));
>> +    left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
>> +    if (left_dirty)
>> +        pr_warn("%d unsanitized pages\n", left_dirty);
>>       while (!kthread_should_stop()) {
>>           if (try_to_freeze())
> 
> I tested this on top of commit 4c612826bec1 (Merge tag 'net-6.0-rc3' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net) and the 
> warning trace is gone.
> 
>      [    0.255192] calling  sgx_init+0x0/0x409 @ 1
>      [    0.255207] sgx: EPC section 0x40200000-0x45f7ffff
>      [    0.255747] initcall sgx_init+0x0/0x409 returned -19 after 552 usecs
> 
> (OT: If -19 indicates a failure, a message explaining why sgx_init()
> failed would be nice.)
> 
> Please find the whole output of `dmesg` attached.
> 
> 
> Kind regards,
> 
> Paul
