[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48953bf2-cee9-f818-dc50-5fb5b9b410bf@oracle.com>
Date: Tue, 24 Jan 2023 16:08:46 +0000
From: Liam Merwick <liam.merwick@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>,
Chao Peng <chao.p.peng@...ux.intel.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
linux-arch@...r.kernel.org, linux-api@...r.kernel.org,
linux-doc@...r.kernel.org, qemu-devel@...gnu.org,
Paolo Bonzini <pbonzini@...hat.com>,
Jonathan Corbet <corbet@....net>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Arnd Bergmann <arnd@...db.de>,
Naoya Horiguchi <naoya.horiguchi@....com>,
Miaohe Lin <linmiaohe@...wei.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>, Hugh Dickins <hughd@...gle.com>,
Jeff Layton <jlayton@...nel.org>,
"J . Bruce Fields" <bfields@...ldses.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Shuah Khan <shuah@...nel.org>, Mike Rapoport <rppt@...nel.org>,
Steven Price <steven.price@....com>,
"Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
Vlastimil Babka <vbabka@...e.cz>,
Vishal Annapurve <vannapurve@...gle.com>,
Yu Zhang <yu.c.zhang@...ux.intel.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
luto@...nel.org, jun.nakajima@...el.com, dave.hansen@...el.com,
ak@...ux.intel.com, david@...hat.com, aarcange@...hat.com,
ddutile@...hat.com, dhildenb@...hat.com,
Quentin Perret <qperret@...gle.com>, tabba@...gle.com,
Michael Roth <michael.roth@....com>, mhocko@...e.com,
wei.w.wang@...el.com, Liam Merwick <liam.merwick@...cle.com>
Subject: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM
On 14/01/2023 00:37, Sean Christopherson wrote:
> On Fri, Dec 02, 2022, Chao Peng wrote:
>> This patch series implements KVM guest private memory for confidential
>> computing scenarios like Intel TDX[1]. If a TDX host accesses
>> TDX-protected guest memory, machine check can happen which can further
>> crash the running host system, this is terrible for multi-tenant
>> configurations. The host accesses include those from KVM userspace like
>> QEMU. This series addresses KVM userspace induced crash by introducing
>> new mm and KVM interfaces so KVM userspace can still manage guest memory
>> via a fd-based approach, but it can never access the guest memory
>> content.
>>
>> The patch series touches both core mm and KVM code. I appreciate
>> Andrew/Hugh and Paolo/Sean can review and pick these patches. Any other
>> reviews are always welcome.
>> - 01: mm change, target for mm tree
>> - 02-09: KVM change, target for KVM tree
>
> A version with all of my feedback, plus reworked versions of Vishal's selftest,
> is available here:
>
> git@...hub.com:sean-jc/linux.git x86/upm_base_support
>
> It compiles and passes the selftest, but it's otherwise barely tested. There are
> a few todos (2 I think?) and many of the commits need changelogs, i.e. it's still
> a WIP.
>
When running LTP (https://github.com/linux-test-project/ltp) on the v10
bits (and also with Sean's branch above) I encounter the following NULL
pointer dereference with testcases/kernel/syscalls/madvise/madvise01
(100% reproducible).
It appears that in restrictedmem_error_page()
inode->i_mapping->private_data is NULL
in the list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list)
but I don't know why.
[ 5365.177168] BUG: kernel NULL pointer dereference, address:
0000000000000028
[ 5365.178881] #PF: supervisor read access in kernel mode
[ 5365.180006] #PF: error_code(0x0000) - not-present page
[ 5365.181322] PGD 8000000109dad067 P4D 8000000109dad067 PUD 107707067 PMD 0
[ 5365.183474] Oops: 0000 [#1] PREEMPT SMP PTI
[ 5365.184792] CPU: 0 PID: 22086 Comm: madvise01 Not tainted
6.1.0-1.el8.seanjcupm.x86_64 #1
[ 5365.186572] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.5.1 06/16/2021
[ 5365.188816] RIP: 0010:restrictedmem_error_page+0xc7/0x1b0
[ 5365.190081] Code: 99 00 48 8b 55 00 48 8b 02 48 8d 8a e8 fe ff ff 48
2d 18 01 00 00 48 39 d5 0f 84 8a 00 00 00 48 8b 51 30 48 8b 92 b8 00 00
00 <48> 8b 4a 28 4c 39 b1 d8 00 00 00 74 22 48 8b 88 18 01 00 00 48 8d
[ 5365.193984] RSP: 0018:ffff9b7343c07d80 EFLAGS: 00010206
[ 5365.195142] RAX: ffff8e5b410cfc70 RBX: 0000000000000001 RCX:
ffff8e5b4048e580
[ 5365.196888] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000000
[ 5365.198399] RBP: ffff8e5b410cfd88 R08: 0000000000000000 R09:
0000000000000000
[ 5365.200200] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ 5365.201843] R13: ffff8e5b410cfd80 R14: ffff8e5b47cc7618 R15:
ffffd49d44c05080
[ 5365.203472] FS: 00007fc96de9b5c0(0000) GS:ffff8e5deda00000(0000)
knlGS:0000000000000000
[ 5365.205485] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5365.206791] CR2: 0000000000000028 CR3: 000000012047e002 CR4:
0000000000170ef0
[ 5365.208131] Call Trace:
[ 5365.208752] <TASK>
[ 5365.209229] me_pagecache_clean+0x58/0x100
[ 5365.210196] identify_page_state+0x84/0xd0
[ 5365.211180] memory_failure+0x231/0x8b0
[ 5365.212148] madvise_inject_error.cold+0x8d/0xa4
[ 5365.213317] do_madvise+0x363/0x3a0
[ 5365.214177] __x64_sys_madvise+0x2c/0x40
[ 5365.215159] do_syscall_64+0x3f/0xa0
[ 5365.216016] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 5365.217130] RIP: 0033:0x7fc96d8399ab
[ 5365.217953] Code: 73 01 c3 48 8b 0d dd 54 38 00 f7 d8 64 89 01 48 83
c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 1c 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad 54 38 00 f7 d8 64 89 01 48
[ 5365.222323] RSP: 002b:00007fff62a99b18 EFLAGS: 00000206 ORIG_RAX:
000000000000001c
[ 5365.224026] RAX: ffffffffffffffda RBX: 000000000041ce00 RCX:
00007fc96d8399ab
[ 5365.225375] RDX: 0000000000000064 RSI: 000000000000a000 RDI:
00007fc96de8e000
[ 5365.226999] RBP: 00007fc96de9b540 R08: 0000000000000001 R09:
0000000000415c80
[ 5365.228641] R10: 0000000000000001 R11: 0000000000000206 R12:
0000000000000008
[ 5365.230074] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
Regards,
Liam
> As for next steps, can you (handwaving all of the TDX folks) take a look at what
> I pushed and see if there's anything horrifically broken, and that it still works
> for TDX?
>
> Fuad (and pKVM folks) same ask for you with respect to pKVM. Absolutely no rush
> (and I mean that).
>
> On my side, the two things on my mind are (a) tests and (b) downstream dependencies
> (SEV and TDX). For tests, I want to build a lists of tests that are required for
> merging so that the criteria for merging are clear, and so that if the list is large
> (haven't thought much yet), the work of writing and running tests can be distributed.
>
> Regarding downstream dependencies, before this lands, I want to pull in all the
> TDX and SNP series and see how everything fits together. Specifically, I want to
> make sure that we don't end up with a uAPI that necessitates ugly code, and that we
> don't miss an opportunity to make things simpler. The patches in the SNP series to
> add "legacy" SEV support for UPM in particular made me slightly rethink some minor
> details. Nothing remotely major, but something that needs attention since it'll
> be uAPI.
>
> I'm off Monday, so it'll be at least Tuesday before I make any more progress on
> my side.
>
> Thanks!
>
Powered by blists - more mailing lists