linux-kernel - Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <51a6f74f-6c05-74b9-3fd7-b7cd900fb8cc@redhat.com>
Date:   Wed, 15 Sep 2021 15:51:25 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Chao Peng <chao.p.peng@...ux.intel.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Borislav Petkov <bp@...en8.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>,
        Andi Kleen <ak@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Tom Lendacky <thomas.lendacky@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Varad Gautam <varad.gautam@...e.com>,
        Dario Faggioli <dfaggioli@...e.com>, x86@...nel.org,
        linux-mm@...ck.org, linux-coco@...ts.linux.dev,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Yu Zhang <yu.c.zhang@...ux.intel.com>
Subject: Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private
 memory

>> diff --git a/mm/memfd.c b/mm/memfd.c
>> index 081dd33e6a61..ae43454789f4 100644
>> --- a/mm/memfd.c
>> +++ b/mm/memfd.c
>> @@ -130,11 +130,24 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
>>   	return NULL;
>>   }
>>   
>> +int memfd_register_guest(struct inode *inode, void *owner,
>> +			 const struct guest_ops *guest_ops,
>> +			 const struct guest_mem_ops **guest_mem_ops)
>> +{
>> +	if (shmem_mapping(inode->i_mapping)) {
>> +		return shmem_register_guest(inode, owner,
>> +					    guest_ops, guest_mem_ops);
>> +	}
>> +
>> +	return -EINVAL;
>> +}
> 
> Are we stick our design to memfd interface (e.g other memory backing
> stores like tmpfs and hugetlbfs will all rely on this memfd interface to
> interact with KVM), or this is just the initial implementation for PoC?

I don't think we are, it still feels like we are in the early prototype 
phase (even way before a PoC). I'd be happy to see something "cleaner" 
so to say -- it still feels kind of hacky to me, especially there seem 
to be many pieces of the big puzzle missing so far. Unfortunately, this 
series hasn't caught the attention of many -MM people so far, maybe 
because other people miss the big picture as well and are waiting for a 
complete design proposal.

For example, what's unclear to me: we'll be allocating pages with 
GFP_HIGHUSER_MOVABLE, making them land on MIGRATE_CMA or ZONE_MOVABLE; 
then we silently turn them unmovable, which breaks these concepts. Who'd 
migrate these pages away just like when doing long-term pinning, or how 
is that supposed to work?

Also unclear to me is how refcount and mapcount will be handled to 
prevent swapping, who will actually do some kind of gfn-epfn etc. 
mapping, how we'll forbid access to this memory e.g., via /proc/kcore or 
when dumping memory ... and how it would ever work with 
migration/swapping/rmap (it's clearly future work, but it's been raised 
that this would be the way to make it work, I don't quite see how it 
would all come together).

<note>
Last but not least, I raised to Intel via a different channel that I'd 
appreciate updated hardware that avoids essentially crashing the 
hypervisor when writing to encrypted memory from user space. It has the 
smell of "broken hardware" to it that might just be fixed by a new 
hardware generation to make it look more similar to other successful 
implementations of secure/encrypted memory. That might it much easier to 
support an initial version of TDX -- instead of having to reinvent the 
way we map guest memory just now to support hardware that might sort out 
the root problem later.

Having that said, there might be benefits to mapping guest memory 
differently, but my gut feeling is that it might take quite a long time 
to get something reasonable working, to settle on a design, and to get 
it accepted by all involved parties to merge it upstream.

Just my 2 cents, I might be all wrong as so often.
<\note>

-- 
Thanks,

David / dhildenb