Message-ID: <7f38018b-dc89-4d79-a309-149557796121@amazon.co.uk>
Date: Wed, 26 Feb 2025 15:14:14 +0000
From: Patrick Roy <roypat@...zon.co.uk>
To: David Hildenbrand <david@...hat.com>, <rppt@...nel.org>,
<seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <corbet@....net>, <willy@...radead.org>,
<akpm@...ux-foundation.org>, <song@...nel.org>, <jolsa@...nel.org>,
<ast@...nel.org>, <daniel@...earbox.net>, <andrii@...nel.org>,
<martin.lau@...ux.dev>, <eddyz87@...il.com>, <yonghong.song@...ux.dev>,
<john.fastabend@...il.com>, <kpsingh@...nel.org>, <sdf@...ichev.me>,
<haoluo@...gle.com>, <Liam.Howlett@...cle.com>, <lorenzo.stoakes@...cle.com>,
<vbabka@...e.cz>, <jannh@...gle.com>, <shuah@...nel.org>,
<kvm@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-fsdevel@...r.kernel.org>,
<linux-mm@...ck.org>, <bpf@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>, <tabba@...gle.com>, <jgowans@...zon.com>,
<graf@...zon.com>, <kalyazin@...zon.com>, <xmarcalx@...zon.com>,
<derekmn@...zon.com>, <jthoughton@...gle.com>
Subject: Re: [PATCH v4 03/12] KVM: guest_memfd: Add flag to remove from direct
map

On Wed, 2025-02-26 at 09:08 +0000, David Hildenbrand wrote:
> On 26.02.25 09:48, Patrick Roy wrote:
>>
>>
>> On Tue, 2025-02-25 at 16:54 +0000, David Hildenbrand wrote:
>>> On 21.02.25 17:07, Patrick Roy wrote:
>>>> Add KVM_GMEM_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD() ioctl. When
>>>> set, guest_memfd folios will be removed from the direct map after
>>>> preparation, with direct map entries only restored when the folios are
>>>> freed.
>>>>
>>>> To ensure these folios do not end up in places where the kernel cannot
>>>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>>>> address_space if KVM_GMEM_NO_DIRECT_MAP is requested.
>>>>
>>>> Note that this flag causes removal of direct map entries for all
>>>> guest_memfd folios independent of whether they are "shared" or "private"
>>>> (although current guest_memfd only supports either all folios in the
>>>> "shared" state, or all folios in the "private" state if
>>>> !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)). The use case for also removing
>>>> direct map entries of the shared parts of guest_memfd is a special
>>>> type of non-CoCo VM where host userspace is trusted to have access to
>>>> all of guest memory, but where Spectre-style transient execution attacks
>>>> through the host kernel's direct map should still be mitigated.
>>>>
>>>> Note that KVM retains access to guest memory via userspace
>>>> mappings of guest_memfd, which are reflected back into KVM's memslots
>>>> via userspace_addr. This is needed for things like MMIO emulation on
>>>> x86_64 to work. Previous iterations attempted to instead have KVM
>>>> temporarily restore direct map entries whenever such an access to guest
>>>> memory was needed, but this turned out to have a significant performance
>>>> impact, as well as additional complexity due to needing to refcount
>>>> direct map reinsertion operations and making them play nicely with gmem
>>>> truncations.
>>>>
>>>> This iteration also doesn't have KVM perform TLB flushes after direct
>>>> map manipulations. This is because TLB flushes resulted in an up to 40x
>>>> elongation of page faults in guest_memfd (scaling with the number of CPU
>>>> cores), or a 5x elongation of memory population. On the one hand, TLB
>>>> flushes are not needed for functional correctness (the virt->phys
>>>> mapping technically stays "correct", the kernel should simply not use it
>>>> for a while), so this is a correct optimization to make. On the other
>>>> hand, it means that the desired protection from Spectre-style attacks is
>>>> not perfect, as an attacker could try to prevent a stale TLB entry from
>>>> getting evicted, keeping it alive until the page it refers to is used by
>>>> the guest for some sensitive data, and then targeting it using a
>>>> Spectre gadget.
>>>>
>>>> Signed-off-by: Patrick Roy <roypat@...zon.co.uk>
>>>
>>> ...
>>>
>>>>
>>>> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
>>>> +{
>>>> +	return ((unsigned long) inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
>>>> +}
>>>> +
>>>>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>>>  {
>>>> +	struct inode *inode = folio_inode(folio);
>>>> +
>>>> +	if (kvm_gmem_test_no_direct_map(inode)) {
>>>> +		int r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
>>>> +						     false);
>>>
>>> Will this work if KVM is built as a module, or is this another good
>>> reason why we might want guest_memfd core part of core-mm?
>>
>> mh, I'm admittedly not too familiar with the differences that would come
>> from building KVM as a module vs not. I do remember something about the
>> direct map accessors not being available for modules, so this would
>> indeed not work. Does that mean moving gmem into core-mm will be a
>> pre-requisite for the direct map removal stuff?
>
> Likely, we'd need some shim.
>
> Maybe for the time being it could be fenced using #if IS_BUILTIN() ...
> but that sure won't win in a beauty contest.

Is anyone working on such a shim at the moment? Otherwise, would it make
sense for me to look into it? (although I'll probably need a pointer or
two for what is actually needed)

I saw your comment on Fuad's series [1] indicating that he'll also need
some shim, so it probably makes sense to tackle it anyway instead of
hacking around it with #if-ery.

[1]: https://lore.kernel.org/kvm/8ddab670-8416-47f2-a5a6-94fb3444f328@redhat.com/
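
To make the "#if-ery" concrete, here is a rough userspace sketch of what
fencing on IS_BUILTIN() could look like. It is not taken from the patch:
the IS_BUILTIN() helper is re-created along the lines of the kernel's
include/linux/kconfig.h, CONFIG_KVM is defined by hand to stand in for a
built-in (=y) KVM, struct inode is a mock, the bit value chosen for
KVM_GMEM_NO_DIRECT_MAP is only illustrative, and gmem_would_unmap() is a
hypothetical helper name.

```c
#include <stdbool.h>

/* Userspace re-creation of the kernel's IS_BUILTIN() macro trick. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x) ___is_defined(x)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define IS_BUILTIN(option) __is_defined(option)

/* Assumption: KVM is built in (=y). A modular build would instead
 * define CONFIG_KVM_MODULE and leave CONFIG_KVM undefined. */
#define CONFIG_KVM 1

/* Assumption: illustrative bit value, not the one used by the patch. */
#define KVM_GMEM_NO_DIRECT_MAP (1UL << 0)

/* Mock of the one field the flag test looks at. */
struct inode {
	void *i_private;
};

static bool kvm_gmem_test_no_direct_map(const struct inode *inode)
{
	return ((unsigned long)inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
}

/* Hypothetical helper: reports whether direct map removal would be
 * attempted. In a modular build the whole path compiles away. */
static bool gmem_would_unmap(const struct inode *inode)
{
#if IS_BUILTIN(CONFIG_KVM)
	return kvm_gmem_test_no_direct_map(inode);
#else
	(void)inode;
	return false;
#endif
}
```

With CONFIG_KVM defined as above, gmem_would_unmap() simply follows the
flag; dropping that define makes it return false unconditionally, which
is the silent loss of functionality a proper shim would avoid.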
> --
> Cheers,
>
> David / dhildenb
>

Best,
Patrick