Message-ID: <a3178c50-2e76-4743-8008-9a33bd0af93f@redhat.com>
Date: Tue, 25 Feb 2025 17:54:57 +0100
From: David Hildenbrand <david@...hat.com>
To: Patrick Roy <roypat@...zon.co.uk>, rppt@...nel.org, seanjc@...gle.com
Cc: pbonzini@...hat.com, corbet@....net, willy@...radead.org,
akpm@...ux-foundation.org, song@...nel.org, jolsa@...nel.org,
ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
martin.lau@...ux.dev, eddyz87@...il.com, yonghong.song@...ux.dev,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, Liam.Howlett@...cle.com, lorenzo.stoakes@...cle.com,
vbabka@...e.cz, jannh@...gle.com, shuah@...nel.org, kvm@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, bpf@...r.kernel.org,
linux-kselftest@...r.kernel.org, tabba@...gle.com, jgowans@...zon.com,
graf@...zon.com, kalyazin@...zon.com, xmarcalx@...zon.com,
derekmn@...zon.com, jthoughton@...gle.com
Subject: Re: [PATCH v4 03/12] KVM: guest_memfd: Add flag to remove from direct map
On 21.02.25 17:07, Patrick Roy wrote:
> Add KVM_GMEM_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD() ioctl. When
> set, guest_memfd folios will be removed from the direct map after
> preparation, with direct map entries only restored when the folios are
> freed.
>
> To ensure these folios do not end up in places where the kernel cannot
> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
> address_space if KVM_GMEM_NO_DIRECT_MAP is requested.
>
> Note that this flag causes removal of direct map entries for all
> guest_memfd folios independent of whether they are "shared" or "private"
> (although current guest_memfd only supports either all folios in the
> "shared" state, or all folios in the "private" state if
> !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)). The use case for also removing
> direct map entries of the shared parts of guest_memfd is a special
> type of non-CoCo VM where host userspace is trusted to have access to
> all of guest memory, but where Spectre-style transient execution attacks
> through the host kernel's direct map should still be mitigated.
>
> Note that KVM retains access to guest memory via userspace
> mappings of guest_memfd, which are reflected back into KVM's memslots
> via userspace_addr. This is needed for things like MMIO emulation on
> x86_64 to work. Previous iterations attempted to instead have KVM
> temporarily restore direct map entries whenever such an access to guest
> memory was needed, but this turned out to have a significant performance
> impact, as well as additional complexity due to needing to refcount
> direct map reinsertion operations and making them play nicely with gmem
> truncations.
>
> This iteration also doesn't have KVM perform TLB flushes after direct
> map manipulations. This is because TLB flushes resulted in an up to 40x
> elongation of page faults in guest_memfd (scaling with the number of CPU
> cores), or a 5x elongation of memory population. On the one hand, TLB
> flushes are not needed for functional correctness (the virt->phys
> mapping technically stays "correct", the kernel should simply not use it
> for a while), so this is a correct optimization to make. On the other
> hand, it means that the desired protection from Spectre-style attacks is
> not perfect, as an attacker could try to prevent a stale TLB entry from
> getting evicted, keeping it alive until the page it refers to is used by
> the guest for some sensitive data, and then targeting it using a
> Spectre gadget.
>
> Signed-off-by: Patrick Roy <roypat@...zon.co.uk>
...
>
> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
> +{
> +	return ((unsigned long) inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
> +}
> +
>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>  {
> +	struct inode *inode = folio_inode(folio);
> +
> +	if (kvm_gmem_test_no_direct_map(inode)) {
> +		int r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
> +						     false);
Will this work if KVM is built as a module, or is this another good
reason why we might want the guest_memfd core to be part of core-mm?
--
Cheers,
David / dhildenb
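
As a reading aid for the quoted commit message and hunk, here is a minimal
userspace sketch of the lifecycle they describe: the flag is stored in the
inode's private pointer, a folio leaves the direct map when it is marked
prepared, and the entry is restored when the folio is freed. Everything
prefixed with model_ or MODEL_ is hypothetical and exists only for this
illustration. The bit value is an assumption, and the real code would call
set_direct_map_valid_noflush() rather than flip a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit position picked for illustration only. */
#define MODEL_GMEM_NO_DIRECT_MAP (1UL << 0)

/* Stand-in for struct inode: only the opaque private pointer is modeled. */
struct model_inode {
	void *i_private;
};

/* Stand-in for a folio: tracks whether it is present in the direct map. */
struct model_folio {
	struct model_inode *inode;
	bool in_direct_map;
	bool prepared;
};

/* Mirrors the shape of the quoted helper: flags live in i_private. */
bool model_test_no_direct_map(const struct model_inode *inode)
{
	return ((uintptr_t)inode->i_private & MODEL_GMEM_NO_DIRECT_MAP) != 0;
}

/* After preparation, drop the folio from the (modeled) direct map. */
void model_mark_prepared(struct model_folio *folio)
{
	if (model_test_no_direct_map(folio->inode))
		folio->in_direct_map = false;
	folio->prepared = true;
}

/* On free, the direct map entry is restored before the folio is reused. */
void model_free_folio(struct model_folio *folio)
{
	folio->in_direct_map = true;
	folio->prepared = false;
}
```

The model leaves out TLB state entirely, so it cannot show the stale-entry
window discussed in the commit message. It only captures the ordering of
"prepare, unmap, free, remap".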