Message-ID: <9ffce724-23c9-4aa1-bc53-8292e1029991@redhat.com>
Date: Wed, 26 Feb 2025 16:30:20 +0100
From: David Hildenbrand <david@...hat.com>
To: Patrick Roy <roypat@...zon.co.uk>, rppt@...nel.org, seanjc@...gle.com
Cc: pbonzini@...hat.com, corbet@....net, willy@...radead.org,
 akpm@...ux-foundation.org, song@...nel.org, jolsa@...nel.org,
 ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
 martin.lau@...ux.dev, eddyz87@...il.com, yonghong.song@...ux.dev,
 john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
 haoluo@...gle.com, Liam.Howlett@...cle.com, lorenzo.stoakes@...cle.com,
 vbabka@...e.cz, jannh@...gle.com, shuah@...nel.org, kvm@...r.kernel.org,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, bpf@...r.kernel.org,
 linux-kselftest@...r.kernel.org, tabba@...gle.com, jgowans@...zon.com,
 graf@...zon.com, kalyazin@...zon.com, xmarcalx@...zon.com,
 derekmn@...zon.com, jthoughton@...gle.com,
 Elliot Berman <quic_eberman@...cinc.com>
Subject: Re: [PATCH v4 03/12] KVM: guest_memfd: Add flag to remove from direct
 map

On 26.02.25 16:14, Patrick Roy wrote:
> 
> 
> On Wed, 2025-02-26 at 09:08 +0000, David Hildenbrand wrote:
>> On 26.02.25 09:48, Patrick Roy wrote:
>>>
>>>
>>> On Tue, 2025-02-25 at 16:54 +0000, David Hildenbrand wrote:
>>>> On 21.02.25 17:07, Patrick Roy wrote:
>>>>> Add KVM_GMEM_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD() ioctl. When
>>>>> set, guest_memfd folios will be removed from the direct map after
>>>>> preparation, with direct map entries only restored when the folios are
>>>>> freed.
>>>>>
>>>>> To ensure these folios do not end up in places where the kernel cannot
>>>>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>>>>> address_space if KVM_GMEM_NO_DIRECT_MAP is requested.
>>>>>
>>>>> Note that this flag causes removal of direct map entries for all
>>>>> guest_memfd folios independent of whether they are "shared" or "private"
>>>>> (although current guest_memfd only supports either all folios in the
>>>>> "shared" state, or all folios in the "private" state if
>>>>> !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)). The use case for also removing
>>>>> direct map entries of the shared parts of guest_memfd is a special
>>>>> type of non-CoCo VM where host userspace is trusted to have access to
>>>>> all of guest memory, but where Spectre-style transient execution attacks
>>>>> through the host kernel's direct map should still be mitigated.
>>>>>
>>>>> Note that KVM retains access to guest memory via userspace
>>>>> mappings of guest_memfd, which are reflected back into KVM's memslots
>>>>> via userspace_addr. This is needed for things like MMIO emulation on
>>>>> x86_64 to work. Previous iterations attempted to instead have KVM
>>>>> temporarily restore direct map entries whenever such an access to guest
>>>>> memory was needed, but this turned out to have a significant performance
>>>>> impact, as well as additional complexity due to needing to refcount
>>>>> direct map reinsertion operations and making them play nicely with gmem
>>>>> truncations.
>>>>>
>>>>> This iteration also doesn't have KVM perform TLB flushes after direct
>>>>> map manipulations. This is because TLB flushes resulted in up to a 40x
>>>>> elongation of page faults in guest_memfd (scaling with the number of CPU
>>>>> cores), or a 5x elongation of memory population. On the one hand, TLB
>>>>> flushes are not needed for functional correctness (the virt->phys
>>>>> mapping technically stays "correct", the kernel should simply not use it
>>>>> for a while), so this is a correct optimization to make. On the other
>>>>> hand, it means that the desired protection from Spectre-style attacks is
>>>>> not perfect, as an attacker could try to prevent a stale TLB entry from
>>>>> getting evicted, keeping it alive until the page it refers to is used by
>>>>> the guest for some sensitive data, and then targeting it using a
>>>>> Spectre gadget.
>>>>>
>>>>> Signed-off-by: Patrick Roy <roypat@...zon.co.uk>
>>>>
>>>> ...
>>>>
>>>>>
>>>>> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
>>>>> +{
>>>>> +     return ((unsigned long) inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
>>>>> +}
>>>>> +
>>>>>     static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>>>>     {
>>>>> +     struct inode *inode = folio_inode(folio);
>>>>> +
>>>>> +     if (kvm_gmem_test_no_direct_map(inode)) {
>>>>> +             int r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
>>>>> +                                                  false);
>>>>
>>>> Will this work if KVM is built as a module, or is this another good
>>>> reason why we might want the guest_memfd core to be part of core-mm?
>>>
>>> mh, I'm admittedly not too familiar with the differences that would come
>>> from building KVM as a module vs not. I do remember something about the
>>> direct map accessors not being available for modules, so this would
>>> indeed not work. Does that mean moving gmem into core-mm will be a
>>> pre-requisite for the direct map removal stuff?
>>
>> Likely, we'd need some shim.
>>
>> Maybe for the time being it could be fenced using #if IS_BUILTIN() ...
>> but that sure won't win in a beauty contest.
> 
> Is anyone working on such a shim at the moment? Otherwise, would it make
> sense for me to look into it? (although I'll probably need a pointer or
> two for what is actually needed)
> 
> I saw your comment on Fuad's series [1] indicating that he'll also need
> some shim, so probably makes sense to tackle it anyway instead of
> hacking around it with #if-ery.

Elliot (CC) was working on a "guestmem library" project [1], but it was 
unclear what we could factor out into the core.

Looks like a simple shim for such stuff might be a good starting point, 
although it is not the final idea of encapsulating more in the library.
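
To make the interim fencing idea from earlier in the thread a bit more 
concrete, here is a rough userspace model, not kernel code. The flag test 
mirrors kvm_gmem_test_no_direct_map() from the quoted patch; everything 
else is an assumption for illustration: the flag value, struct mock_inode, 
gmem_check_flags(), and the choice to reject the flag with -EOPNOTSUPP 
when the direct map helpers are not reachable. In the kernel the 
"helpers_available" decision would presumably be made at compile time 
(e.g. via IS_BUILTIN()) rather than passed in as an argument.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed value for illustration; the real UAPI value is not shown in this thread. */
#define KVM_GMEM_NO_DIRECT_MAP (1UL << 0)

/* Userspace stand-in for struct inode; only i_private matters here. */
struct mock_inode {
	void *i_private;
};

/* Same test as kvm_gmem_test_no_direct_map() in the quoted patch. */
static bool gmem_test_no_direct_map(const struct mock_inode *inode)
{
	return ((unsigned long)inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
}

/*
 * Hypothetical creation-time check. helpers_available models whether the
 * direct map accessors can be called at all (built-in vs. module).
 */
static int gmem_check_flags(unsigned long flags, bool helpers_available)
{
	/* Reject any flag bits this model does not know about. */
	if (flags & ~KVM_GMEM_NO_DIRECT_MAP)
		return -EINVAL;

	/* Refuse direct map removal when the accessors are out of reach. */
	if ((flags & KVM_GMEM_NO_DIRECT_MAP) && !helpers_available)
		return -EOPNOTSUPP;

	return 0;
}
```

With this model, a request carrying KVM_GMEM_NO_DIRECT_MAP succeeds only 
when helpers_available is true, and requests without the flag are 
unaffected either way.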

@Elliot, are you currently still looking into this?


[1] 
https://lore.kernel.org/all/20241113-guestmem-library-v3-0-71fdee85676b@quicinc.com/T/#u

-- 
Cheers,

David / dhildenb

