Message-ID: <3b56b4a0-ac52-4e1e-9f1b-7379af307292@amazon.com>
Date: Mon, 15 Sep 2025 12:01:04 +0100
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Vishal Annapurve <vannapurve@...gle.com>, David Hildenbrand
<david@...hat.com>
CC: James Houghton <jthoughton@...gle.com>, "Kalyazin, Nikita"
<kalyazin@...zon.co.uk>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"shuah@...nel.org" <shuah@...nel.org>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "linux-kselftest@...r.kernel.org"
<linux-kselftest@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "michael.day@....com" <michael.day@....com>,
"Roy, Patrick" <roypat@...zon.co.uk>, "Thomson, Jack" <jackabt@...zon.co.uk>,
"Manwaring, Derek" <derekmn@...zon.com>, "Cali, Marco"
<xmarcalx@...zon.co.uk>
Subject: Re: [PATCH v5 1/2] KVM: guest_memfd: add generic population via write

On 13/09/2025 01:18, Vishal Annapurve wrote:
> On Fri, Sep 12, 2025 at 8:39 AM David Hildenbrand <david@...hat.com> wrote:
>>
>>>>>> What's meant to happen if we do use this for CoCo VMs? I would expect
>>>>>> write() to fail, but I don't see why it would (seems like we need/want
>>>>>> a check that we aren't write()ing to private memory).
>>>>>
>>>>> I am not so sure that write() should fail even in CoCo VMs if we access
>>>>> not-yet-prepared pages. My understanding was that the CoCoisation of
>>>>> the memory occurs during "preparation". But I may be wrong here.
>>>>
>>>> But how do you handle that a page is actually inaccessible and should
>>>> not be touched?
>>>>
>>>> IOW, with CXL you could crash the host.
>>>>
>>>> There is likely some state check missing, or it should be restricted to
>>>> VM types.
>>>
>>> Sorry, I'm missing the link between VM types and CXL. How are they related?
>>
>> I think what you explain below clarifies it.
>>
>>>
>>> My thinking was it is a regular (accessible) page until it is "prepared"
>>> by the CoCo hardware, which is currently tracked by the up-to-date flag,
>>> so it is safe to assume that until it is "prepared", it is accessible
>>> because it was allocated by filemap_grab_folio() ->
>>> filemap_alloc_folio() and hasn't been taken over by the CoCo hardware.
>>> What scenario can you see where it doesn't apply as of now?
>>
>> Thanks for clarifying, see below.
>>
>>>
>>> I am aware of an attempt to remove preparation tracking from
>>> guest_memfd, but it is still at an RFC stage AFAIK [1].
>>>
>>>>
>>>> Do we know how this would interact with the direct-map removal?
>>>
>>> I'm using folio_test_uptodate() to determine if the page has been
>>> removed from the direct map as kvm_gmem_mark_prepared() is what
>>> currently removes the page from the direct map and marks it as
>>> up-to-date. [2] is a Firecracker feature branch where the two work in
>>> combination.
>>
>> Ah, okay. Yes, I recalled [1] that we wanted to change these semantics
>> to be "uptodate: was zeroed", and that preparation handling would be
>> essentially handled by the arch backend.
>
> Yes, I think we should not be overloading uptodate flag to be an
> indicator of what is private for CoCo guests. Uptodate flag should
> just mean zeroed/fresh folio. It's possible that future allocator
> backing for huge pages already provides uptodate folios.

Good point, thanks for sharing.

>
> If there is no current use case for read/write for CoCo VMs, I think
> it makes sense to disable it for now by checking the VM type before
> adding further overloading of uptodate flags.

Sounds fair. I can add a check for the VM type and only allow it for
KVM_X86_SW_PROTECTED_VM on x86. When ARM CCA support [1] is added we
should also check for KVM_VM_TYPE_ARM_NORMAL on ARM.

[1]: https://lore.kernel.org/kvm/20250820145606.180644-1-steven.price@arm.com
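
A minimal sketch of the shape such a check could take, written as a
standalone userspace model rather than kernel code. The enum, its values
and the helper name are all hypothetical stand-ins: the real constants
come from the KVM uapi headers, and nothing here is taken from the patch.

```c
#include <stdbool.h>

/*
 * Illustrative stand-ins only. The real VM type constants live in the
 * KVM uapi headers; the names and values below are placeholders.
 */
enum sketch_vm_type {
	SKETCH_X86_DEFAULT_VM,
	SKETCH_X86_SW_PROTECTED_VM,
	SKETCH_X86_SNP_VM,
	SKETCH_X86_TDX_VM,
};

/* Hypothetical helper: may write() populate guest_memfd for this VM type? */
static bool sketch_gmem_write_allowed(enum sketch_vm_type type)
{
	switch (type) {
	case SKETCH_X86_SW_PROTECTED_VM:
		return true;
	default:
		/* Refuse everything else instead of relying on the uptodate flag. */
		return false;
	}
}
```

The allow-list form means a newly added VM type is rejected until it is
explicitly opted in, which seems the safer default while the uptodate
semantics are still being discussed.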
>
>>
>> --
>> Cheers
>>
>> David / dhildenb
>>
>>