[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8f55fef-1821-408e-88ed-b25200ef66c9@amazon.co.uk>
Date: Mon, 7 Oct 2024 16:56:42 +0100
From: Patrick Roy <roypat@...zon.co.uk>
To: Ackerley Tng <ackerleytng@...gle.com>, Elliot Berman
<quic_eberman@...cinc.com>
CC: <tabba@...gle.com>, <jgg@...dia.com>, <peterx@...hat.com>,
<david@...hat.com>, <rientjes@...gle.com>, <fvdl@...gle.com>,
<jthoughton@...gle.com>, <seanjc@...gle.com>, <pbonzini@...hat.com>,
<zhiquan1.li@...el.com>, <fan.du@...el.com>, <jun.miao@...el.com>,
<isaku.yamahata@...el.com>, <muchun.song@...ux.dev>,
<mike.kravetz@...cle.com>, <erdemaktas@...gle.com>, <vannapurve@...gle.com>,
<qperret@...gle.com>, <jhubbard@...dia.com>, <willy@...radead.org>,
<shuah@...nel.org>, <brauner@...nel.org>, <bfoster@...hat.com>,
<kent.overstreet@...ux.dev>, <pvorel@...e.cz>, <rppt@...nel.org>,
<richard.weiyang@...il.com>, <anup@...infault.org>, <haibo1.xu@...el.com>,
<ajones@...tanamicro.com>, <vkuznets@...hat.com>,
<maciej.wieczor-retman@...el.com>, <pgonda@...gle.com>,
<oliver.upton@...ux.dev>, <linux-kernel@...r.kernel.org>,
<linux-mm@...ck.org>, <kvm@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>, <linux-fsdevel@...ck.org>, James Gowans
<jgowans@...zon.com>, "Kalyazin, Nikita" <kalyazin@...zon.co.uk>, "Manwaring,
Derek" <derekmn@...zon.com>
Subject: Re: [RFC PATCH 30/39] KVM: guest_memfd: Handle folio preparation for
guest_memfd mmap
Hi Ackerley,
On Thu, 2024-10-03 at 22:32 +0100, Ackerley Tng wrote:
> Elliot Berman <quic_eberman@...cinc.com> writes:
>
>> On Tue, Sep 10, 2024 at 11:44:01PM +0000, Ackerley Tng wrote:
>>> Since guest_memfd now supports mmap(), folios have to be prepared
>>> before they are faulted into userspace.
>>>
>>> When memory attributes are switched between shared and private, the
>>> up-to-date flags will be cleared.
>>>
>>> Use the folio's up-to-date flag to indicate being ready for the guest
>>> usage and can be used to mark whether the folio is ready for shared OR
>>> private use.
>>
>> Clearing the up-to-date flag also means that the page gets zero'd out
>> whenever it transitions between shared and private (either direction).
>> pKVM (Android) hypervisor policy can allow in-place conversion between
>> shared/private.
>>
>> I believe the important thing is that sev_gmem_prepare() needs to be
>> called prior to giving page to guest. In my series, I had made a
>> ->prepare_inaccessible() callback where KVM would only do this part.
>> When transitioning to inaccessible, only that callback would be made,
>> besides the bookkeeping. The folio zeroing happens once when allocating
>> the folio if the folio is initially accessible (faultable).
>>
>> From x86 CoCo perspective, I think it also makes sense to not zero
>> the folio when changing faultiblity from private to shared:
>> - If guest is sharing some data with host, you've wiped the data and
>> guest has to copy again.
>> - Or, if SEV/TDX enforces that page is zero'd between transitions,
>> Linux has duplicated the work that trusted entity has already done.
>>
>> Fuad and I can help add some details for the conversion. Hopefully we
>> can figure out some of the plan at plumbers this week.
>
> Zeroing the page prevents leaking host data (see function docstring for
> kvm_gmem_prepare_folio() introduced in [1]), so we definitely don't want
> to introduce a kernel data leak bug here.
>
> In-place conversion does require preservation of data, so for
> conversions, shall we zero depending on VM type?
>
> + Gunyah: don't zero since ->prepare_inaccessible() is a no-op
> + pKVM: don't zero
> + TDX: don't zero
> + SEV: AMD Architecture Programmers Manual 7.10.6 says there is no
> automatic encryption and implies no zeroing, hence perform zeroing
> + KVM_X86_SW_PROTECTED_VM: Doesn't have a formal definition so I guess
> we could require zeroing on transition?
Maybe for KVM_X86_SW_PROTECTED_VM we could make zero-ing configurable
via some CREATE_GUEST_MEMFD flag, instead of forcing one specific
behavior.
For the "non-CoCo with direct map entries removed" VMs that we at AWS
are going for, we'd like a VM type with host-controlled in-place
conversions which doesn't zero on transitions, so if
KVM_X86_SW_PROTECTED_VM ends up zeroing, we'd need to add another new VM
type for that.
Somewhat related sidenote: For VMs that allow inplace conversions and do
not zero, we do not need to zap the stage-2 mappings on memory attribute
changes, right?
> This way, the uptodate flag means that it has been prepared (as in
> sev_gmem_prepare()), and zeroed if required by VM type.
>
> Regarding flushing the dcache/tlb in your other question [2], if we
> don't use folio_zero_user(), can we relying on unmapping within core-mm
> to flush after shared use, and unmapping within KVM To flush after
> private use?
>
> Or should flush_dcache_folio() be explicitly called on kvm_gmem_fault()?
>
> clear_highpage(), used in the non-hugetlb (original) path, doesn't flush
> the dcache. Was that intended?
>
>> Thanks,
>> Elliot
>>
>>>
>>> <snip>
>
> [1] https://lore.kernel.org/all/20240726185157.72821-8-pbonzini@redhat.com/
> [2] https://lore.kernel.org/all/diqz34ldszp3.fsf@ackerleytng-ctop.c.googlers.com/
Best,
Patrick
Powered by blists - more mailing lists