Message-ID: <diqz34ldszp3.fsf@ackerleytng-ctop.c.googlers.com>
Date: Thu, 03 Oct 2024 20:23:36 +0000
From: Ackerley Tng <ackerleytng@...gle.com>
To: Elliot Berman <quic_eberman@...cinc.com>
Cc: tabba@...gle.com, roypat@...zon.co.uk, jgg@...dia.com, peterx@...hat.com,
david@...hat.com, rientjes@...gle.com, fvdl@...gle.com, jthoughton@...gle.com,
seanjc@...gle.com, pbonzini@...hat.com, zhiquan1.li@...el.com,
fan.du@...el.com, jun.miao@...el.com, isaku.yamahata@...el.com,
muchun.song@...ux.dev, mike.kravetz@...cle.com, erdemaktas@...gle.com,
vannapurve@...gle.com, qperret@...gle.com, jhubbard@...dia.com,
willy@...radead.org, shuah@...nel.org, brauner@...nel.org, bfoster@...hat.com,
kent.overstreet@...ux.dev, pvorel@...e.cz, rppt@...nel.org,
richard.weiyang@...il.com, anup@...infault.org, haibo1.xu@...el.com,
ajones@...tanamicro.com, vkuznets@...hat.com, maciej.wieczor-retman@...el.com,
pgonda@...gle.com, oliver.upton@...ux.dev, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, kvm@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-fsdevel@...ck.org
Subject: Re: [RFC PATCH 15/39] KVM: guest_memfd: hugetlb: allocate and
truncate from hugetlb
Elliot Berman <quic_eberman@...cinc.com> writes:
> On Tue, Sep 10, 2024 at 11:43:46PM +0000, Ackerley Tng wrote:
>> If HugeTLB is requested at guest_memfd creation time, HugeTLB pages
>> will be used to back guest_memfd.
>>
>> Signed-off-by: Ackerley Tng <ackerleytng@...gle.com>
>>
>> <snip>
>>
>> +/**
>> + * Use the uptodate flag to indicate that the folio is prepared for KVM's usage.
>> + */
>> static inline void kvm_gmem_mark_prepared(struct folio *folio)
>> {
>> folio_mark_uptodate(folio);
>> @@ -72,13 +84,18 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio)
>> static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>> gfn_t gfn, struct folio *folio)
>> {
>> - unsigned long nr_pages, i;
>> pgoff_t index;
>> int r;
>>
>> - nr_pages = folio_nr_pages(folio);
>> - for (i = 0; i < nr_pages; i++)
>> - clear_highpage(folio_page(folio, i));
>> + if (folio_test_hugetlb(folio)) {
>> + folio_zero_user(folio, folio->index << PAGE_SHIFT);
>
> Is (folio->index << PAGE_SHIFT) the right address hint to provide?
> I don't think we can say the folio will be mapped at this address since
> this value is an offset into the file. In most cases, I believe it
> won't be mapped anywhere since we just allocated it.
vaddr in folio_zero_user(folio, vaddr) is eventually passed down to
clear_user_page(), which uses vaddr to flush dcache aliases on some
architectures, according to Documentation/core-api/cachetlb.rst.
In this patch series, folio_zero_user() is used in 2 places:
+ kvm_gmem_prepare_folio()
+ kvm_gmem_fault()
folio->index is valid by the time folio_zero_user() is called in
kvm_gmem_prepare_folio(): when kvm_gmem_prepare_folio() is called, the
folio is already in the filemap, and folio->index is set when the folio is
added to the filemap.
In kvm_gmem_fault(), kvm_gmem_get_folio() also returns a folio that is in the
filemap, so folio->index is valid by the time folio_zero_user() is called.
Hence in both places where folio_zero_user() is called, folio->index <<
PAGE_SHIFT is the byte offset into the file.
hugetlb's fallocate also passes the offset within the file in its call to
folio_zero_user(), which is why the file offset was used here.
In the next revision I will refactor this to something like
kvm_gmem_prepare_folio_shared() and kvm_gmem_prepare_folio_private().
In kvm_gmem_prepare_folio_private(), folio->index << PAGE_SHIFT can still be
passed as addr_hint, to align with HugeTLB. When a folio is prepared as
private, it will only be mapped by KVM, so addr_hint doesn't matter: the folio
isn't going to be mapped into userspace. If the folio was previously used as a
shared page, unmapping it would already have flushed the dcache.
In kvm_gmem_prepare_folio_shared(), the folio will subsequently be mapped into
userspace, so vmf->real_address should be passed as addr_hint.
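A rough sketch of that split (the function names, signatures, and bodies here
are assumptions about the next revision, not existing kernel code):

```c
/* Hypothetical sketch; names and signatures are assumptions. */
static void kvm_gmem_zero_folio_private(struct folio *folio)
{
	/*
	 * A private folio is never mapped into userspace, so addr_hint
	 * doesn't matter for dcache maintenance; pass the file offset
	 * to align with hugetlb's fallocate.  If the folio was
	 * previously mapped shared, unmapping already flushed the
	 * dcache.
	 */
	folio_zero_user(folio, folio->index << PAGE_SHIFT);
}

static void kvm_gmem_zero_folio_shared(struct vm_fault *vmf,
				       struct folio *folio)
{
	/*
	 * A shared folio will be mapped into userspace at
	 * vmf->real_address, so use that as addr_hint so any dcache
	 * flushing targets the alias userspace will actually use.
	 */
	folio_zero_user(folio, vmf->real_address);
}
```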
Thanks for this question!