Message-ID: <diqzplfp6dqk.fsf@ackerleytng-ctop.c.googlers.com>
Date: Fri, 30 May 2025 13:32:51 -0700
From: Ackerley Tng <ackerleytng@...gle.com>
To: Binbin Wu <binbin.wu@...ux.intel.com>
Cc: kvm@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	x86@...nel.org, linux-fsdevel@...r.kernel.org, aik@....com, 
	ajones@...tanamicro.com, akpm@...ux-foundation.org, amoorthy@...gle.com, 
	anthony.yznaga@...cle.com, anup@...infault.org, aou@...s.berkeley.edu, 
	bfoster@...hat.com, brauner@...nel.org, catalin.marinas@....com, 
	chao.p.peng@...el.com, chenhuacai@...nel.org, dave.hansen@...el.com, 
	david@...hat.com, dmatlack@...gle.com, dwmw@...zon.co.uk, 
	erdemaktas@...gle.com, fan.du@...el.com, fvdl@...gle.com, graf@...zon.com, 
	haibo1.xu@...el.com, hch@...radead.org, hughd@...gle.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, jack@...e.cz, james.morse@....com, 
	jarkko@...nel.org, jgg@...pe.ca, jgowans@...zon.com, jhubbard@...dia.com, 
	jroedel@...e.de, jthoughton@...gle.com, jun.miao@...el.com, 
	kai.huang@...el.com, keirf@...gle.com, kent.overstreet@...ux.dev, 
	kirill.shutemov@...el.com, liam.merwick@...cle.com, 
	maciej.wieczor-retman@...el.com, mail@...iej.szmigiero.name, maz@...nel.org, 
	mic@...ikod.net, michael.roth@....com, mpe@...erman.id.au, 
	muchun.song@...ux.dev, nikunj@....com, nsaenz@...zon.es, 
	oliver.upton@...ux.dev, palmer@...belt.com, pankaj.gupta@....com, 
	paul.walmsley@...ive.com, pbonzini@...hat.com, pdurrant@...zon.co.uk, 
	peterx@...hat.com, pgonda@...gle.com, pvorel@...e.cz, qperret@...gle.com, 
	quic_cvanscha@...cinc.com, quic_eberman@...cinc.com, 
	quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com, quic_pheragu@...cinc.com, 
	quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com, richard.weiyang@...il.com, 
	rick.p.edgecombe@...el.com, rientjes@...gle.com, roypat@...zon.co.uk, 
	rppt@...nel.org, seanjc@...gle.com, shuah@...nel.org, steven.price@....com, 
	steven.sistare@...cle.com, suzuki.poulose@....com, tabba@...gle.com, 
	thomas.lendacky@....com, vannapurve@...gle.com, vbabka@...e.cz, 
	viro@...iv.linux.org.uk, vkuznets@...hat.com, wei.w.wang@...el.com, 
	will@...nel.org, willy@...radead.org, xiaoyao.li@...el.com, 
	yan.y.zhao@...el.com, yilun.xu@...el.com, yuzenghui@...wei.com, 
	zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 05/51] KVM: guest_memfd: Skip LRU for guest_memfd folios

Binbin Wu <binbin.wu@...ux.intel.com> writes:

> On 5/15/2025 7:41 AM, Ackerley Tng wrote:
>> filemap_add_folio(), called from filemap_grab_folio(), adds the folio
>> onto some LRU list, which is not necessary for guest_memfd since
>> guest_memfd folios don't participate in any swapping.
>>
>> This patch reimplements part of filemap_add_folio() to avoid adding
>> allocated guest_memfd folios to the filemap.
>
> filemap -> LRU list?
>

Yes, thank you. Will fix this in the next revision.
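
For anyone following along, the gist of the reimplementation is doing
what filemap_add_folio() does minus the folio_add_lru() call at the
end. A rough sketch only, assuming __filemap_add_folio() and
mem_cgroup_charge() are made visible outside mm/ (which is what the
one-line mm/filemap.c and two-line mm/memcontrol.c changes in the
diffstat suggest); locking is simplified here because
__filemap_add_folio() expects a locked folio, so treat this as
illustrative rather than the exact helper from the patch:

static int kvm_gmem_filemap_add_folio(struct address_space *mapping,
				      struct folio *folio, pgoff_t index)
{
	int ret;

	/* __filemap_add_folio() requires the folio to be locked. */
	__folio_set_locked(folio);

	/*
	 * Insert into the page cache only. Unlike filemap_add_folio(),
	 * deliberately skip folio_add_lru(), so that LRU lists never
	 * take a reference on guest_memfd folios.
	 */
	ret = __filemap_add_folio(mapping, folio, index,
				  mapping_gfp_mask(mapping), NULL);
	if (ret)
		__folio_clear_locked(folio);

	return ret;
}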

>>
>> With shared to private conversions dependent on refcounts, avoiding
>> usage of LRU ensures that LRU lists no longer take any refcounts on
>> guest_memfd folios and significantly reduces the chance of elevated
>> refcounts during conversion.
>>
>> Signed-off-by: Ackerley Tng <ackerleytng@...gle.com>
>> Change-Id: Ia2540d9fc132d46219e6e714fd42bc82a62a27fa
>> ---
>>   mm/filemap.c           |  1 +
>>   mm/memcontrol.c        |  2 +
>>   virt/kvm/guest_memfd.c | 91 ++++++++++++++++++++++++++++++++++++++----
>>   3 files changed, 86 insertions(+), 8 deletions(-)
>>
> [...]
>>   /*
>>    * Returns a locked folio on success.  The caller is responsible for
>>    * setting the up-to-date flag before the memory is mapped into the guest.
>> @@ -477,8 +509,46 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
>>    */
>>   static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>>   {
>> +	struct folio *folio;
>> +	gfp_t gfp;
>> +	int ret;
>> +
>> +repeat:
>> +	folio = filemap_lock_folio(inode->i_mapping, index);
>> +	if (!IS_ERR(folio))
>> +		return folio;
>> +
>> +	gfp = mapping_gfp_mask(inode->i_mapping);
>> +
>>   	/* TODO: Support huge pages. */
>> -	return filemap_grab_folio(inode->i_mapping, index);
>> +	folio = filemap_alloc_folio(gfp, 0);
>> +	if (!folio)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	ret = mem_cgroup_charge(folio, NULL, gfp);
>> +	if (ret) {
>> +		folio_put(folio);
>> +		return ERR_PTR(ret);
>> +	}
>> +
>> +	ret = kvm_gmem_filemap_add_folio(inode->i_mapping, folio, index);
>> +	if (ret) {
>> +		folio_put(folio);
>> +
>> +		/*
>> +		 * There was a race, two threads tried to get a folio indexing
>> +		 * to the same location in the filemap. The losing thread should
>> +		 * free the allocated folio, then lock the folio added to the
>> +		 * filemap by the winning thread.
>
> How about changing
> “then lock the folio added to the filemap by the winning thread”
> to
> "the winning thread locks the folio added to the filemap"?
>

How about:

There was a race: multiple threads tried to add a folio at the same
index in the filemap. The winning thread allocated and locked the folio
at the requested index. Each losing thread should free its extra
allocated folio, then wait to lock the folio added (and locked) by the
winning thread.
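
Concretely, with threads A and B racing to grab the folio at the same
index:

  Thread A                          Thread B
  filemap_lock_folio() misses       filemap_lock_folio() misses
  filemap_alloc_folio()             filemap_alloc_folio()
  kvm_gmem_filemap_add_folio()      kvm_gmem_filemap_add_folio()
    returns 0: A wins                 returns -EEXIST: B loses
  __folio_set_locked(), returns     folio_put() frees the extra folio,
                                    goto repeat: filemap_lock_folio()
                                    finds A's folio and sleeps until
                                    A's caller unlocks it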

>> +		 */
>> +		if (ret == -EEXIST)
>> +			goto repeat;
>> +
>> +		return ERR_PTR(ret);
>> +	}
>> +
>> +	__folio_set_locked(folio);
>> +	return folio;
>>   }
>>   
>>   static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
>> @@ -956,23 +1026,28 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>>   }
>>   
>>   #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>> +static void kvm_gmem_invalidate(struct folio *folio)
>> +{
>> +	kvm_pfn_t pfn = folio_pfn(folio);
>> +
>> +	kvm_arch_gmem_invalidate(pfn, pfn + folio_nr_pages(folio));
>> +}
>> +#else
>> +static inline void kvm_gmem_invalidate(struct folio *folio) {}
>
> No need to tag a local static function with "inline".
>

Will fix in the next revision.
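
I.e. the stub becomes:

	static void kvm_gmem_invalidate(struct folio *folio) {}

(the compiler will inline a small static function on its own anyway).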

>> +#endif
>> +
> [...]
