linux-kernel - Re: [RFC PATCH v1 09/37] KVM: guest_memfd: Skip LRU for guest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAEvNRgE07d_TaSVpkWO8gMfGgPsP9sBzrqMPCte8PET0THF=QA@mail.gmail.com>
Date: Tue, 27 Jan 2026 15:46:39 -0800
From: Ackerley Tng <ackerleytng@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>, cgroups@...r.kernel.org, kvm@...r.kernel.org, 
	linux-doc@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	linux-mm@...ck.org, linux-trace-kernel@...r.kernel.org, x86@...nel.org
Cc: akpm@...ux-foundation.org, binbin.wu@...ux.intel.com, bp@...en8.de, 
	brauner@...nel.org, chao.p.peng@...el.com, chenhuacai@...nel.org, 
	corbet@....net, dave.hansen@...el.com, dave.hansen@...ux.intel.com, 
	david@...hat.com, dmatlack@...gle.com, erdemaktas@...gle.com, 
	fan.du@...el.com, fvdl@...gle.com, haibo1.xu@...el.com, hannes@...xchg.org, 
	hch@...radead.org, hpa@...or.com, hughd@...gle.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, jack@...e.cz, james.morse@....com, 
	jarkko@...nel.org, jgg@...pe.ca, jgowans@...zon.com, jhubbard@...dia.com, 
	jroedel@...e.de, jthoughton@...gle.com, jun.miao@...el.com, 
	kai.huang@...el.com, keirf@...gle.com, kent.overstreet@...ux.dev, 
	liam.merwick@...cle.com, maciej.wieczor-retman@...el.com, 
	mail@...iej.szmigiero.name, maobibo@...ngson.cn, 
	mathieu.desnoyers@...icios.com, maz@...nel.org, mhiramat@...nel.org, 
	mhocko@...nel.org, mic@...ikod.net, michael.roth@....com, mingo@...hat.com, 
	mlevitsk@...hat.com, mpe@...erman.id.au, muchun.song@...ux.dev, 
	nikunj@....com, nsaenz@...zon.es, oliver.upton@...ux.dev, palmer@...belt.com, 
	pankaj.gupta@....com, paul.walmsley@...ive.com, pbonzini@...hat.com, 
	peterx@...hat.com, pgonda@...gle.com, prsampat@....com, pvorel@...e.cz, 
	qperret@...gle.com, richard.weiyang@...il.com, rick.p.edgecombe@...el.com, 
	rientjes@...gle.com, rostedt@...dmis.org, roypat@...zon.co.uk, 
	rppt@...nel.org, seanjc@...gle.com, shakeel.butt@...ux.dev, shuah@...nel.org, 
	steven.price@....com, steven.sistare@...cle.com, suzuki.poulose@....com, 
	tabba@...gle.com, tglx@...utronix.de, thomas.lendacky@....com, 
	vannapurve@...gle.com, viro@...iv.linux.org.uk, vkuznets@...hat.com, 
	wei.w.wang@...el.com, will@...nel.org, willy@...radead.org, wyihan@...gle.com, 
	xiaoyao.li@...el.com, yan.y.zhao@...el.com, yilun.xu@...el.com, 
	yuzenghui@...wei.com, zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v1 09/37] KVM: guest_memfd: Skip LRU for guest_memfd folios

Vlastimil Babka <vbabka@...e.cz> writes:

> On 10/17/25 22:11, Ackerley Tng wrote:
>> filemap_add_folio(), called from filemap_grab_folio(), adds folios to
>> an LRU list. This is unnecessary for guest_memfd, which does not
>> participate in swapping.
>
> IIRC guest_memfd mappings are unevictable. That should mean they are not
> ultimately added to a list (see lruvec_add_folio()).
>
>> In addition, the LRU list takes a reference count on the folio. With
>
> IIUC the refcount is temporary while being on the percpu
> &cpu_fbatches.lru_add, added by __folio_batch_add_and_move().

Thanks for pointing this out. You're right about this, I misunderstood
this refcounting earlier.

> When flushed
> via folio_batch_move_lru(), the refcount is removed and there's only the LRU
> folio flag that remains. The fbatch flushing can be triggered if you see an
> unexpected refcount increase.

The new plan is, to update kvm_gmem_is_safe_for_conversion() to drain
the fbatch if it some elevated refcount is found:

static bool kvm_gmem_is_safe_for_conversion(struct inode *inode,
					    pgoff_t start, size_t nr_pages,
					    pgoff_t *err_index)
{
	struct address_space *mapping = inode->i_mapping;
	const int filemap_get_folios_refcount = 1;
	pgoff_t last = start + nr_pages - 1;
	struct folio_batch fbatch;
	bool lru_drained = false;
	bool safe = true;
	int i;

	folio_batch_init(&fbatch);
	while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {

		for (i = 0; i < folio_batch_count(&fbatch);) {
			struct folio *folio = fbatch.folios[i];

			safe = (folio_ref_count(folio) ==
				folio_nr_pages(folio) +
				filemap_get_folios_refcount);

			if (safe) {
				++i;
			} else if (!lru_drained) {
				lru_add_drain_all();
				lru_drained = true;
			} else {
				*err_index = folio->index;
				break;
			}
		}

		folio_batch_release(&fbatch);
	}

	return safe;
}

I hope this is what you meant!

> So it might be feasible to do without this
> patch (maybe it was already tried and there were substantial issues, in
> which case should be mentioned).
>

The patch "KVM: guest_memfd: Skip LRU for guest_memfd folios" will be
dropped from the next revision, and "KVM: guest_memfd: Don't set
FGP_ACCESSED when getting folios" is no longer a requirement for this
patch series.

>> shared-to-private memory conversions for KVM guests dependent on folio
>> refcounts, this extra reference can cause conversions to fail due to
>> unexpected refcounts.
>>
>>
>> [...snip...]
>>