Message-ID: <aE/81hmduC08B8Lt@yzhao56-desk.sh.intel.com>
Date: Mon, 16 Jun 2025 19:15:34 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Ackerley Tng <ackerleytng@...gle.com>
CC: <kvm@...r.kernel.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
	<linux-fsdevel@...r.kernel.org>, <aik@....com>, <ajones@...tanamicro.com>,
	<akpm@...ux-foundation.org>, <amoorthy@...gle.com>,
	<anthony.yznaga@...cle.com>, <anup@...infault.org>, <aou@...s.berkeley.edu>,
	<bfoster@...hat.com>, <binbin.wu@...ux.intel.com>, <brauner@...nel.org>,
	<catalin.marinas@....com>, <chao.p.peng@...el.com>, <chenhuacai@...nel.org>,
	<dave.hansen@...el.com>, <david@...hat.com>, <dmatlack@...gle.com>,
	<dwmw@...zon.co.uk>, <erdemaktas@...gle.com>, <fan.du@...el.com>,
	<fvdl@...gle.com>, <graf@...zon.com>, <haibo1.xu@...el.com>,
	<hch@...radead.org>, <hughd@...gle.com>, <ira.weiny@...el.com>,
	<isaku.yamahata@...el.com>, <jack@...e.cz>, <james.morse@....com>,
	<jarkko@...nel.org>, <jgg@...pe.ca>, <jgowans@...zon.com>,
	<jhubbard@...dia.com>, <jroedel@...e.de>, <jthoughton@...gle.com>,
	<jun.miao@...el.com>, <kai.huang@...el.com>, <keirf@...gle.com>,
	<kent.overstreet@...ux.dev>, <kirill.shutemov@...el.com>,
	<liam.merwick@...cle.com>, <maciej.wieczor-retman@...el.com>,
	<mail@...iej.szmigiero.name>, <maz@...nel.org>, <mic@...ikod.net>,
	<michael.roth@....com>, <mpe@...erman.id.au>, <muchun.song@...ux.dev>,
	<nikunj@....com>, <nsaenz@...zon.es>, <oliver.upton@...ux.dev>,
	<palmer@...belt.com>, <pankaj.gupta@....com>, <paul.walmsley@...ive.com>,
	<pbonzini@...hat.com>, <pdurrant@...zon.co.uk>, <peterx@...hat.com>,
	<pgonda@...gle.com>, <pvorel@...e.cz>, <qperret@...gle.com>,
	<quic_cvanscha@...cinc.com>, <quic_eberman@...cinc.com>,
	<quic_mnalajal@...cinc.com>, <quic_pderrin@...cinc.com>,
	<quic_pheragu@...cinc.com>, <quic_svaddagi@...cinc.com>,
	<quic_tsoni@...cinc.com>, <richard.weiyang@...il.com>,
	<rick.p.edgecombe@...el.com>, <rientjes@...gle.com>, <roypat@...zon.co.uk>,
	<rppt@...nel.org>, <seanjc@...gle.com>, <shuah@...nel.org>,
	<steven.price@....com>, <steven.sistare@...cle.com>,
	<suzuki.poulose@....com>, <tabba@...gle.com>, <thomas.lendacky@....com>,
	<usama.arif@...edance.com>, <vannapurve@...gle.com>, <vbabka@...e.cz>,
	<viro@...iv.linux.org.uk>, <vkuznets@...hat.com>, <wei.w.wang@...el.com>,
	<will@...nel.org>, <willy@...radead.org>, <xiaoyao.li@...el.com>,
	<yilun.xu@...el.com>, <yuzenghui@...wei.com>, <zhiquan1.li@...el.com>
Subject: Re: [RFC PATCH v2 38/51] KVM: guest_memfd: Split allocator pages for
 guest_memfd use

On Thu, Jun 05, 2025 at 12:10:08PM -0700, Ackerley Tng wrote:
> Yan Zhao <yan.y.zhao@...el.com> writes:
> 
> > On Wed, May 14, 2025 at 04:42:17PM -0700, Ackerley Tng wrote:
> >> +static int kvm_gmem_convert_execute_work(struct inode *inode,
> >> +					 struct conversion_work *work,
> >> +					 bool to_shared)
> >> +{
> >> +	enum shareability m;
> >> +	int ret;
> >> +
> >> +	m = to_shared ? SHAREABILITY_ALL : SHAREABILITY_GUEST;
> >> +	ret = kvm_gmem_shareability_apply(inode, work, m);
> >> +	if (ret)
> >> +		return ret;
> >> +	/*
> >> +	 * Apply shareability first so split/merge can operate on new
> >> +	 * shareability state.
> >> +	 */
> >> +	ret = kvm_gmem_restructure_folios_in_range(
> >> +		inode, work->start, work->nr_pages, to_shared);
> >> +
> >> +	return ret;
> >> +}
> >> +
> 
> Hi Yan,
> 
> Thanks for your thorough reviews and your alternative suggestion in the
> other discussion at [1]! I'll try to bring the conversion-related parts
> of that discussion over here.
> 
> >>  static int kvm_gmem_convert_range(struct file *file, pgoff_t start,
> >>  				  size_t nr_pages, bool shared,
> >>  				  pgoff_t *error_index)
> 
> The guiding principle I was using for the conversion ioctls is
> 
> * Have the shareability updates and any necessary page restructuring
>   (aka splitting/merging) either fully complete or not at all by the
>   time the conversion ioctl returns.
> * Any unmapping (from host or guest page tables) will not be re-mapped
>   on errors.
> * Rollback undoes changes if conversion failed, and in those cases any
>   errors are turned into WARNings.
> 
> The rationale is that we want page sizes to be in sync with shareability
> so that any faults after the (successful or failed) conversion will not
> wrongly map in a larger page than allowed and cause any host crashes.
> 
> We considered 3 places where the memory can be mapped for conversions:
> 
> 1. Host page tables
> 2. Guest page tables
> 3. IOMMU page tables
> 
> Unmapping from host page tables is the simplest case. We unmap any
> shared ranges from the host page tables. Any accesses after the failed
> conversion would just fault the memory back in and proceed as usual.
> 
> guest_memfd memory is not unmapped from IOMMUs in conversions. This case
> is handled because IOMMU mappings hold refcounts. After unmapping from
> the host, we check for unexpected refcounts and fail the conversion if any
> are found.
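
A minimal sketch of the kind of check meant here (hypothetical helper name,
not code from this series; it assumes, as elsewhere in the series, that the
filemap holds exactly one reference per page of a folio):

	/*
	 * After the shared range has been unmapped from the host page
	 * tables, each folio should be left with just the filemap's
	 * refcounts.  Anything extra (e.g. an IOMMU mapping still pinning
	 * the folio) fails the conversion so userspace can clean up and
	 * retry.  Hypothetical helper for illustration only.
	 */
	static bool kvm_gmem_folio_has_unexpected_refs(struct folio *folio)
	{
		return folio_ref_count(folio) != folio_nr_pages(folio);
	}
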
> 
> We also unmap from guest page tables. Considering failed conversions, if
> the pages are shared, we're good since the next time the guest accesses
> the page, the page will be faulted in as before.
> 
> If the pages are private, on the next guest access, the pages will be
> faulted in again as well. This is fine for software-protected VMs IIUC.
> 
> For TDX (and SNP) IIUC the memory would have been cleared, and the
> memory would also need to be re-accepted. I was thinking that this is by
> design, since when a TDX guest requests a conversion it knows that the
> contents is not to be used again.
This is not guaranteed.

On private-to-shared conversion failure, the guest may leak the page or release
the page. If the guest chooses the latter (e.g. in kvmclock_init_mem() or
kvm_arch_ptp_init()), the page is still regarded as private by the guest OS, so
re-acceptance will not happen before the guest accesses the page.

So, it's better for the host to keep the original SEPT if a private-to-shared
conversion fails.


 
> The userspace VMM is obligated to keep trying to convert, and if it gives
> up, the userspace VMM should inform the guest that the conversion
> failed. The guest should handle conversion failures too and not assume
> that conversion always succeeds.
I don't think relying on userspace to keep retrying the conversion endlessly is
a good design.

> 
> Putting TDX aside for a moment, so far, there are a few ways this
> conversion could fail:
> 
> a. Unexpected refcounts. Userspace should clear up the unexpected
>    refcounts and report failure to the guest if it can't for whatever
>    reason.
This is acceptable. Unmapping shared mappings in the primary MMU or shared EPT
is harmless.

> b. ENOMEM because (i) we ran out of memory updating the shareability
>    maple_tree or (ii) since splitting involves allocating more memory
>    for struct pages and we ran out of memory there. In this case the
>    userspace VMM gets -ENOMEM and can make more memory available and
>    then retry, or if it can't, also report failure to the guest.
This is unacceptable. Why not reserve the memory before committing to the real
conversion? If -ENOMEM is returned before executing the conversion, we don't
need to handle a restore error, which is impossible to handle gracefully.
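
As a rough sketch of what that could look like (all helper names below are
hypothetical, not code from this series): a first phase only reserves memory
and may fail with -ENOMEM before any state has been touched, so the second
phase never needs an error-prone rollback.

	/* All helpers below are hypothetical placeholders for illustration. */
	static int kvm_gmem_convert_range_reserved(struct inode *inode,
						   struct conversion_work *work,
						   bool to_shared)
	{
		int ret;

		/* Phase 1: reserve everything that could otherwise hit -ENOMEM. */
		ret = kvm_gmem_reserve_shareability_nodes(inode, work);
		if (ret)
			return ret;

		ret = kvm_gmem_reserve_split_metadata(inode, work);
		if (ret) {
			kvm_gmem_release_shareability_nodes(inode, work);
			return ret;
		}

		/*
		 * Phase 2: apply the conversion using only the reservations
		 * made above, so no allocation (and no -ENOMEM) can happen
		 * past this point.
		 */
		kvm_gmem_shareability_apply_reserved(inode, work, to_shared);
		kvm_gmem_restructure_folios_reserved(inode, work, to_shared);

		return 0;
	}
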


> TDX introduces TDX-specific conversion failures (see discussion at
> [1]), which this series doesn't handle, but I think we still have a line
> of sight to handle new errors.
The errors can be divided into two categories:
(1) errors due to kernel bugs
(2) errors that can occur under normal or adverse conditions (e.g. -ENOMEM)

We can't handle (1), so a BUG_ON or leaking memory is allowed.
However, we should try to avoid (2), especially in the rollback path.


> In the other thread [1], I was proposing to have guest_memfd decide what
> to do on errors, but I think that might be baking more TDX-specific
> details into guest_memfd/KVM, and perhaps this is better:
The TDX-specific errors in the unmapping path are of category (1).
So, we hope to resolve them with a BUG_ON and by leaking the memory.

The other TDX conversion error is for splitting memory. We hope to
do the splitting before executing the real conversion.

Please check the proposal details at
https://lore.kernel.org/all/aE%2Fq9VKkmaCcuwpU@yzhao56-desk.sh.intel.com.

> We could return the errors to userspace and let userspace determine what
> to do. For retryable errors (as determined by userspace), it should do
> what it needs to do, and retry. For errors like TDX being unable to
> reclaim the memory, it could tell guest_memfd to leak that memory.
> 
> If userspace gives up, it should report conversion failure to the guest
> if userspace thinks the guest can continue (to a clean shutdown or
> otherwise). If something terrible happened during conversion, then
> userspace might have to exit itself or shutdown the host.
> 
> In [2], for TDX-specific conversion failures, you proposed prepping to
> eliminate errors and exiting early on failure, then actually
> unmapping. I think that could work too.
> 
> I'm a little concerned that prepping could be complicated, since the
> nature of conversion depends on the current state of shareability, and
> there's a lot to prepare, everything from counting memory required for
> maple_tree allocation (and merging ranges in the maple_tree), and
> counting the number of pages required for undoing vmemmap optimization
> in the case of splitting...
> 
> And even after doing all the prep to eliminate errors, the unmapping
> could fail in TDX-specific cases anyway, which still needs to be
> handled.
> 
> Hence I'm hoping you'll consider to let TDX-specific failures be
> built-in and handled alongside other failures by getting help from the
> userspace VMM, and in the worst case letting the guest know the
> conversion failed.
> 
> I also appreciate comments or suggestions from anyone else!
> 
> [1] https://lore.kernel.org/all/diqzfrgfp95d.fsf@ackerleytng-ctop.c.googlers.com/
> [2] https://lore.kernel.org/all/aEEEJbTzlncbRaRA@yzhao56-desk.sh.intel.com/
> 
> >> @@ -371,18 +539,21 @@ static int kvm_gmem_convert_range(struct file *file, pgoff_t start,
> >>  
> >>  	list_for_each_entry(work, &work_list, list) {
> >>  		rollback_stop_item = work;
> >> -		ret = kvm_gmem_shareability_apply(inode, work, m);
> >> +
> >> +		ret = kvm_gmem_convert_execute_work(inode, work, shared);
> >>  		if (ret)
> >>  			break;
> >>  	}
> >>  
> >>  	if (ret) {
> >> -		m = shared ? SHAREABILITY_GUEST : SHAREABILITY_ALL;
> >>  		list_for_each_entry(work, &work_list, list) {
> >> +			int r;
> >> +
> >> +			r = kvm_gmem_convert_execute_work(inode, work, !shared);
> >> +			WARN_ON(r);
> >> +
> >>  			if (work == rollback_stop_item)
> >>  				break;
> >> -
> >> -			WARN_ON(kvm_gmem_shareability_apply(inode, work, m));
> > Could kvm_gmem_shareability_apply() fail here?
> >
> 
> Yes, it could. If shareability cannot be updated, then we probably ran
> out of memory. Userspace VMM will probably get -ENOMEM set on some
> earlier ret and should handle that accordingly.
> 
> On -ENOMEM in a rollback, the host is in a very tough spot anyway, and a
> clean guest shutdown may be the only way out, hence this is a WARN and
> not returned to userspace.
> 
> >>  		}
> >>  	}
> >>  
> >> @@ -434,6 +605,277 @@ static int kvm_gmem_ioctl_convert_range(struct file *file,
> >>  	return ret;
> >>  }
> >>  
> >> +#ifdef CONFIG_KVM_GMEM_HUGETLB
> >> +
> >> +static inline void __filemap_remove_folio_for_restructuring(struct folio *folio)
> >> +{
> >> +	struct address_space *mapping = folio->mapping;
> >> +
> >> +	spin_lock(&mapping->host->i_lock);
> >> +	xa_lock_irq(&mapping->i_pages);
> >> +
> >> +	__filemap_remove_folio(folio, NULL);
> >> +
> >> +	xa_unlock_irq(&mapping->i_pages);
> >> +	spin_unlock(&mapping->host->i_lock);
> >> +}
> >> +
> >> +/**
> >> + * filemap_remove_folio_for_restructuring() - Remove @folio from filemap for
> >> + * split/merge.
> >> + *
> >> + * @folio: the folio to be removed.
> >> + *
> >> + * Similar to filemap_remove_folio(), but skips LRU-related calls (meaningless
> >> + * for guest_memfd), and skips call to ->free_folio() to maintain folio flags.
> >> + *
> >> + * Context: Expects only the filemap's refcounts to be left on the folio. Will
> >> + *          freeze these refcounts away so that no other users will interfere
> >> + *          with restructuring.
> >> + */
> >> +static inline void filemap_remove_folio_for_restructuring(struct folio *folio)
> >> +{
> >> +	int filemap_refcount;
> >> +
> >> +	filemap_refcount = folio_nr_pages(folio);
> >> +	while (!folio_ref_freeze(folio, filemap_refcount)) {
> >> +		/*
> >> +		 * At this point only filemap refcounts are expected, hence okay
> >> +		 * to spin until speculative refcounts go away.
> >> +		 */
> >> +		WARN_ONCE(1, "Spinning on folio=%p refcount=%d", folio, folio_ref_count(folio));
> >> +	}
> >> +
> >> +	folio_lock(folio);
> >> +	__filemap_remove_folio_for_restructuring(folio);
> >> +	folio_unlock(folio);
> >> +}
> >> +
> >> +/**
> >> + * kvm_gmem_split_folio_in_filemap() - Split @folio within filemap in @inode.
> >> + *
> >> + * @inode: inode containing the folio.
> >> + * @folio: folio to be split.
> >> + *
> >> + * Split a folio into folios of size PAGE_SIZE. Will clean up folio from filemap
> >> + * and add back the split folios.
> >> + *
> >> + * Context: Expects that before this call, folio's refcount is just the
> >> + *          filemap's refcounts. After this function returns, the split folios'
> >> + *          refcounts will also be filemap's refcounts.
> >> + * Return: 0 on success or negative error otherwise.
> >> + */
> >> +static int kvm_gmem_split_folio_in_filemap(struct inode *inode, struct folio *folio)
> >> +{
> >> +	size_t orig_nr_pages;
> >> +	pgoff_t orig_index;
> >> +	size_t i, j;
> >> +	int ret;
> >> +
> >> +	orig_nr_pages = folio_nr_pages(folio);
> >> +	if (orig_nr_pages == 1)
> >> +		return 0;
> >> +
> >> +	orig_index = folio->index;
> >> +
> >> +	filemap_remove_folio_for_restructuring(folio);
> >> +
> >> +	ret = kvm_gmem_allocator_ops(inode)->split_folio(folio);
> >> +	if (ret)
> >> +		goto err;
> >> +
> >> +	for (i = 0; i < orig_nr_pages; ++i) {
> >> +		struct folio *f = page_folio(folio_page(folio, i));
> >> +
> >> +		ret = __kvm_gmem_filemap_add_folio(inode->i_mapping, f,
> >> +						   orig_index + i);
> > Why does the failure of __kvm_gmem_filemap_add_folio() here lead to rollback,    
> > while the failure of the one under rollback only triggers WARN_ON()?
> >
> 
> Mostly because I don't really have a choice on rollback. On rollback we
> try to restore the merged folio back into the filemap, and if we
> couldn't, the host is probably in rather bad shape in terms of memory
> availability and there may not be many options for the userspace VMM.
Out of memory is not a good excuse for an error in the rollback path.

> Perhaps the different possible errors from
> __kvm_gmem_filemap_add_folio() in both should be handled differently. Do
> you have any suggestions on that?
Maybe reserve the memory, or rule out other factors that could lead to
conversion failure, before executing the conversion?
Then BUG_ON if the failure is caused by a bug.
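
Something like the below, reusing the add-folio call from the error path in
this patch (sketch only, assuming every -ENOMEM source was reserved up front):

	/*
	 * Once everything that can legitimately fail has been reserved
	 * before the conversion starts, a failure to re-add the original
	 * folio on the rollback path can only be a kernel bug.
	 */
	BUG_ON(__kvm_gmem_filemap_add_folio(inode->i_mapping, folio, orig_index));
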

> >> +		if (ret)
> >> +			goto rollback;
> >> +	}
> >> +
> >> +	return ret;
> >> +
> >> +rollback:
> >> +	for (j = 0; j < i; ++j) {
> >> +		struct folio *f = page_folio(folio_page(folio, j));
> >> +
> >> +		filemap_remove_folio_for_restructuring(f);
> >> +	}
> >> +
> >> +	kvm_gmem_allocator_ops(inode)->merge_folio(folio);
> >> +err:
> >> +	WARN_ON(__kvm_gmem_filemap_add_folio(inode->i_mapping, folio, orig_index));
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static inline int kvm_gmem_try_split_folio_in_filemap(struct inode *inode,
> >> +						      struct folio *folio)
> >> +{
> >> +	size_t to_nr_pages;
> >> +	void *priv;
> >> +
> >> +	if (!kvm_gmem_has_custom_allocator(inode))
> >> +		return 0;
> >> +
> >> +	priv = kvm_gmem_allocator_private(inode);
> >> +	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_page(priv);
> >> +
> >> +	if (kvm_gmem_has_some_shared(inode, folio->index, to_nr_pages))
> >> +		return kvm_gmem_split_folio_in_filemap(inode, folio);
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * kvm_gmem_merge_folio_in_filemap() - Merge @first_folio within filemap in
> >> + * @inode.
> >> + *
> >> + * @inode: inode containing the folio.
> >> + * @first_folio: first folio among folios to be merged.
> >> + *
> >> + * Will clean up subfolios from filemap and add back the merged folio.
> >> + *
> >> + * Context: Expects that before this call, all subfolios only have filemap
> >> + *          refcounts. After this function returns, the merged folio will only
> >> + *          have filemap refcounts.
> >> + * Return: 0 on success or negative error otherwise.
> >> + */
> >> +static int kvm_gmem_merge_folio_in_filemap(struct inode *inode,
> >> +					   struct folio *first_folio)
> >> +{
> >> +	size_t to_nr_pages;
> >> +	pgoff_t index;
> >> +	void *priv;
> >> +	size_t i;
> >> +	int ret;
> >> +
> >> +	index = first_folio->index;
> >> +
> >> +	priv = kvm_gmem_allocator_private(inode);
> >> +	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
> >> +	if (folio_nr_pages(first_folio) == to_nr_pages)
> >> +		return 0;
> >> +
> >> +	for (i = 0; i < to_nr_pages; ++i) {
> >> +		struct folio *f = page_folio(folio_page(first_folio, i));
> >> +
> >> +		filemap_remove_folio_for_restructuring(f);
> >> +	}
> >> +
> >> +	kvm_gmem_allocator_ops(inode)->merge_folio(first_folio);
> >> +
> >> +	ret = __kvm_gmem_filemap_add_folio(inode->i_mapping, first_folio, index);
> >> +	if (ret)
> >> +		goto err_split;
> >> +
> >> +	return ret;
> >> +
> >> +err_split:
> >> +	WARN_ON(kvm_gmem_allocator_ops(inode)->split_folio(first_folio));
> > guestmem_hugetlb_split_folio() can fail, e.g.
> > after the stash is freed by guestmem_hugetlb_unstash_free_metadata() in
> > guestmem_hugetlb_merge_folio(), it's possible to get -ENOMEM for the stash
> > allocation in guestmem_hugetlb_stash_metadata() in
> > guestmem_hugetlb_split_folio().
> >
> >
> 
> Yes. This is also on the error path. In line with all the other error
> and rollback paths, I don't really have other options at this point,
> since on error, I probably ran out of memory, so I try my best to
> restore the original state but give up with a WARN otherwise.
> 
> >> +	for (i = 0; i < to_nr_pages; ++i) {
> >> +		struct folio *f = page_folio(folio_page(first_folio, i));
> >> +
> >> +		WARN_ON(__kvm_gmem_filemap_add_folio(inode->i_mapping, f, index + i));
> >> +	}
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static inline int kvm_gmem_try_merge_folio_in_filemap(struct inode *inode,
> >> +						      struct folio *first_folio)
> >> +{
> >> +	size_t to_nr_pages;
> >> +	void *priv;
> >> +
> >> +	priv = kvm_gmem_allocator_private(inode);
> >> +	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
> >> +
> >> +	if (kvm_gmem_has_some_shared(inode, first_folio->index, to_nr_pages))
> >> +		return 0;
> >> +
> >> +	return kvm_gmem_merge_folio_in_filemap(inode, first_folio);
> >> +}
> >> +
> >> +static int kvm_gmem_restructure_folios_in_range(struct inode *inode,
> >> +						pgoff_t start, size_t nr_pages,
> >> +						bool is_split_operation)
> >> +{
> >> +	size_t to_nr_pages;
> >> +	pgoff_t index;
> >> +	pgoff_t end;
> >> +	void *priv;
> >> +	int ret;
> >> +
> >> +	if (!kvm_gmem_has_custom_allocator(inode))
> >> +		return 0;
> >> +
> >> +	end = start + nr_pages;
> >> +
> >> +	/* Round to allocator page size, to check all (huge) pages in range. */
> >> +	priv = kvm_gmem_allocator_private(inode);
> >> +	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
> >> +
> >> +	start = round_down(start, to_nr_pages);
> >> +	end = round_up(end, to_nr_pages);
> >> +
> >> +	for (index = start; index < end; index += to_nr_pages) {
> >> +		struct folio *f;
> >> +
> >> +		f = filemap_get_folio(inode->i_mapping, index);
> >> +		if (IS_ERR(f))
> >> +			continue;
> >> +
> >> +		/* Leave just filemap's refcounts on the folio. */
> >> +		folio_put(f);
> >> +
> >> +		if (is_split_operation)
> >> +			ret = kvm_gmem_split_folio_in_filemap(inode, f);
> > kvm_gmem_try_split_folio_in_filemap()?
> >
> 
> Here we know for sure that this was a private-to-shared
> conversion. Hence, we know that there are at least some shared parts in
> this huge page and we can skip checking that. 
Ok.

> >> +		else
> >> +			ret = kvm_gmem_try_merge_folio_in_filemap(inode, f);
> >> +
> 
> For merge, we don't know if the entire huge page might perhaps contain
> some other shared subpages, hence we "try" to merge by first checking
> against shareability to find shared subpages.
Makes sense.


> >> +		if (ret)
> >> +			goto rollback;
> >> +	}
> >> +	return ret;
> >> +
> >> +rollback:
> >> +	for (index -= to_nr_pages; index >= start; index -= to_nr_pages) {
> 
> Note to self: the first index -= to_nr_pages was meant to skip the index
> that caused the failure, but this could cause an underflow if index = 0
> when entering rollback. Need to fix this in the next revision.
Yes :)
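
One possible shape for that fix, as a rough sketch (keeping index, start and
to_nr_pages unsigned as they are now): decrementing at the top of a while loop
still skips the index that caused the failure, while the "index > start" test
cannot underflow even when the failure happened at index == start == 0.

rollback:
	while (index > start) {
		struct folio *f;

		index -= to_nr_pages;

		f = filemap_get_folio(inode->i_mapping, index);
		if (IS_ERR(f))
			continue;

		/* Leave just filemap's refcounts on the folio. */
		folio_put(f);

		if (is_split_operation)
			WARN_ON(kvm_gmem_merge_folio_in_filemap(inode, f));
		else
			WARN_ON(kvm_gmem_split_folio_in_filemap(inode, f));
	}
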

> >> +		struct folio *f;
> >> +
> >> +		f = filemap_get_folio(inode->i_mapping, index);
> >> +		if (IS_ERR(f))
> >> +			continue;
> >> +
> >> +		/* Leave just filemap's refcounts on the folio. */
> >> +		folio_put(f);
> >> +
> >> +		if (is_split_operation)
> >> +			WARN_ON(kvm_gmem_merge_folio_in_filemap(inode, f));
> >> +		else
> >> +			WARN_ON(kvm_gmem_split_folio_in_filemap(inode, f));
> > Is it safe to just leave WARN_ON()s in the rollback case?
> >
> 
> Same as above. I don't think we have much of a choice.
> 
> > Besides, are the kvm_gmem_merge_folio_in_filemap() and
> > kvm_gmem_split_folio_in_filemap() here duplicated with the
> > kvm_gmem_split_folio_in_filemap() and kvm_gmem_try_merge_folio_in_filemap() in
> > the following "r = kvm_gmem_convert_execute_work(inode, work, !shared)"?
> >
> 
> This handles the case where some pages in the range [start, start +
> nr_pages) were split and the failure was halfway through. I could call
> kvm_gmem_convert_execute_work() with !shared but that would go over all
> the folios again from the start.
> 
> >> +	}
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +#else
> >> +
> >> +static inline int kvm_gmem_try_split_folio_in_filemap(struct inode *inode,
> >> +						      struct folio *folio)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >> +static int kvm_gmem_restructure_folios_in_range(struct inode *inode,
> >> +						pgoff_t start, size_t nr_pages,
> >> +						bool is_split_operation)
> >> +{
> >> +	return 0;
> >> +}
> >> +
> >> +#endif
> >> +
> >  
