Message-ID: <fbe387c9-4205-41df-84b4-ace69f7cbedb@lucifer.local>
Date: Thu, 17 Jul 2025 18:44:20 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Zi Yan <ziy@...dia.com>
Cc: Balbir Singh <balbirs@...dia.com>, David Hildenbrand <david@...hat.com>,
        linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Kirill Shutemov <k.shutemov@...il.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] mm/huge_memory: move unrelated code out of
 __split_unmapped_folio()

On Thu, Jul 17, 2025 at 11:41:01AM -0400, Zi Yan wrote:
> On 17 Jul 2025, at 10:07, Lorenzo Stoakes wrote:
>
> > On Mon, Jul 14, 2025 at 01:18:22PM -0400, Zi Yan wrote:
> >> remap(), folio_ref_unfreeze(), lru_add_split_folio() are not relevant to
> >> splitting unmapped folio operations. Move them out to the caller so that
> >> __split_unmapped_folio() only handles unmapped folio splits. This makes
> >> __split_unmapped_folio() reusable.
> >
> > Nit but maybe worth mentioning the various renames etc.
>
> You mean release -> new_folio, origin_folio is replaced by folio?
> Sure, I can do that.

Yeah, that kind of thing; just briefly mention the other stuff you did.

Thanks!

>
> >
> >>
> >> Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().
> >>
> >> Signed-off-by: Zi Yan <ziy@...dia.com>
> >> Acked-by: Balbir Singh <balbirs@...dia.com>
> >
> > After a lot of staring, two difftastics at once, and exactly zero Coverity
> > instances, I've convinced myself this looks right.
> >
> > I think you really should have split this up into smaller patches, as this
> > is moving stuff around and changing stuff all at once with a lot of
> > complexity and moving parts.
> >
> > However not going to make you do that, since you got acks and I don't want
> > to hold this up.
> >
> > I have a few nits + queries below that need addressing however, see below.
>
> Since I need to address these nits, I might just split this up.
> How about:
>
> 1.

Missing some text? :P


> >> @@ -3706,10 +3594,13 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> >>  {
> >>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >>  	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
> >> +	struct folio *next_folio = folio_next(folio);
> >>  	bool is_anon = folio_test_anon(folio);
> >>  	struct address_space *mapping = NULL;
> >>  	struct anon_vma *anon_vma = NULL;
> >>  	int order = folio_order(folio);
> >> +	struct folio *new_folio, *next;
> >> +	int nr_shmem_dropped = 0;
> >>  	int extra_pins, ret;
> >>  	pgoff_t end;
> >>  	bool is_hzp;
> >
> > There's some VM_BUG_ON_FOLIO()'s in the code:
> >
> > 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> > 	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
> >
> > That should probably be VM_WARN_ON() or VM_WARN_ON_ONCE(), maybe worth
> > changing here too?
>
> Sure. I can convert them in a separate patch. Basically:
>
> if (!folio_test_locked(folio)) {
> 	VM_WARN_ON_ONCE_FOLIO(1, folio);
> 	return -EINVAL;
> }
>
> if (!folio_test_large(folio)) {
> 	VM_WARN_ON_ONCE_FOLIO(1, folio);
> 	return -EINVAL;
> }

Sounds good thanks!

>
> >
> >> @@ -3833,13 +3724,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> >>  		 */
> >>  		xas_lock(&xas);
> >>  		xas_reset(&xas);
> >> -		if (xas_load(&xas) != folio)
> >> +		if (xas_load(&xas) != folio) {
> >> +			ret = -EAGAIN;
> >
> > It is beyond words that the original logic manually set ret = -EAGAIN...
> >
> > And this is the only place we 'goto fail'.
> >
> > Yikes this code is a horror show.
> >
> >
> >>  			goto fail;
> >> +		}
> >>  	}
> >>
> >>  	/* Prevent deferred_split_scan() touching ->_refcount */
> >>  	spin_lock(&ds_queue->split_queue_lock);
> >>  	if (folio_ref_freeze(folio, 1 + extra_pins)) {
> >> +		struct address_space *swap_cache = NULL;
> >> +		struct lruvec *lruvec;
> >> +
> >>  		if (folio_order(folio) > 1 &&
> >>  		    !list_empty(&folio->_deferred_list)) {
> >>  			ds_queue->split_queue_len--;
> >> @@ -3873,18 +3769,119 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> >>  			}
> >>  		}
> >>
> >> -		ret = __split_unmapped_folio(folio, new_order,
> >> -				split_at, lock_at, list, end, &xas, mapping,
> >> -				uniform_split);
> >> +		if (folio_test_swapcache(folio)) {
> >> +			if (mapping) {
> >> +				VM_WARN_ON_ONCE_FOLIO(mapping, folio);
> >> +				ret = -EINVAL;
> >> +				goto fail;
> >
> > It's a new code path (in prod we'd just carry on, or in debug we would
> > have oops'd), but I think valid.
> >
> > I wonder if this is almost over-cautious, as this would require a non-anon
> > folio to be in the swap-cache (since the is_anon path will set mapping
> > NULL).
> >
> > But at the same time, probably worth keeping in at least for now.
>
> Originally, it was a VM_BUG_ON(mapping); I am converting it to a warning.
> I will move it to a separate patch to avoid confusion.

Thanks

>
> >
> >> +			}
> >> +
> >> +			/*
> >> +			 * a swapcache folio can only be uniformly split to
> >> +			 * order-0
> >> +			 */
> >> +			if (!uniform_split || new_order != 0) {
> >> +				ret = -EINVAL;
> >> +				goto fail;
> >> +			}
> >> +
> >> +			swap_cache = swap_address_space(folio->swap);
> >> +			xa_lock(&swap_cache->i_pages);
> >> +		}
> >> +
> >> +		/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
> >> +		lruvec = folio_lruvec_lock(folio);
> >> +
> >> +		ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
> >> +					     mapping, uniform_split);
> >> +
> >> +		/*
> >> +		 * Unfreeze after-split folios and put them back to the right
> >> +		 * list. @folio should be kept frozen until page cache entries
> >> +		 * are updated with all the other after-split folios to prevent
> >> +		 * others seeing stale page cache entries.
> >> +		 */
> >> +		for (new_folio = folio_next(folio); new_folio != next_folio;
> >> +		     new_folio = next) {
> >
> > Hm now we have 'next' and 'next_folio', this is quite confusing.
> >
> > Seems to me new_folio should be end_folio no, like the original? And maybe
> > then rename next to next_folio? As it is kinda inconsistent that it isn't
> > suffixed with _folio anyway.
>
> Sure. Will do. next_folio was coming from __split_unmapped_folio() code,
> I should have renamed it. Thanks for pointing it out.

Thanks, yeah this was existing confusion and not your fault, but good to
make it easier to understand.

>
> >
> >> +			next = folio_next(new_folio);
> >> +
> >
> > We're no longer doing what would here be new_folio == origin_folio
> > (previously, release == origin_folio).
> >
> > Is this correct? Why do we no longer need to do this?
> >
> > Is it because __split_unmapped_folio() will somehow take care of this in
> > advance/render this meaningless?
> >
> > This definitely needs to be mentioned in the commit message.
>
> Because of the “new_folio = folio_next(folio)” in the for loop
> initialization, @folio is skipped at the very beginning. I will add a
> comment to highlight this, since the code change is quite subtle.

Ahh yes, that is quite subtle, a comment would be helpful, thanks!
>
> >
> >> +			folio_ref_unfreeze(
> >> +				new_folio,
> >> +				1 + ((mapping || swap_cache) ?
> >> +					     folio_nr_pages(new_folio) :
> >> +					     0));
> >
> > Again, be nice to separate this out, but I think in a follow-up not here.
>
> OK.

Well, actually no - you fix this in the next patch :P

>
> >
> >> +
> >> +			lru_add_split_folio(folio, new_folio, lruvec, list);
> >> +
> >> +			/* Some pages can be beyond EOF: drop them from cache */
> >> +			if (new_folio->index >= end) {
> >> +				if (shmem_mapping(mapping))
> >> +					nr_shmem_dropped += folio_nr_pages(new_folio);
> >> +				else if (folio_test_clear_dirty(new_folio))
> >> +					folio_account_cleaned(
> >> +						new_folio,
> >> +						inode_to_wb(mapping->host));
> >> +				__filemap_remove_folio(new_folio, NULL);
> >> +				folio_put_refs(new_folio,
> >> +					       folio_nr_pages(new_folio));
> >> +			} else if (mapping) {
> >> +				__xa_store(&mapping->i_pages, new_folio->index,
> >> +					   new_folio, 0);
> >> +			} else if (swap_cache) {
> >> +				__xa_store(&swap_cache->i_pages,
> >> +					   swap_cache_index(new_folio->swap),
> >> +					   new_folio, 0);
> >> +			}
> >> +		}
> >> +		/*
> >> +		 * Unfreeze @folio only after all page cache entries, which
> >> +		 * used to point to it, have been updated with new folios.
> >> +		 * Otherwise, a parallel folio_try_get() can grab origin_folio
> >> +		 * and its caller can see stale page cache entries.
> >> +		 */
> >> +		folio_ref_unfreeze(folio, 1 +
> >> +			((mapping || swap_cache) ? folio_nr_pages(folio) : 0));
> >
> > This line is horrid, probably one for a future series but this sort of
> > calculation of what the number of refs should be post-freeze should clearly
> > be separated out or at least made abundantly clear in an open-coded
> > implementation.
>
> It is addressed in patch 2. And you already noticed it. :)

Haha yes, always like it when that happens :)

>
> >
> >> +
> >> +		unlock_page_lruvec(lruvec);
> >> +
> >> +		if (swap_cache)
> >> +			xa_unlock(&swap_cache->i_pages);
> >>  	} else {
> >>  		spin_unlock(&ds_queue->split_queue_lock);
> >> -fail:
> >> -		if (mapping)
> >> -			xas_unlock(&xas);
> >> -		local_irq_enable();
> >> -		remap_page(folio, folio_nr_pages(folio), 0);
> >>  		ret = -EAGAIN;
> >>  	}
> >> +fail:
> >> +	if (mapping)
> >> +		xas_unlock(&xas);
> >> +
> >> +	local_irq_enable();
> >> +
> >> +	if (nr_shmem_dropped)
> >> +		shmem_uncharge(mapping->host, nr_shmem_dropped);
> >> +
> >> +	remap_page(folio, 1 << order,
> >> +		   !ret && folio_test_anon(folio) ? RMP_USE_SHARED_ZEROPAGE :
> >> +						    0);
> >
> > I really don't like this !ret bit here; it isn't very readable.
> >
> > Something like:
> >
> > 	int flags = 0;
> >
> > 	...
> >
> > 	if (!ret && folio_test_anon(folio))
> > 		flags = RMP_USE_SHARED_ZEROPAGE;
> > 	remap_page(folio, 1 << order, flags);
> >
> > Would be better.
> >
> > But really this is all screaming out to be separated into parts of
> > course. But that's one for a follow-up series...
>
> Sure. Will add another patch to address this.

Thanks!

>
> >
> >> +
> >> +	/*
> >> +	 * Unlock all after-split folios except the one containing @lock_at
> >> +	 * page. If @folio is not split, it will be kept locked.
> >> +	 */
> >> +	for (new_folio = folio; new_folio != next_folio; new_folio = next) {
> >> +		next = folio_next(new_folio);
> >> +		if (new_folio == page_folio(lock_at))
> >> +			continue;
> >> +
> >> +		folio_unlock(new_folio);
> >> +		/*
> >> +		 * Subpages may be freed if there wasn't any mapping
> >> +		 * like if add_to_swap() is running on a lru page that
> >> +		 * had its mapping zapped. And freeing these pages
> >> +		 * requires taking the lru_lock so we do the put_page
> >> +		 * of the tail pages after the split is complete.
> >> +		 */
> >> +		free_folio_and_swap_cache(new_folio);
> >> +	}
> >>
> >>  out_unlock:
> >>  	if (anon_vma) {
> >> --
> >> 2.47.2
> >>
> >
> > Generally I see why you're not using origin_folio any more since you can
> > just use folio everywhere, but I wonder if this makes things more
> > confusing.
> >
> > On the other hand, this function is already hugely confusing so maybe not a
> > big deal and can be addressed in follow ups...
>
> Since it is causing confusion, I will use origin_folio in the moved code.

Thanks!

>
> I will send V4 to address your comments and add
> “mm/huge_memory: refactor after-split (page) cache code.” in.

Much appreciated :)

>
> Best Regards,
> Yan, Zi
