Message-ID: <fbe387c9-4205-41df-84b4-ace69f7cbedb@lucifer.local>
Date: Thu, 17 Jul 2025 18:44:20 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Zi Yan <ziy@...dia.com>
Cc: Balbir Singh <balbirs@...dia.com>, David Hildenbrand <david@...hat.com>,
linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
Kirill Shutemov <k.shutemov@...il.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] mm/huge_memory: move unrelated code out of
__split_unmapped_folio()
On Thu, Jul 17, 2025 at 11:41:01AM -0400, Zi Yan wrote:
> On 17 Jul 2025, at 10:07, Lorenzo Stoakes wrote:
>
> > On Mon, Jul 14, 2025 at 01:18:22PM -0400, Zi Yan wrote:
> >> remap(), folio_ref_unfreeze(), lru_add_split_folio() are not relevant to
> >> splitting unmapped folio operations. Move them out to the caller so that
> >> __split_unmapped_folio() only handles unmapped folio splits. This makes
> >> __split_unmapped_folio() reusable.
> >
> > Nit but maybe worth mentioning the various renames etc.
>
> You mean release -> new_folio, origin_folio is replaced by folio?
> Sure, I can do that.
Yeah that kind of thing, just briefly mention the other stuff you did.
Thanks!
>
> >
> >>
> >> Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().
> >>
> >> Signed-off-by: Zi Yan <ziy@...dia.com>
> >> Acked-by: Balbir Singh <balbirs@...dia.com>
> >
> > After a lot of staring, 2 difftastic's at once and exactly 0 coverity
> > instances, I've convinced myself this looks right.
> >
> > I think you really should have split this up into smaller patches, as this
> > is moving stuff around and changing stuff all at once with a lot of
> > complexity and moving parts.
> >
> > However I'm not going to make you do that, since you got acks and I don't
> > want to hold this up.
> >
> > I have a few nits + queries below that need addressing however, see below.
>
> Since I need to address these nits, I might just split this up.
> How about:
>
> 1.
Missing some text? :P
> >> @@ -3706,10 +3594,13 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> >> {
> >> struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >> XA_STATE(xas, &folio->mapping->i_pages, folio->index);
> >> + struct folio *next_folio = folio_next(folio);
> >> bool is_anon = folio_test_anon(folio);
> >> struct address_space *mapping = NULL;
> >> struct anon_vma *anon_vma = NULL;
> >> int order = folio_order(folio);
> >> + struct folio *new_folio, *next;
> >> + int nr_shmem_dropped = 0;
> >> int extra_pins, ret;
> >> pgoff_t end;
> >> bool is_hzp;
> >
> > There's some VM_BUG_ON_FOLIO()'s in the code:
> >
> > VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> > VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
> >
> > That should probably be VM_WARN_ON() or VM_WARN_ON_ONCE(), maybe worth
> > changing here too?
>
> Sure. I can convert them in a separate patch. Basically:
>
> if (!folio_test_locked(folio)) {
> VM_WARN_ON_ONCE_FOLIO(1, folio);
> return -EINVAL;
> }
>
> if (!folio_test_large(folio)) {
> VM_WARN_ON_ONCE_FOLIO(1, folio);
> return -EINVAL;
> }
Sounds good thanks!
>
> >
> >> @@ -3833,13 +3724,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> >> */
> >> xas_lock(&xas);
> >> xas_reset(&xas);
> >> - if (xas_load(&xas) != folio)
> >> + if (xas_load(&xas) != folio) {
> >> + ret = -EAGAIN;
> >
> > It is beyond words that the original logic manually set ret = -EAGAIN...
> >
> > And this is the only place we 'goto fail'.
> >
> > Yikes this code is a horror show.
> >
> >
> >> goto fail;
> >> + }
> >> }
> >>
> >> /* Prevent deferred_split_scan() touching ->_refcount */
> >> spin_lock(&ds_queue->split_queue_lock);
> >> if (folio_ref_freeze(folio, 1 + extra_pins)) {
> >> + struct address_space *swap_cache = NULL;
> >> + struct lruvec *lruvec;
> >> +
> >> if (folio_order(folio) > 1 &&
> >> !list_empty(&folio->_deferred_list)) {
> >> ds_queue->split_queue_len--;
> >> @@ -3873,18 +3769,119 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
> >> }
> >> }
> >>
> >> - ret = __split_unmapped_folio(folio, new_order,
> >> - split_at, lock_at, list, end, &xas, mapping,
> >> - uniform_split);
> >> + if (folio_test_swapcache(folio)) {
> >> + if (mapping) {
> >> + VM_WARN_ON_ONCE_FOLIO(mapping, folio);
> >> + ret = -EINVAL;
> >> + goto fail;
> >
> > It's a new code path (in prod we'd just carry on, or in debug we would
> > have oops'd), but I think valid.
> >
> > I wonder if this is almost over-cautious, as this would require a non-anon
> > folio to be in the swap-cache (since the is_anon path will set mapping
> > NULL).
> >
> > But at the same time, probably worth keeping in at least for now.
>
> Originally, it was a VM_BUG_ON(mapping). I am converting it to a warning.
> I will move it to a separate patch to avoid confusion.
Thanks
>
> >
> >> + }
> >> +
> >> + /*
> >> + * a swapcache folio can only be uniformly split to
> >> + * order-0
> >> + */
> >> + if (!uniform_split || new_order != 0) {
> >> + ret = -EINVAL;
> >> + goto fail;
> >> + }
> >> +
> >> + swap_cache = swap_address_space(folio->swap);
> >> + xa_lock(&swap_cache->i_pages);
> >> + }
> >> +
> >> + /* lock lru list/PageCompound, ref frozen by page_ref_freeze */
> >> + lruvec = folio_lruvec_lock(folio);
> >> +
> >> + ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
> >> + mapping, uniform_split);
> >> +
> >> + /*
> >> + * Unfreeze after-split folios and put them back to the right
> >> + * list. @folio should be kept frozen until page cache entries
> >> + * are updated with all the other after-split folios to prevent
> >> + * others seeing stale page cache entries.
> >> + */
> >> + for (new_folio = folio_next(folio); new_folio != next_folio;
> >> + new_folio = next) {
> >
> > Hm now we have 'next' and 'next_folio', this is quite confusing.
> >
> > Seems to me new_folio should be end_folio no, like the original? And maybe
> > then rename next to next_folio? As it is kinda inconsistent that it isn't
> > suffixed with _folio anyway.
>
> Sure. Will do. next_folio was coming from __split_unmapped_folio() code,
> I should have renamed it. Thanks for pointing it out.
Thanks, yeah this was existing confusion and not your fault, but good to
make it easier to understand.
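Just to illustrate what I mean, something like this (a rough sketch only,
the names are just suggestions):

	struct folio *end_folio = folio_next(folio);
	struct folio *new_folio, *next_folio;

	...

	for (new_folio = folio_next(folio); new_folio != end_folio;
	     new_folio = next_folio) {
		next_folio = folio_next(new_folio);
		...
	}

That way the end sentinel is clearly named, and everything that iterates
over folios is consistently suffixed with _folio.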
>
> >
> >> + next = folio_next(new_folio);
> >> +
> >
> > We're no longer doing what would here be new_folio == origin_folio
> > (previously, release == origin_folio).
> >
> > Is this correct? Why do we no longer need to do this?
> >
> > Is it because __split_unmapped_folio() will somehow take care of this in
> > advance/render this meaningless?
> >
> > This definitely needs to be mentioned in the commit message.
>
> Because of “new_folio = folio_next(folio)” in the for loop initialization,
> @folio is skipped from the very beginning. I will add a comment to
> highlight this, since the code change is too subtle.
Ahh yes, that is quite subtle, a comment would be helpful, thanks!
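Maybe something along these lines (just a suggestion, wording entirely up
to you):

	/*
	 * The loop starts from folio_next(folio), so the original @folio is
	 * intentionally skipped here: it must stay frozen until all the page
	 * cache entries have been updated with the after-split folios below.
	 */
	for (new_folio = folio_next(folio); new_folio != next_folio;
	     new_folio = next) {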
>
> >
> >> + folio_ref_unfreeze(
> >> + new_folio,
> >> + 1 + ((mapping || swap_cache) ?
> >> + folio_nr_pages(new_folio) :
> >> + 0));
> >
> > Again, be nice to separate this out, but I think in a follow-up not here.
>
> OK.
Well, actually no - you fix this in the next patch :P
>
> >
> >> +
> >> + lru_add_split_folio(folio, new_folio, lruvec, list);
> >> +
> >> + /* Some pages can be beyond EOF: drop them from cache */
> >> + if (new_folio->index >= end) {
> >> + if (shmem_mapping(mapping))
> >> + nr_shmem_dropped += folio_nr_pages(new_folio);
> >> + else if (folio_test_clear_dirty(new_folio))
> >> + folio_account_cleaned(
> >> + new_folio,
> >> + inode_to_wb(mapping->host));
> >> + __filemap_remove_folio(new_folio, NULL);
> >> + folio_put_refs(new_folio,
> >> + folio_nr_pages(new_folio));
> >> + } else if (mapping) {
> >> + __xa_store(&mapping->i_pages, new_folio->index,
> >> + new_folio, 0);
> >> + } else if (swap_cache) {
> >> + __xa_store(&swap_cache->i_pages,
> >> + swap_cache_index(new_folio->swap),
> >> + new_folio, 0);
> >> + }
> >> + }
> >> + /*
> >> + * Unfreeze @folio only after all page cache entries, which
> >> + * used to point to it, have been updated with new folios.
> >> + * Otherwise, a parallel folio_try_get() can grab origin_folio
> >> + * and its caller can see stale page cache entries.
> >> + */
> >> + folio_ref_unfreeze(folio, 1 +
> >> + ((mapping || swap_cache) ? folio_nr_pages(folio) : 0));
> >
> > This line is horrid, probably one for a future series but this sort of
> > calculation of what the number of refs should be post-freeze should clearly
> > be separated out or at least made abundantly clear in an open-coded
> > implementation.
>
> It is addressed in patch 2. And you already noticed it. :)
Haha yes, always like it when that happens :)
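For anybody reading along, an open-coded version might look something like
this (purely illustrative, patch 2 may well express it differently):

	int expected_refs = 1;	/* the reference we froze */

	/* The page cache or swap cache holds one reference per page. */
	if (mapping || swap_cache)
		expected_refs += folio_nr_pages(folio);

	folio_ref_unfreeze(folio, expected_refs);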
>
> >
> >> +
> >> + unlock_page_lruvec(lruvec);
> >> +
> >> + if (swap_cache)
> >> + xa_unlock(&swap_cache->i_pages);
> >> } else {
> >> spin_unlock(&ds_queue->split_queue_lock);
> >> -fail:
> >> - if (mapping)
> >> - xas_unlock(&xas);
> >> - local_irq_enable();
> >> - remap_page(folio, folio_nr_pages(folio), 0);
> >> ret = -EAGAIN;
> >> }
> >> +fail:
> >> + if (mapping)
> >> + xas_unlock(&xas);
> >> +
> >> + local_irq_enable();
> >> +
> >> + if (nr_shmem_dropped)
> >> + shmem_uncharge(mapping->host, nr_shmem_dropped);
> >> +
> >> + remap_page(folio, 1 << order,
> >> + !ret && folio_test_anon(folio) ? RMP_USE_SHARED_ZEROPAGE :
> >> + 0);
> >
> > I really don't like this !ret bit here, it isn't very readable.
> >
> > Something like:
> >
> > int flags = 0;
> >
> > ...
> >
> > if (!ret && folio_test_anon(folio))
> > 	flags = RMP_USE_SHARED_ZEROPAGE;
> > remap_page(folio, 1 << order, flags);
> >
> > Would be better.
> >
> > But really this is all screaming out to be separated into parts of
> > course. But that's one for a follow-up series...
>
> Sure. Will add another patch to address this.
Thanks!
>
> >
> >> +
> >> + /*
> >> + * Unlock all after-split folios except the one containing @lock_at
> >> + * page. If @folio is not split, it will be kept locked.
> >> + */
> >> + for (new_folio = folio; new_folio != next_folio; new_folio = next) {
> >> + next = folio_next(new_folio);
> >> + if (new_folio == page_folio(lock_at))
> >> + continue;
> >> +
> >> + folio_unlock(new_folio);
> >> + /*
> >> + * Subpages may be freed if there wasn't any mapping
> >> + * like if add_to_swap() is running on a lru page that
> >> + * had its mapping zapped. And freeing these pages
> >> + * requires taking the lru_lock so we do the put_page
> >> + * of the tail pages after the split is complete.
> >> + */
> >> + free_folio_and_swap_cache(new_folio);
> >> + }
> >>
> >> out_unlock:
> >> if (anon_vma) {
> >> --
> >> 2.47.2
> >>
> >
> > Generally I see why you're not using origin_folio any more since you can
> > just use folio everywhere, but I wonder if this makes things more
> > confusing.
> >
> > On the other hand, this function is already hugely confusing so maybe not a
> > big deal and can be addressed in follow ups...
>
> Since it is causing confusion, I will use origin_folio in the moved code.
Thanks!
>
> I will send v4 to address your comments and include
> “mm/huge_memory: refactor after-split (page) cache code.”
Much appreciated :)
>
> Best Regards,
> Yan, Zi