Message-ID: <59098b4f-c3bf-4b6c-80fb-604e6e1c755e@meta.com>
Date: Tue, 30 Dec 2025 14:18:51 -0500
From: Chris Mason <clm@...a.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>, Zi Yan <ziy@...dia.com>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>, Qi Zheng <qi.zheng@...ux.dev>,
hannes@...xchg.org, hughd@...gle.com, mhocko@...e.com,
muchun.song@...ux.dev, david@...nel.org, lorenzo.stoakes@...cle.com,
harry.yoo@...cle.com, imran.f.khan@...cle.com,
kamalesh.babulal@...cle.com, axelrasmussen@...gle.com,
yuanchu@...gle.com, weixugc@...gle.com, chenridong@...weicloud.com,
mkoutny@...e.com, akpm@...ux-foundation.org,
hamzamahfooz@...ux.microsoft.com, apais@...ux.microsoft.com,
lance.yang@...ux.dev, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, Qi Zheng <zhengqi.arch@...edance.com>,
Chris Mason <clm@...com>
Subject: Re: [PATCH v2 00/28] Eliminate Dying Memory Cgroup
On 12/30/25 1:13 PM, Shakeel Butt wrote:
> On Tue, Dec 30, 2025 at 11:46:02AM -0500, Zi Yan wrote:
>> On 29 Dec 2025, at 23:48, Shakeel Butt wrote:
>>
>>> On Tue, Dec 30, 2025 at 12:25:31PM +0800, Qi Zheng wrote:
>>>>
>>>>
>>> [...]
>>>>>>
>>>>>> Thank you for running the AI review for this patchset, but please do not
>>>>>> directly send the raw data from the AI review to the community, as this
>>>>>> is no different from automated review by a robot.
>>>>>
>>>>> Hi Qi,
>>>>>
>>>>> I don't know why you're so negative towards it. It's been great at
>>>>
>>>> No, I don't object to having a dedicated robot to do this.
>>>>
>>>>> finding pretty tricky bugs often missed by human reviewers. In no way
>>>>> is it a replacement for human review, but if a robot can find real
>>>>> issues and make the kernel more reliable and safe, I'm in.
>>>>
>>>> I just think you should do a preliminary review of the AI review results
>>>> instead of sending them out directly. Otherwise, if everyone does this,
>>>> the community will be full of bots.
I do think it's awkward to dump the whole review output for the patch
series in a single message. It looks like there's a sudden jump to XML?
It's better to reply to the individual patches with the comments
inline, which I think is where Roman is trying to go long term.
With BPF, it looks more like this:
https://lore.kernel.org/bpf/?q=AI+reviewed+your+patch
>>>>
>>>> No?
>>>>
>>>
>>> We don't want too many bots but we definitely want at least one AI
>>> review bot. Now we have precedent from the BPF and networking
>>> subsystems, and the results I have seen are really good. I think the MM
>>> community needs to come together and decide on the formalities of an AI
>>> review process, and I see Roman is doing some early experimentation and
>>> the results look great.
>>
>> Do you mind explaining why the results look great? Does it mean you agree
>> with the regressions pointed out by the AI review?
>
> The results look great because the points raised are really
> thought-provoking and things I had not thought about when I reviewed the
> series. The lru lock without irq or the possible infinite retry loop in
> get_mem_cgroup_from_folio() are two such examples. Are these real
> regressions? I am not sure.
>
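For reference, the retry loop comment is about a shape roughly like
this (my sketch of the pattern, not the actual code from the series;
assume folio is the folio being charged):

	struct mem_cgroup *memcg;

	/*
	 * If the folio can be reparented between the lookup and the
	 * tryget, this can keep spinning until it wins the race.
	 */
	rcu_read_lock();
	do {
		memcg = folio_memcg(folio);
	} while (memcg && !css_tryget(&memcg->css));
	rcu_read_unlock();

Whether that's an unbounded loop in practice depends on how often
reparenting can actually happen, which is why I'm not calling it a
regression either.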
>>
>> If we want to do AI reviews, the process should be improved instead of
>> just pasting the output from AI. In the initial stage, I think some human
>> intervention is needed; at the very least, adding some comments on the AI
>> reviews would be helpful.
>
> Yes I agree, and therefore I mentioned we should discuss how we (the MM
> community) should adopt AI reviews.
What tends to happen with BPF is the patch author or bpf maintainers
point out problems with the reviews and I fix up the prompts over time.
The false positive rate is ~20% today (measured since late October), and
it's generally declining.
>
>> Otherwise, it looks like you agree completely with the AI reviews.
>> In addition, if “50% of the reported issues are real”, is the AI tossing
>> a coin when reporting issues?
>>
>> When I am looking into the prompt part, I have the following questions:
>>
>> 1. What is “Prompts SHA: 192922ae6bf4 ("bpf.md: adjust the documentation
>> about bpf kfunc parameter validation")”? I got the actual prompts
>> from IRC: https://github.com/masoncl/review-prompts/tree/main , but they
>> should be provided along with the review for others to reproduce.
>
> I agree and I didn't know that Chris's review prompts are used here.
>
> Ccing Chris for your following questions.
>
>> 2. Looking at the mm prompt:
>> https://github.com/masoncl/review-prompts/blob/main/mm.md , are you sure
>> the patterns are all right?
>> a. Page/Folio States, Large folios require per-page state tracking for
>> reference counts. I thought we wanted to get rid of the per-page refcount.
Early in prompt development I hand-picked a few hundred bug-fix patches
from 6.16, and I iterated on these, adding subsystem knowledge to catch
the known bugs. That's where that rule came from, but as you say there's
a risk this information gets old. Do we want to get rid of per-page
refcounts, or have we already done it? (More on that at the bottom of
the email.)
>> b. Migration Invariants, NUMA balancing expects valid PTE combinations.
>> PROTNONE PTEs are hardware-invalid in order to trigger a fault.
>> c. TLB flushes required after PTE modifications. How about spurious fault
>> handling?
>>
The AI generally uses them as a starting point and fills in the details,
but I agree the MM bits are pretty minimal.
>> 3. For a cgroup patchset, I was expecting some cgroup specific prompt rules,
>> but could not find any. What am I missing?
I think the only cgroup-specific information I've needed so far is
explaining css_get() and the section on __GFP_ACCOUNT. I actively try
to avoid adding details unless we're missing bugs or generating false
positives.
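For context, the css_get() rule boils down to a pin-before-use pattern
like this (my sketch, not text from the prompt; pin_current_memcg() is
a made-up helper name):

	/*
	 * Pin the css before leaving the RCU section; the caller
	 * pairs this with css_put(&memcg->css) when done.
	 */
	static struct mem_cgroup *pin_current_memcg(void)
	{
		struct mem_cgroup *memcg;

		rcu_read_lock();
		memcg = mem_cgroup_from_task(current);
		if (memcg && !css_tryget(&memcg->css))	/* can fail for a dying cgroup */
			memcg = NULL;
		rcu_read_unlock();
		return memcg;
	}

The __GFP_ACCOUNT side is just tagging allocations that should be
charged to the caller's cgroup, e.g. kmalloc(size, GFP_KERNEL |
__GFP_ACCOUNT).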
As an example of how I'd fix the prompt if the per-page state tracking
rule were causing problems (and if we didn't want to just remove it), I
asked Claude to analyze how it is still used. The output is below; I'd
double-check things as best I could, shorten it into prompt form, and
send it to the list for review.
Per-Page Tracking in Large Folios - Analysis
=============================================
Based on analysis of mm/*.c files and commit history, MM-004's claim is
still partially true - large folios do need per-page tracking for some
bits, though recent work has significantly reduced this.
Bits That Still Require Per-Page Tracking
------------------------------------------
1. PG_hwpoison (include/linux/page-flags.h:118)
Defined as PAGEFLAG(HWPoison, hwpoison, PF_ANY), this flag is set on
individual pages within a large folio when hardware memory corruption
is detected.
The PG_has_hwpoisoned flag on the second page (tested via
folio_test_has_hwpoisoned()) indicates at least one subpage is poisoned,
but does not identify which one.
mm/huge_memory.c:3467 iterates through pages checking PageHWPoison()
for each:
static bool page_range_has_hwpoisoned(struct page *page, long nr_pages)
{
	for (; nr_pages; page++, nr_pages--)
		if (PageHWPoison(page))
			return true;
	return false;
}
Used in rmap code (mm/rmap.c:1990, 2070, 2473) to check individual
subpages when unmapping or migrating.
2. PG_anon_exclusive (include/linux/page-flags.h:146)
Per the comment at include/linux/page-flags.h:139-145:
"Depending on the way an anonymous folio can be mapped into a page
table (e.g., single PMD/PUD/CONT of the head page vs. PTE-mapped
THP), PG_anon_exclusive may be set only for the head page or for
tail pages of an anonymous folio. For now, we only expect it to be
set on tail pages for PTE-mapped THP."
Used at mm/rmap.c:1408-1416: when the RMAP_EXCLUSIVE flag is set for
PTE-level mappings, the code iterates through each page:
	for (i = 0; i < nr_pages; i++)
		SetPageAnonExclusive(page + i);
HugeTLB stores this on head page only (see PageAnonExclusive() at
include/linux/page-flags.h:1153-1162), but PTE-mapped THP needs
per-page tracking.
Recent Changes - Per-Page Mapcount Removed
------------------------------------------
Commit 749492229e3bd ("mm: stop maintaining the per-page mapcount of
large folios") by David Hildenbrand (March 2025) introduced
CONFIG_NO_PAGE_MAPCOUNT, which:
- Stops maintaining per-page mapcounts in tail pages of large folios
- Tail page mapcount is now always logically 0 (stored as -1)
- Removed _nr_pages_mapped tracking
This was a significant simplification, but it does not affect the
per-page flag tracking described above.
Flags Stored in Second Page Only (Not Per-Page)
-----------------------------------------------
These are stored in the first tail page (FOLIO_SECOND_PAGE) and apply to
the entire folio, not individual pages:
- PG_has_hwpoisoned - indicates some page in folio is poisoned
- PG_large_rmappable - folio is rmappable
- PG_partially_mapped - folio is partially mapped
See PAGE_FLAGS_SECOND definition at include/linux/page-flags.h:1218-1220.
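To make the folio-level vs. per-page distinction concrete, the usual
pattern pairs the folio-level hint with a per-page scan, something like
this sketch (handle_poisoned_page() is a placeholder, not a real
helper):

	if (folio_test_has_hwpoisoned(folio)) {
		long i;

		/* the folio-level bit says "at least one", not which one */
		for (i = 0; i < folio_nr_pages(folio); i++) {
			struct page *page = folio_page(folio, i);

			if (PageHWPoison(page))
				handle_poisoned_page(page);
		}
	}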
Conclusion
----------
While per-page mapcount has been eliminated, PG_hwpoison and
PG_anon_exclusive (for PTE-mapped THP) still require per-page tracking
in large folios. MM-004's claim remains valid for these specific bits.
Key source files:
- include/linux/page-flags.h (flag definitions and accessors)
- mm/huge_memory.c (folio split handling)
- mm/rmap.c (reverse mapping with per-page exclusive tracking)