linux-kernel - Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse

Message-ID: <fd9d20c3-1bc0-4bad-bc5e-7d9549ddf8fa@gmail.com>
Date: Thu, 29 Jan 2026 22:49:26 +0000
From: Usama Arif <usamaarif642@...il.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: Johannes Weiner <hannes@...xchg.org>, Rik van Riel <riel@...riel.com>,
 Song Liu <songliubraving@...com>, Kiryl Shutsemau <kas@...nel.org>,
 David Hildenbrand <david@...nel.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>, Nico Pache
 <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
 Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
 Lance Yang <lance.yang@...ux.dev>, Meta kernel team <kernel-team@...a.com>,
 linux-mm@...ck.org, cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in
 collapse_file()



On 29/01/2026 18:40, Shakeel Butt wrote:
> In Meta's fleet, we are seeing high-level cgroups with a zero file memcg
> stat while their descendants have non-zero file stats. This should not be
> possible. Further inspection of the kernel data structures through drgn
> revealed that the high-level cgroups have a negative file stat, which was
> aggregated from their children.
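To illustrate the symptom, here is a small userspace model (hypothetical names, not kernel code). A parent's file stat is the sum of its descendants' stats, so a child driven negative by an accounting bug can cancel out a sibling's genuine usage, or outweigh it. The model also assumes that the read side clamps negative totals to zero. Under that assumption, a parent whose internal value is negative would be reported as zero.

```c
/* Hypothetical userspace model, not kernel code: each memcg node holds
 * the NR_FILE_PAGES delta charged directly to it, and a node's
 * hierarchical value is the sum over its whole subtree. */
#define MAX_CHILDREN 4

struct memcg_node {
	long local;                                   /* pages charged here */
	int nr_children;
	const struct memcg_node *children[MAX_CHILDREN];
};

/* Raw hierarchical total: own pages plus those of all descendants. */
long subtree_total(const struct memcg_node *n)
{
	long sum = n->local;

	for (int i = 0; i < n->nr_children; i++)
		sum += subtree_total(n->children[i]);
	return sum;
}

/* Value as a reader would see it.  ASSUMPTION: negative totals are
 * clamped to zero on the read side. */
unsigned long reported_total(const struct memcg_node *n)
{
	long x = subtree_total(n);

	return x < 0 ? 0UL : (unsigned long)x;
}
```

Consider a parent with two children: one at +100 and one at -612. For the parent, subtree_total() returns -512 and reported_total() returns 0. The +100 child still reports 100. This matches the "zero at the top, non-zero below" pattern described above.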
> 
> Another interesting point was that this specific issue started happening
> more often as we deployed thp-always more widely, which suggested some
> correlation between file memory and THPs. Indeed, it turned out that the
> file memcg stat accounting in the collapse code path has been buggy from
> the start.
> 
> When collapse_file() replaces small folios with a large THP, it fails to
> properly update the NR_FILE_PAGES memcg stat for both the old folios
> being freed and the new THP being added. It assumes the old and new
> folios belong to the same cgroup. However, this assumption breaks in a
> couple of scenarios:
> 
> 1. Binary (executable) package downloader running in a different cgroup
>    than the actual job executing the downloaded package.
> 
> 2. File shared and mapped by processes running in different cgroups. One
>    process reads in the file, and a second process collapses it, either
>    through madvise(MADV_COLLAPSE) or through khugepaged scanning on
>    behalf of the second process.
> 
> So, the current code has two bugs:
> 
> 1. For non-shmem files, NR_FILE_PAGES is never incremented for the new
>    THP because nr_none is always 0 for non-shmem, and the stat update is
>    inside the "if (nr_none)" block.
> 
> 2. When freeing old folios, NR_FILE_PAGES is never decremented because
>    folio->mapping is set to NULL directly without calling
>    filemap_unaccount_folio().
> 
> These bugs cause incorrect per-memcg accounting when the process
> triggering the collapse (MADV_COLLAPSE or khugepaged) belongs to a
> different memcg than the process that originally faulted in the pages:
> 
>   - Process A (memcg X) reads file, creating 512 small page cache folios
>     charged to memcg X (NR_FILE_PAGES += 512 for memcg X)
> 
>   - Process B (memcg Y) triggers collapse via MADV_COLLAPSE or khugepaged
>     scans B's mm. The new THP is charged to memcg Y.
> 
>   - Old folios freed: NR_FILE_PAGES not decremented (bug)
>     New THP added: NR_FILE_PAGES not incremented (bug)
> 
>   - Later, THP removed from page cache: NR_FILE_PAGES -= 512 for memcg Y
> 
> Result: memcg X has +512 inflated pages, memcg Y has -512 (negative!)
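The following hypothetical userspace model traces the scenario above. None of the names are kernel APIs. HPAGE_PMD_NR = 512 assumes 4 KiB base pages and 2 MiB PMD-sized THPs. The model also assumes that the pre-fix stat update inside the "if (nr_none)" block adds nr_none. For non-shmem files that block never runs, so the exact amount does not affect this trace.

```c
/* Hypothetical userspace model of per-memcg NR_FILE_PAGES counters in
 * the cross-memcg collapse scenario; none of these names are kernel APIs. */
#define HPAGE_PMD_NR 512  /* assumption: 4 KiB base pages, 2 MiB PMD THP */

enum { MEMCG_X, MEMCG_Y, NR_MEMCGS };

struct file_stats {
	long nr_file_pages[NR_MEMCGS];
};

/* Process A reads the file: nr small folios charged to its memcg. */
void read_into_page_cache(struct file_stats *s, int memcg, long nr)
{
	s->nr_file_pages[memcg] += nr;
}

/* Pre-fix collapse accounting, mirroring the two bugs described above. */
void collapse_buggy(struct file_stats *s, int old_memcg, int new_memcg,
		    long nr_none)
{
	(void)old_memcg;  /* bug 2: old folios are never decremented */
	if (nr_none)      /* bug 1: nr_none is 0 for non-shmem, so skipped */
		s->nr_file_pages[new_memcg] += nr_none;
}

/* Removing the THP from the page cache uncharges the memcg it belongs to. */
void remove_thp(struct file_stats *s, int memcg)
{
	s->nr_file_pages[memcg] -= HPAGE_PMD_NR;
}
```

Start from zeroed counters. Then call read_into_page_cache(X, 512), collapse_buggy(X, Y, 0) and remove_thp(Y), in that order. Memcg X is left at 512 and memcg Y at -512, as stated in the result above.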
> 
> Fix this by:
> 1. Always incrementing NR_FILE_PAGES by HPAGE_PMD_NR for the new THP
> 2. Decrementing NR_FILE_PAGES for each old folio before clearing its
>    mapping pointer
> 
> For shmem with holes (nr_none > 0), the net change is still +nr_none
> since we decrement (HPAGE_PMD_NR - nr_none) old pages and increment
> HPAGE_PMD_NR new pages.
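Below is a hypothetical userspace model of the fixed accounting. The names are illustrative, not kernel APIs, and for simplicity all old folios are assumed to belong to one memcg. The memcg owning the old folios loses the HPAGE_PMD_NR - nr_none pages that were actually present. The memcg charged with the new THP gains HPAGE_PMD_NR. When both are the same memcg, the net change is -(HPAGE_PMD_NR - nr_none) + HPAGE_PMD_NR = +nr_none.

```c
/* Hypothetical userspace model of the fixed collapse accounting; the
 * names are illustrative, not kernel APIs. */
#define HPAGE_PMD_NR 512  /* assumption: 4 KiB base pages, 2 MiB PMD THP */

enum { MEMCG_X, MEMCG_Y, NR_MEMCGS };

/* stat[] holds NR_FILE_PAGES per memcg.  nr_none is the number of holes
 * in the range (non-zero only for shmem).  Assumes all old folios share
 * old_memcg. */
void collapse_fixed(long stat[NR_MEMCGS], int old_memcg, int new_memcg,
		    long nr_none)
{
	/* Fix 2: uncharge every old page that was present before the
	 * old folios lose their mapping. */
	stat[old_memcg] -= HPAGE_PMD_NR - nr_none;
	/* Fix 1: charge the full THP unconditionally, not only when
	 * nr_none is non-zero. */
	stat[new_memcg] += HPAGE_PMD_NR;
}
```

Take a single memcg with 10 holes: 502 pages become 512, a net change of +10. In the cross-memcg case, memcg X drops from 512 to 0 and memcg Y rises from 0 to 512. A later removal of the 512-page THP from Y therefore brings Y back to 0 instead of -512.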
> 
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>

Acked-by: Usama Arif <usamaarif642@...il.com>