linux-kernel - Re: [PATCH] mm: add stack trace when bad rss-counter state is detected

Message-ID: <11ec00da-56c6-45b9-b04a-7e79467c4300@redhat.com>
Date: Wed, 23 Jul 2025 11:25:46 +0200
From: David Hildenbrand <david@...hat.com>
To: Vlastimil Babka <vbabka@...e.cz>, Xuanye Liu <liuqiye2025@....com>,
 Kees Cook <kees@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>, Mike Rapoport
 <rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
 Michal Hocko <mhocko@...e.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: add stack trace when bad rss-counter state is
 detected

On 23.07.25 11:17, Vlastimil Babka wrote:
> On 7/23/25 11:10, Xuanye Liu wrote:
>>
>> On 2025/7/23 16:42, David Hildenbrand wrote:
>>> On 23.07.25 10:05, David Hildenbrand wrote:
>>>> On 23.07.25 09:45, Xuanye Liu wrote:
>>>>>
>>>>> On 2025/7/23 15:31, Kees Cook wrote:
>>>>>> On Wed, Jul 23, 2025 at 03:23:49PM +0800, Xuanye Liu wrote:
>>>>>>> The check_mm() function verifies the correctness of rss counters in
>>>>>>> struct mm_struct. Currently, it only prints an alert when a bad
>>>>>>> rss-counter state is detected, but lacks sufficient context for
>>>>>>> debugging.
>>>>>>>
>>>>>>> This patch adds a dump_stack() call to provide a stack trace when
>>>>>>> the rss-counter state is invalid. This helps developers identify
>>>>>>> where the corrupted mm_struct is being checked and trace the
>>>>>>> underlying cause of the inconsistency.
>>>>>> Why not just convert the pr_alert to a WARN?
>>>>> Good idea! I'll gather more feedback from others and then update to v2.
>>>>
>>>> Makes sense to me.
>>>
>>> After discussing this with Lorenzo off-list: isn't the stack trace completely misleading/useless in that case?
>>>
>>> Whatever caused the RSS counter mismatch (e.g., unmapping the wrong pages, failing to unmap pages) quite possibly happened in a different context, much earlier.
>>>
>>> Why would you think the stack trace would be of any value when destroying an MM (__mmdrop)?
>>>
>>> That said, I really hate these pr_*("BUG: ...") calls with a passion. Probably we'd want to invoke the panic_on_warn machinery, because something unexpected happened.
>>>
>> The stack trace dumped here may indeed not reflect the root cause:
>> the actual error could have occurred much earlier, for example during a
>> failed or missing page map/unmap operation.
>> The current stack (e.g., in __mmdrop() or exit_mmap()) is merely part
>> of the cleanup phase.
>>
>> Given that, how should we go about identifying the root cause when such an issue occurs?
>>
>> Is there any existing way to trace it more effectively, or could we introduce a new mechanism
>> to monitor and detect these inconsistencies earlier?
>>
>> Let’s brainstorm possible solutions together.
> 
> Excellent idea! How about we introduce a function that walks the whole page
> tables and checks the numbers of individual pte types against the rss
> counters. And if we invoke it before and after every single pte update, we
> can pinpoint much sooner the moment it went wrong and the stack that led to it?
> 

:)

On a more serious note, I usually ran into that after hitting a bunch of
print_bad_pte() statements: vm_normal_page() would return NULL where it
shouldn't, so we would not adjust the RSS.

In which context would you run into this issue?

Usually it really indicates some fundamental page table handling flaw. 
E.g., page table corruption leading to print_bad_pte() earlier.
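
To illustrate why the stack at teardown says little about the cause, here is a toy userspace sketch of that failure mode. It is not kernel code: struct toy_mm, toy_map(), toy_unmap(), toy_check_mm() and toy_run_scenario() are hypothetical names invented for this example, and the "corrupt" flag merely stands in for a page lookup that returns NULL where it shouldn't.

```c
/* Toy userspace model of a skipped RSS adjustment. Not kernel code;
 * every name here is hypothetical and exists only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_NR_PTES 8

struct toy_mm {
	bool present[TOY_NR_PTES];	/* stand-in for page table entries */
	bool corrupt[TOY_NR_PTES];	/* entries whose page lookup fails */
	long rss;			/* stand-in for a single rss counter */
};

/* Map one entry and account for it. */
static void toy_map(struct toy_mm *mm, int idx)
{
	assert(idx >= 0 && idx < TOY_NR_PTES);
	if (!mm->present[idx]) {
		mm->present[idx] = true;
		mm->rss++;
	}
}

/* Unmap one entry. For a "corrupt" entry the lookup fails, a bad-pte
 * style message is printed, and the counter is NOT adjusted. */
static void toy_unmap(struct toy_mm *mm, int idx)
{
	assert(idx >= 0 && idx < TOY_NR_PTES);
	if (!mm->present[idx])
		return;
	mm->present[idx] = false;
	if (mm->corrupt[idx]) {
		fprintf(stderr, "toy: bad pte at index %d\n", idx);
		return;
	}
	mm->rss--;
}

/* Teardown-time check: only here does the mismatch become visible. */
static long toy_check_mm(const struct toy_mm *mm)
{
	if (mm->rss != 0)
		fprintf(stderr, "toy: bad rss-counter state: val=%ld\n", mm->rss);
	return mm->rss;
}

/* Map everything, unmap everything, then check at "teardown".
 * Pass corrupt_idx < 0 for a run without any corrupt entry. */
static long toy_run_scenario(int corrupt_idx)
{
	struct toy_mm mm = {0};
	int i;

	if (corrupt_idx >= 0 && corrupt_idx < TOY_NR_PTES)
		mm.corrupt[corrupt_idx] = true;
	for (i = 0; i < TOY_NR_PTES; i++)
		toy_map(&mm, i);
	for (i = 0; i < TOY_NR_PTES; i++)
		toy_unmap(&mm, i);
	return toy_check_mm(&mm);
}
```

In this model, toy_run_scenario(-1) leaves the counter at 0, while toy_run_scenario(3) leaves it at 1. The accounting went wrong inside toy_unmap(), but the mismatch is only reported from toy_check_mm(), so a stack trace taken there would point at the teardown path rather than at the unmap that skipped the adjustment.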

-- 
Cheers,

David / dhildenb