Message-ID: <250ec733-8b2d-4c56-858c-6aada9544a55@linux.alibaba.com>
Date: Wed, 4 Jun 2025 20:46:02 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>, Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, david@...hat.com,
 lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
 rppt@...nel.org, surenb@...gle.com, donettom@...ux.ibm.com,
 aboorvad@...ux.ibm.com, sj@...nel.org, linux-mm@...ck.org,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: fix the inaccurate memory statistics issue for users



On 2025/6/4 01:29, Shakeel Butt wrote:
> On Tue, Jun 03, 2025 at 04:48:08PM +0200, Michal Hocko wrote:
>> On Tue 03-06-25 22:22:46, Baolin Wang wrote:
>>> Let me try to clarify further.
>>>
>>> The 'mm->rss_stat' is updated by using add_mm_counter(),
>>> dec/inc_mm_counter(), which are all wrappers around
>>> percpu_counter_add_batch(). In percpu_counter_add_batch(), there is percpu
>>> batch caching to avoid 'fbc->lock' contention.
>>
>> OK, this is exactly the line of argument I was looking for. If _all_
>> updates done in the kernel are using batching and therefore the lock is
>> only held every N (percpu_counter_batch) updates then a risk of locking
>> contention would be decreased. This is worth having a note in the
>> changelog.

OK.

>>> This patch changes task_mem()
>>> and task_statm() to get the accurate mm counters under the 'fbc->lock', but
>>> this will not exacerbate kernel 'mm->rss_stat' lock contention due to
>>> the percpu batch caching of the mm counters.
>>>
>>> You might argue that my test cases cannot demonstrate an actual lock
>>> contention, but they have already shown that there is no significant
>>> 'fbc->lock' contention when the kernel updates 'mm->rss_stat'.
>>
>> I was arguing that `top -d 1' doesn't really represent a potential
>> adverse usage. These proc files are generally readable so I would be
>> expecting something like busy loop read while process tries to update
>> counters to see the worst case scenario. If that is barely visible then
>> we can conclude a normal use wouldn't even notice.

OK.

> Baolin, please run stress-ng command that stresses minor anon page
> faults in multiple threads and then run multiple bash scripts which cat
> /proc/pidof(stress-ng)/status. That should be how much the stress-ng
> process is impacted by the parallel status readers versus without them.

Sure. Thanks Shakeel. I ran stress-ng with the 'stress-ng --fault 32 
--perf -t 1m' command, while simultaneously running the following 
script to read /proc/pidof(stress-ng)/status for each process.

 From the data below, I did not observe any obvious impact from this 
patch on the stress-ng tests while repeatedly reading 
/proc/pidof(stress-ng)/status.

w/o patch
stress-ng: info:  [6891]  3,993,235,331,584 CPU Cycles         59.767 B/sec
stress-ng: info:  [6891]  1,472,101,565,760 Instructions       22.033 B/sec (0.369 instr. per cycle)
stress-ng: info:  [6891]         36,287,456 Page Faults Total   0.543 M/sec
stress-ng: info:  [6891]         36,287,456 Page Faults Minor   0.543 M/sec

w/ patch
stress-ng: info:  [6872]  4,018,592,975,968 CPU Cycles         60.177 B/sec
stress-ng: info:  [6872]  1,484,856,150,976 Instructions       22.235 B/sec (0.369 instr. per cycle)
stress-ng: info:  [6872]         36,547,456 Page Faults Total   0.547 M/sec
stress-ng: info:  [6872]         36,547,456 Page Faults Minor   0.547 M/sec

=========================
#!/bin/bash

# Get the PIDs of stress-ng processes
PIDS=$(pgrep stress-ng)

# Loop through each PID and read /proc/[pid]/status in a background loop
for PID in $PIDS; do
    while true; do
        cat "/proc/$PID/status"
        sleep 0.1
    done &
done

wait
