Message-ID: <2c276ed9-626f-4bae-9d42-727dd176ec74@kylinos.cn>
Date: Thu, 20 Nov 2025 10:11:49 +0800
From: Guopeng Zhang <zhangguopeng@...inos.cn>
To: Lance Yang <ioworker0@...il.com>
Cc: hannes@...xchg.org, linux-kernel@...r.kernel.org,
 linux-kselftest@...r.kernel.org, linux-mm@...ck.org, mhocko@...nel.org,
 mkoutny@...e.com, muchun.song@...ux.dev, roman.gushchin@...ux.dev,
 shakeel.butt@...ux.dev, shuah@...nel.org, tj@...nel.org,
 leon.huangfu@...pee.com, Lance Yang <lance.yang@...ux.dev>
Subject: Re: [PATCH] selftests: cgroup: make test_memcg_sock robust against
 delayed sock stats



On 11/19/25 20:27, Lance Yang wrote:
> From: Lance Yang <lance.yang@...ux.dev>
> 
> 
> On Wed, 19 Nov 2025 18:52:16 +0800, Guopeng Zhang wrote:
>> test_memcg_sock() currently requires that memory.stat's "sock " counter
>> is exactly zero immediately after the TCP server exits. On a busy system
>> this assumption is too strict:
>>
>>   - Socket memory may be freed with a small delay (e.g. RCU callbacks).
>>   - memcg statistics are updated asynchronously via the rstat flushing
>>     worker, so the "sock " value in memory.stat can stay non-zero for a
>>     short period of time even after all socket memory has been uncharged.
>>
>> As a result, test_memcg_sock() can intermittently fail even though socket
>> memory accounting is working correctly.
>>
>> Make the test more robust by polling memory.stat for the "sock " counter
>> and allowing it some time to drop to zero instead of checking it only
>> once. If the counter does not become zero within the timeout, the test
>> still fails as before.
>>
>> On my test system, running test_memcontrol 50 times produced:
>>
>>   - Before this patch:  6/50 runs passed.
>>   - After this patch:  50/50 runs passed.
Hi Lance,

Thanks a lot for your review and helpful comments!
> 
> Good catch! Thanks!
> 
> With more CPU cores, updates may be distributed across cores, making it
> slower to reach the per-CPU flush threshold, IIUC :)
> 
Yes, that matches what I’ve seen as well — on larger systems it indeed
takes longer for the stats to converge due to per-CPU distribution and
the flush threshold.
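
Just to make that effect concrete, here is a toy user-space model
(emphatically not the kernel's actual logic; THRESHOLD and the numbers
are made up for illustration):

#include <stdio.h>

/*
 * Toy model of per-CPU batched stats: each CPU only "flushes" its
 * local count into the global view once it crosses a threshold.
 * Spreading the same number of updates over more CPUs means each
 * CPU takes longer to hit the threshold, so more updates sit
 * unflushed and the global counter lags further behind.
 */
#define THRESHOLD 64	/* made-up per-CPU batch size */

int main(void)
{
	const int total_updates = 500;

	for (int cpus = 1; cpus <= 8; cpus *= 2) {
		int per_cpu = total_updates / cpus;
		int flushed = (per_cpu / THRESHOLD) * THRESHOLD * cpus;

		printf("%d CPU(s): %3d updates visible, %3d still pending\n",
		       cpus, flushed, total_updates - flushed);
	}
	return 0;
}

With 1 CPU most updates are visible right away; with 8 CPUs none of
them have crossed the threshold yet, which is exactly the lag the
periodic flush (and this test's polling) has to cover.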
>>
>> Signed-off-by: Guopeng Zhang <zhangguopeng@...inos.cn>
>> ---
>>  .../selftests/cgroup/test_memcontrol.c        | 24 ++++++++++++++++++-
>>  1 file changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
>> index 4e1647568c5b..86d9981cddd8 100644
>> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
>> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
>> @@ -1384,6 +1384,8 @@ static int test_memcg_sock(const char *root)
>>  	int bind_retries = 5, ret = KSFT_FAIL, pid, err;
>>  	unsigned short port;
>>  	char *memcg;
>> +	long sock_post = -1;
>> +	int i, retries = 30;
>>  
>>  	memcg = cg_name(root, "memcg_test");
>>  	if (!memcg)
>> @@ -1432,7 +1434,27 @@ static int test_memcg_sock(const char *root)
>>  	if (cg_read_long(memcg, "memory.current") < 0)
>>  		goto cleanup;
>>  
>> -	if (cg_read_key_long(memcg, "memory.stat", "sock "))
>> +	/*
>> +	 * memory.stat is updated asynchronously via the memcg rstat
>> +	 * flushing worker, so the "sock " counter may stay non-zero
>> +	 * for a short period of time after the TCP connection is
>> +	 * closed and all socket memory has been uncharged.
>> +	 *
>> +	 * Poll memory.stat for up to 3 seconds and require that the
>> +	 * "sock " counter eventually drops to zero.
> 
> It might be worth mentioning that the current periodic rstat flush happens
> every 2 seconds (#define FLUSH_TIME (2UL*HZ)). Adding this context to the
> comment would explain why the 3-second timeout was chosen ;)
> 
Good idea! I actually started with a 2-second timeout to match the rstat flush
interval (FLUSH_TIME = 2*HZ), and that already reduced the flakiness
compared to the original code. However, on my test system there were
still a few intermittent failures over multiple runs. Bumping the
timeout to 3 seconds made the test stable (50/50 runs passed), so I
went with a slightly larger value to cover the periodic flush plus
scheduling noise on busy machines. I’ll update the comment in v2 to
spell out this rationale.
>> +	 */
>> +	for (i = 0; i < retries; i++) {
>> +		sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
>> +		if (sock_post < 0)
>> +			goto cleanup;
>> +
>> +		if (!sock_post)
>> +			break;
>> +
>> +		usleep(100 * 1000); /* 100ms */
> 
> Nit: It would be better to define the retry count and interval as macros
> (e.g., MAX_RETRIES, WAIT_INTERVAL) to avoid magic numbers and make the 3s
> timeout calculation explicit.
> 
Makes sense. I’ll introduce macros for the retry count and the wait
interval in v2 so that the 3s timeout is explicit and we don’t rely on
magic numbers.
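Roughly what I have in mind for v2 (a sketch only; the macro names
follow your suggestion and may still change):

#define MAX_RETRIES		30
#define WAIT_INTERVAL_US	(100 * 1000)	/* 100ms * 30 = 3s total */

	for (i = 0; i < MAX_RETRIES; i++) {
		sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
		if (sock_post < 0)
			goto cleanup;

		if (!sock_post)
			break;

		usleep(WAIT_INTERVAL_US);
	}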

I’ll send a v2 shortly.

Best regards,
Guopeng
>> +	}
>> +
>> +	if (sock_post)
>>  		goto cleanup;
>>  
>>  	ret = KSFT_PASS;
> 
> Thanks,
> Lance

