Message-ID: <pcxsack4hwio6ydm6r3e36bkwt6fg5i7vvarqs3fvuslswealj@bk2xi55vrdsn>
Date: Fri, 11 Apr 2025 19:22:48 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Waiman Long <longman@...hat.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	Muchun Song <muchun.song@...ux.dev>, Andrew Morton <akpm@...ux-foundation.org>, 
	Tejun Heo <tj@...nel.org>, Shuah Khan <shuah@...nel.org>, linux-kernel@...r.kernel.org, 
	cgroups@...r.kernel.org, linux-mm@...ck.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v5 2/2] selftests: memcg: Increase error tolerance of
 child memory.current check in test_memcg_protection()

On Mon, Apr 07, 2025 at 12:23:16PM -0400, Waiman Long <longman@...hat.com> wrote:
>   Child   Actual usage    Expected usage    %err
>   -----   ------------    --------------    ----
>     1       16990208         22020096      -12.9%
>     1       17252352         22020096      -12.1%
>     0       37699584         30408704      +10.7%
>     1       14368768         22020096      -21.0%
>     1       16871424         22020096      -13.2%
> 
> The current 10% error tolerance might have been right at the time
> test_memcontrol.c was first introduced in the v4.18 kernel, but memory
> reclaim has certainly evolved quite a bit since then, which may result
> in a bit more run-to-run variation than previously expected.
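
For reference, the %err column above looks like it is taken relative to
the sum of actual and expected usage rather than to the expected value
alone; the quoted numbers are consistent with that. A minimal sketch of
the arithmetic, assuming that scaling (pct_err() and within_tolerance()
are names made up here for illustration, not the selftest helpers):

```c
#include <stdlib.h>

/*
 * Signed error in percent, relative to the sum of both values.
 * Assumption: the (a + b) scaling is inferred from the quoted table.
 */
static double pct_err(long actual, long expected)
{
	return 100.0 * (double)(actual - expected) /
	       (double)(actual + expected);
}

/* Non-zero if |actual - expected| is within err percent of the sum. */
static int within_tolerance(long actual, long expected, int err)
{
	return labs(actual - expected) * 100.0 <=
	       (double)(actual + expected) * err;
}
```

With this scaling the first row (16990208 vs. 22020096) comes out at
about -12.9% and the third row (37699584 vs. 30408704) at about +10.7%,
so both miss a 10% tolerance.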

I like Roman's suggestion of nr_cpus dependence but I assume your
variations were still on the same system, weren't they?
Is it fair to say that reclaim is chaotic [1]? I wonder what may cause
variations between separate runs of the test.

Would it help to `echo 3 >/proc/sys/vm/drop_caches` before each run to
have more stable initial conditions? (Not sure if it's OK in selftests.)
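
If the test were to do this itself, it could look roughly like the sketch
below. This is only an illustration: drop_caches() is a hypothetical
helper, not an existing selftest function, and taking the path as a
parameter is an assumption made so that it can be tried on a scratch
file. On a real system the path would be /proc/sys/vm/drop_caches and the
write needs root.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Write "3" to the given drop_caches file, after a sync().
 * Returns 0 on success, -1 on failure.
 */
static int drop_caches(const char *path)
{
	FILE *f;
	int ret = 0;

	/* Write back dirty pages first so more of the cache is droppable. */
	sync();

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("3\n", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

The test would then call drop_caches("/proc/sys/vm/drop_caches") before
setting up the cgroup hierarchy, and probably skip rather than fail if
the write is not permitted.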

<del>Or sleep 0.5s to settle rstat flushing?</del> No, page_counters
don't suffer from that, but up to MEMCG_CHARGE_BATCH pages can sit
pre-charged in the percpu stocks.
So maybe drain the stocks so that the counters are precise after the test?
(Either by running a task in a dummy memcg on each CPU or via some
debugging API.)
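
A rough sketch of the "visit each CPU" half of that idea follows. It
assumes the calling task has already been moved into the dummy memcg, and
it assumes that a fresh charge from a different memcg displaces whatever a
CPU's stock holds, which depends on kernel internals and may not hold on
every version. touch_page_on_each_cpu() is a made-up name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Pin the calling task to each allowed CPU in turn and fault in one
 * fresh anonymous page there. Returns the number of CPUs visited,
 * or -1 if the current affinity mask cannot be read.
 */
static int touch_page_on_each_cpu(void)
{
	cpu_set_t orig, one;
	long page = sysconf(_SC_PAGESIZE);
	int cpu, visited = 0;

	if (sched_getaffinity(0, sizeof(orig), &orig))
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		char *buf;

		if (!CPU_ISSET(cpu, &orig))
			continue;

		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(0, sizeof(one), &one))
			continue;	/* e.g. the CPU went offline; skip it */

		/* The first write to a new mapping triggers a charge. */
		buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf != MAP_FAILED) {
			buf[0] = 1;
			munmap(buf, page);
			visited++;
		}
	}

	/* Restore the original affinity mask. */
	sched_setaffinity(0, sizeof(orig), &orig);
	return visited;
}
```

Whether one page per CPU is enough to flush the stock of the cgroup under
test is exactly the part I'm unsure about, so treat this as a starting
point for experiments only.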

Michal

[1] https://en.wikipedia.org/wiki/Chaos_theory#Chaotic_dynamics
