lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <22368dc1-e026-4e9d-bb65-6df62f960a15@redhat.com>
Date: Fri, 11 Apr 2025 17:42:12 -0400
From: Waiman Long <llong@...hat.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
 Roman Gushchin <roman.gushchin@...ux.dev>,
 Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>,
 Andrew Morton <akpm@...ux-foundation.org>, Tejun Heo <tj@...nel.org>,
 Shuah Khan <shuah@...nel.org>, linux-kernel@...r.kernel.org,
 cgroups@...r.kernel.org, linux-mm@...ck.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v5 2/2] selftests: memcg: Increase error tolerance of
 child memory.current check in test_memcg_protection()


On 4/11/25 1:22 PM, Michal Koutný wrote:
> On Mon, Apr 07, 2025 at 12:23:16PM -0400, Waiman Long <longman@...hat.com> wrote:
>>    Child   Actual usage    Expected usage    %err
>>    -----   ------------    --------------    ----
>>      1       16990208         22020096      -12.9%
>>      1       17252352         22020096      -12.1%
>>      0       37699584         30408704      +10.7%
>>      1       14368768         22020096      -21.0%
>>      1       16871424         22020096      -13.2%
>>
>> The current 10% error tolerenace might be right at the time
>> test_memcontrol.c was first introduced in v4.18 kernel, but memory
>> reclaim have certainly evolved quite a bit since then which may result
>> in a bit more run-to-run variation than previously expected.
> I like Roman's suggestion of nr_cpus dependence but I assume your
> variations were still on the same system, weren't they?
> Is it fair to say that reclaim is chaotic [1]? I wonder what may cause
> variations between separate runs of the test.
Yes, the variation I saw was on the same system with multiple runs. The 
memory.current values are read by the time the parent cgroup memory 
usage reaches near the target 50M, but how much memory are remaining in 
each child varies from run-to-run. You can say that it is somewhat chaotic.
>
> Would it help to `echo 3 >drop_caches` before each run to have more
> stable initial conditions? (Not sure if it's OK in selftests.)

I don't know, we may have to try it out. However, I doubt it will have 
an effect.


>
> <del>Or sleep 0.5s to settle rstat flushing?</del> No, page_counter's
> don't suffer that but stock MEMCG_CHARGE_BATCH in percpu stocks.
> So maybe drain the stock so that counters are precise after the test?
> (Either by executing a dummy memcg on each CPU or via some debugging
> API.)

The test itself is already sleeping up to 5 times in 1s interval to wait 
until the parent memory usage is settled down.

Cheers,
Longman


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ