[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20220425151612.izmxhkgugq6isyz3@google.com>
Date: Mon, 25 Apr 2022 15:16:12 +0000
From: Shakeel Butt <shakeelb@...gle.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
David Rientjes <rientjes@...gle.com>,
Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Jonathan Corbet <corbet@....net>,
Shuah Khan <shuah@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
Chen Wandun <chenwandun@...wei.com>,
Vaibhav Jain <vaibhav@...ux.ibm.com>,
"Michal Koutný" <mkoutny@...e.com>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Dan Schatzberg <schatzberg.dan@...il.com>,
cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 4/4] selftests: cgroup: add a selftest for memory.reclaim
On Sat, Apr 23, 2022 at 02:43:13PM -0700, Yosry Ahmed wrote:
[...]
> > > + cg_run_nowait(memcg, alloc_pagecache_50M_noexit, (void *)(long)fd);
> > > + sleep(1);
> >
> > These sleep(1)s do not seem robust. Since kernel keeps the page cache
> > around, you can convert anon to use tmpfs and use simple cg_run to
> > trigger the allocations of anon (tmpfs) and file which will remain in
> > memory even after return from cg_run.
>
> Other tests in the file are also using sleep approach (see
> test_memcg_min, although it retries for multiple times until
> memory.current reaches an expected amount). In my experience it hasn't
> been flaky running for multiple times on different machines, but I
> agree it can be flaky (false negative).
>
If other tests are doing the same then ignore this comment for now.
There should be a separate effort to move towards more deterministic
approach for the tests instead of sleep().
> I am not sure about the allocating file pages with cg_run, is it
> guaranteed that the page cache will remain in memory until the test
> ends? If it doesn't, it can also flake, but it would produce false
> positives (the test could pass because the kernel drained page cache
> for some other reason although the interface is not working
> correctly).
>
> In my personal opinion, false negative flakes are better than false
> positives. At least currently the test explicitly and clearly fails if
> the allocations are not successful. If we rely on the page cache
> remaining until the test finishes then it could silently pass if the
> interface is not working correctly.
>
> There are a few ways we can go forward with this:
> 1) Keep everything as-is, but print a message if the test fails due to
> memory.current not reaching 100MB to make it clear that it didn't fail
> due to a problem with the interface.
> 2) Add a sleep/retry loop similar to test_memcg_min instead of sleeping once.
> 3) Send a signal from forked children when they are done with the
> allocation, and wait to receive this signal in the test to make sure
> the allocation is completed.
>
> In my opinion we should do (1) (and maybe (2)) for now as (3) could be
> an overkill if the test is normal passing. Maybe add a comment about
> (3) being an option in the future if the test flakes. Let me know what
> you think?
I am ok with (1).
Powered by blists - more mailing lists