Message-ID: <exfgzrx7u3s77gsoxqzm4zhb6mr7aysc2vzus5ob3zeadkm7ut@3dzkywk3jfqr>
Date: Tue, 4 Mar 2025 15:20:58 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Naresh Kamboju <naresh.kamboju@...aro.org>
Cc: Cgroups <cgroups@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>, open list <linux-kernel@...r.kernel.org>,
Shuah Khan <shuah@...nel.org>, Dan Carpenter <dan.carpenter@...aro.org>,
Arnd Bergmann <arnd@...db.de>, Anders Roxell <anders.roxell@...aro.org>,
Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>
Subject: Re: selftests: cgroup: Failures – Timeouts & OOM Issues Analysis
Hello Naresh.
On Tue, Mar 04, 2025 at 05:26:45PM +0530, Naresh Kamboju <naresh.kamboju@...aro.org> wrote:
> As part of LKFT’s re-validation of known issues, we have observed that
> the selftests: cgroup suite is consistently failing across almost all
> LKFT-supported devices due to:
> - Test timeouts (45 seconds limit reached)
> - OOM-killer invocation
Thanks for reporting the issues with the tests.
> ## Key Questions for Discussion:
> - Would it be beneficial to increase the test timeout to ~180 seconds
> to allow sufficient execution time?
That depends.
test_cpu has some lengthier checks and in sum they can surpass 45s;
it might be better to shorten them (within the precision margin) instead
of prolonging the limit.
test_kmem -- it shouldn't take so long; if anything, I'd suspect
/proc/kpagecgroup -- are your systems larger than 100GiB of memory
(that's my rough estimate for these reads to take longer than the limit)?
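
To put the 100GiB figure in perspective, here is a rough sketch of how much data a full scan of /proc/kpagecgroup involves. It relies on the documented layout of the file (one 64-bit value per page frame, indexed by PFN); the 4 KiB page size and the helper names are assumptions for illustration, and a sparse physical address space would make the real PFN range larger.

```python
# Rough size estimate for /proc/kpagecgroup, which holds one 64-bit entry
# per page frame (indexed by PFN).

ENTRY_BYTES = 8    # one 64-bit value per page frame
PAGE_SIZE = 4096   # assumption: 4 KiB base pages
TIMEOUT_S = 45     # kselftest limit quoted in the report


def kpagecgroup_bytes(ram_bytes, page_size=PAGE_SIZE):
    """Bytes returned by a full read of the file for ram_bytes of memory."""
    return (ram_bytes // page_size) * ENTRY_BYTES


def min_read_rate(ram_bytes, timeout_s=TIMEOUT_S, page_size=PAGE_SIZE):
    """Entries per second needed to scan every page frame within timeout_s."""
    return (ram_bytes // page_size) / timeout_s


if __name__ == "__main__":
    ram = 100 * 2**30  # the 100GiB figure from the estimate above
    print(f"file size: {kpagecgroup_bytes(ram) / 2**20:.0f} MiB")
    print(f"needed rate: {min_read_rate(ram):.0f} entries/s")
```

This only bounds the amount of data; how long the reads actually take depends on the kernel and the machine.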
(Are there any other timeouts?)
OOM -- some tests are supposed to trigger memcg OOM.
> - Should we enhance logging to explicitly print failure reasons when a
> test fails?
These tests are useful when run by developers them_selves_. In such a
case it's handy to obtain more info by running them under strace (since
they're so simple).
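
A minimal sketch of such a run, wrapped in Python so the trace ends up in a file next to the result. The helper names and the log path are hypothetical; `-f` (follow children) and `-o` (write the trace to a file) are standard strace options, and `-f` is included on the assumption that the test forks helper processes.

```python
import subprocess


def strace_cmd(test_binary, log_path):
    """Build the argv for running test_binary under strace."""
    # -f follows forked children, -o writes the trace to log_path
    return ["strace", "-f", "-o", log_path, test_binary]


def run_traced(test_binary, log_path="strace.log"):
    """Run the test under strace and return its exit status."""
    return subprocess.run(strace_cmd(test_binary, log_path)).returncode


if __name__ == "__main__":
    # Example: trace test_cpu from the current directory (path is an assumption)
    status = run_traced("./test_cpu")
    print(f"exit status: {status}, trace in strace.log")
```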
> - Are there any missing dependencies that could be causing these failures?
> Note: The required selftests/cgroup/config options were included in
> LKFT's build and test plans.
The deps are rather minimal, only some coreutils (cgroup selftests
should be covered by e.g. this list [1]).
>
> ## Devices Affected:
> The following DUTs consistently experience these failures:
> - dragonboard-410c (arm64)
> - dragonboard-845c (arm64)
> - e850-96 (arm64)
> - juno-r2 (arm64)
> - qemu-arm64 (arm64)
> - qemu-armv7
> - qemu-x86_64
> - rk3399-rock-pi-4b (arm64)
> - x15 (arm)
> - x86_64
>
> Regression Analysis:
> - New regression? No (these failures have been observed for months/years).
Actually, I noticed test_memcontrol failure yesterday (with ~mainline
kernel) but I remember they used to work also rather recently. I haven't
got time to look into that but at least that one may be a regression (in
code or test).
> - Reproducibility? Yes, the failures occur consistently.
+/- as that may depend on nr_cpus or totalram.
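
Since the outcome may vary with those two values, it can help to record them alongside each result. A small sketch, assuming a Linux system where the `SC_PHYS_PAGES` and `SC_PAGE_SIZE` sysconf names are available; the function name and output format are made up for illustration.

```python
import os


def system_summary():
    """Return the CPU count and total RAM (bytes) of the machine under test."""
    nr_cpus = os.cpu_count()
    # Physical page count times page size gives total RAM in bytes (Linux).
    total_ram = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return {"nr_cpus": nr_cpus, "totalram_bytes": total_ram}


if __name__ == "__main__":
    info = system_summary()
    gib = info["totalram_bytes"] / 2**30
    print(f"nr_cpus={info['nr_cpus']} totalram={gib:.1f} GiB")
```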
> - Test suite affected? selftests: cgroup (timeouts and OOM-related failures).
Michal