linux-kernel - Re: [PATCH v2] selftests/cgroup: improve the accuracy of cpu.max tests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <l3sal6zkvo4lqnfs6fepxytnrmqmqwfvtxudnjm53oigtuatpd@7czfeursgwyh>
Date: Thu, 3 Jul 2025 17:58:48 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Shashank Balaji <shashank.mahadasyam@...y.com>
Cc: Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>, 
	Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Shinya Takumi <shinya.takumi@...y.com>
Subject: Re: [PATCH v2] selftests/cgroup: improve the accuracy of cpu.max
 tests

On Thu, Jul 03, 2025 at 09:03:20PM +0900, Shashank Balaji <shashank.mahadasyam@...y.com> wrote:
> Current cpu.max tests (both the normal one and the nested one) are inaccurate.
> 
> They setup cpu.max with 1000 us quota and the default period (100,000 us).
> A cpu hog is run for a duration of 1s as per wall clock time. This corresponds
> to 10 periods, hence an expected usage of 10,000 us. We want the measured
> usage (as per cpu.stat) to be close to 10,000 us.
> 
> Previously, this approximate equality test was done by
> `!values_close(usage_usec, duration_usec, 95)`: if the absolute
> difference between usage_usec and duration_usec is greater than 95% of
> their sum, then we pass. This is problematic for two reasons:
> 
> 1. Semantics: When one sees `values_close` they expect the error
>    percentage to be some small number, not 95. The intent behind using
> `values_close` is lost by using a high error percent such as 95. The
> intent it's actually going for is "values far".
> 
> 2. Bound too wide: The condition translates to the following expression:
> 
> 	|usage_usec - duration_usec| > (usage_usec + duration_usec)*0.95
> 
>   	0.05*duration_usec > 1.95*usage_usec (usage < duration)
> 
> 	usage_usec < 0.0257*duration_usec = 25,641 us
> 
>    So, this condition passes as long as usage_usec is lower than 25,641
> us, while all we want is for it to be close to 10,000 us.
> 
> Fix this by explicitly calcuating the expected usage duration based on the
> configured quota, default period, and the duration, and compare usage_usec
> and expected_usage_usec using values_close() with a 10% error margin.
> 
> Also, use snprintf to get the quota string to write to cpu.max instead of
> hardcoding the quota, ensuring a single source of truth.
> 
> Signed-off-by: Shashank Balaji <shashank.mahadasyam@...y.com>
> 
> ---
> 
> Changes in v2:
> - Incorporate Michal's suggestions:
> 	- Merge two patches into one
> 	- Generate the quota string from the variable instead of hardcoding it
> 	- Use values_close() instead of labs()
> 	- Explicitly calculate expected_usage_usec
> - v1: https://lore.kernel.org/all/20250701-kselftest-cgroup-fix-cpu-max-v1-0-049507ad6832@sony.com/
> ---
>  tools/testing/selftests/cgroup/test_cpu.c | 63 ++++++++++++++++-------
>  1 file changed, 43 insertions(+), 20 deletions(-)


> -	user_usec = cg_read_key_long(cpucg, "cpu.stat", "user_usec");
> -	if (user_usec <= 0)
> +	if (usage_usec <= 0)
>  		goto cleanup;
>  
> -	if (user_usec >= expected_usage_usec)
> -		goto cleanup;

I think this was a meaningful check. Not sure if dropped accidentally or
on purpose w/out explanation.

After that's addressed, feel free to add
Acked-by: Michal Koutný <mkoutny@...e.com>


Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)