Message-ID: <20250901135008.GC15473@horms.kernel.org>
Date: Mon, 1 Sep 2025 14:50:08 +0100
From: Simon Horman <horms@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
	pabeni@...hat.com, andrew+netdev@...n.ch, ecree.xilinx@...il.com,
	gal@...dia.com, joe@...a.to, linux-kselftest@...r.kernel.org,
	shuah@...nel.org
Subject: Re: [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink
 for timed reconfig

On Fri, Aug 29, 2025 at 03:07:11PM -0700, Jakub Kicinski wrote:
> The rss_ctx test has gotten pretty flaky after I increased
> the queue count in NIPA 2->3. Not 100% clear why. We get
> a lot of failures in the rss_ctx.test_hitless_key_update case.
> 
> Looking closer it appears that the failures are mostly due
> to startup costs. I measured the following timing for ethtool -X:
>  - python cmd(shell=True)  : 150-250msec
>  - python cmd(shell=False) :  50- 70msec
>  - timed in bash           :  45- 55msec
>  - YNL Netlink call        :   2-  4msec
>  - .set_rxfh callback      :   1-  2msec
> 
> The target in the test was set to 200msec. We were mostly measuring
> ethtool startup cost it seems. Switch to YNL since it's 100x faster.
> 
> Lower the pass criteria to ~75msec, no real science behind this number
> but we removed ~150msec of overhead, and the old target was 200msec.
> So any driver that was passing previously should still pass with 75msec.
> 
> Separately we should probably follow up on defaulting to shell=False,
> when script doesn't explicitly ask for True, because the overhead
> is rather significant.

+1
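
For anyone wanting to reproduce the gap locally, here is a rough sketch that
times the same command with and without an intermediate shell. It is not the
selftest cmd() helper; time_cmd is a made-up name, and absolute numbers will
differ by machine and by command:

```python
import shlex
import subprocess
import time


def time_cmd(cmdline, shell, runs=5):
    """Return the median wall-clock time (seconds) of running cmdline."""
    samples = []
    for _ in range(runs):
        # With shell=False the command must be passed as an argv list.
        args = cmdline if shell else shlex.split(cmdline)
        t0 = time.perf_counter()
        subprocess.run(args, shell=shell, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2]


if __name__ == "__main__":
    # "true" is a placeholder command; substitute the command of interest.
    for use_shell in (True, False):
        print(f"shell={use_shell}: {time_cmd('true', use_shell) * 1000:.1f} msec")
```

Taking the median over a few runs keeps a single slow fork/exec from
dominating the comparison.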

> 
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
>  tools/testing/selftests/drivers/net/hw/rss_ctx.py | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
> index 9838b8457e5a..3fc5688605b5 100755
> --- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py
> +++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
> @@ -335,19 +335,20 @@ from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure
>      data = get_rss(cfg)
>      key_len = len(data['rss-hash-key'])
>  
> -    key = _rss_key_rand(key_len)
> +    ethnl = EthtoolFamily()
> +    key = random.randbytes(key_len)

Is the change to how the key is generated intentional?
It's not clear to me how it relates to the rest of the patch.
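
One possible connection, offered as a guess rather than a statement of the
author's intent: the ethtool command line takes the key as colon-separated
hex text, while a Netlink binary attribute is naturally passed as a bytes
object, which random.randbytes() (Python 3.9+) returns directly. The helper
names below are hypothetical stand-ins, not the ones in rss_ctx.py, and the
key length is assumed:

```python
import random


def key_as_int_list(length):
    # Hypothetical stand-in: a key represented as a list of byte values.
    return [random.randint(0, 255) for _ in range(length)]


def key_to_cli_str(key):
    # Colon-separated hex, the text form used with "ethtool -X <dev> hkey".
    return ":".join(f"{b:02x}" for b in key)


key_len = 40  # assumed length, for illustration only

cli_key = key_as_int_list(key_len)   # must be formatted before use on the CLI
cli_arg = key_to_cli_str(cli_key)

nl_key = random.randbytes(key_len)   # already bytes, no formatting step needed
```

If that is the reason, a sentence in the commit message would help.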

>  
>      tgen = GenerateTraffic(cfg)
>      try:
>          errors0, carrier0 = get_drop_err_sum(cfg)
>          t0 = datetime.datetime.now()
> -        ethtool(f"-X {cfg.ifname} hkey " + _rss_key_str(key))
> +        ethnl.rss_set({"header": {"dev-index": cfg.ifindex}, "hkey": key})
>          t1 = datetime.datetime.now()
>          errors1, carrier1 = get_drop_err_sum(cfg)
>      finally:
>          tgen.wait_pkts_and_stop(5000)
>  
> -    ksft_lt((t1 - t0).total_seconds(), 0.2)
> +    ksft_lt((t1 - t0).total_seconds(), 0.075)
>      ksft_eq(errors1 - errors1, 0)
>      ksft_eq(carrier1 - carrier0, 0)
>  
> -- 
> 2.51.0
>