Message-ID: <20240823080253.1c11c028@kernel.org>
Date: Fri, 23 Aug 2024 08:02:53 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Petr Machata <petrm@...dia.com>
Cc: Nikolay Aleksandrov <razor@...ckwall.org>, Hangbin Liu
 <liuhangbin@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [TEST] forwarding/router_bridge_lag.sh started to flake on
 Monday

On Fri, 23 Aug 2024 13:28:11 +0200 Petr Machata wrote:
> Jakub Kicinski <kuba@...nel.org> writes:
> 
> > Looks like forwarding/router_bridge_lag.sh has gotten a lot more flaky
> > this week. It flaked very occasionally (and in a different way) before:
> >
> > https://netdev.bots.linux.dev/contest.html?executor=vmksft-forwarding&test=router-bridge-lag-sh&ld_cnt=250
> >
> > There doesn't seem to be any obvious commit that could have caused this.  
> 
> Hmm:
>     # 3.37 [+0.11] Error: Device is up. Set it down before adding it as a team port.
> 
> How are the tests isolated, are they each run in their own vng, or are
> instances shared? Could it be that the test that runs before this one
> neglects to take a port down?

Yes, each one has its own VM, but the VM is reused for multiple tests
serially. The "info" file shows which VM was used (thr-id identifies
the worker, vm-id identifies VM within the worker, worker will restart
the VM if it detects a crash).

> In one failure case (I don't see further back or my browser would
> apparently catch fire) the predecessor was no_forwarding.sh, and indeed
> it looks like it raises the ports, but I don't see where it sets them
> back down.
> 
> Then router-bridge-lag's cleanup downs the ports, and on rerun it
> succeeds. The issue would be probabilistic, because no_forwarding does
> not always run before this test, and some tests do not care that the
> ports are up. If that's the root cause, this should fix it:
> 
> From 0baf91dc24b95ae0cadfdf5db05b74888e6a228a Mon Sep 17 00:00:00 2001
> Message-ID: <0baf91dc24b95ae0cadfdf5db05b74888e6a228a.1724413545.git.petrm@...dia.com>
> From: Petr Machata <petrm@...dia.com>
> Date: Fri, 23 Aug 2024 14:42:48 +0300
> Subject: [PATCH net-next mlxsw] selftests: forwarding: no_forwarding: Down
>  ports on cleanup
> To: <nbu-linux-internal@...dia.com>
> 
> This test neglects to put ports down on cleanup. Fix it.
> 
> Fixes: 476a4f05d9b8 ("selftests: forwarding: add a no_forwarding.sh test")
> Signed-off-by: Petr Machata <petrm@...dia.com>
> ---
>  tools/testing/selftests/net/forwarding/no_forwarding.sh | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/testing/selftests/net/forwarding/no_forwarding.sh b/tools/testing/selftests/net/forwarding/no_forwarding.sh
> index af3b398d13f0..9e677aa64a06 100755
> --- a/tools/testing/selftests/net/forwarding/no_forwarding.sh
> +++ b/tools/testing/selftests/net/forwarding/no_forwarding.sh
> @@ -233,6 +233,9 @@ cleanup()
>  {
>  	pre_cleanup
>  
> +	ip link set dev $swp2 down
> +	ip link set dev $swp1 down
> +
>  	h2_destroy
>  	h1_destroy
>  

no_forwarding always runs in thread 0 because it's the slowest test,
and we try to run from the slowest as a basic bin packing heuristic.
Clicking through the failures I don't see them on thread 0.
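
For illustration only, a "slowest first" heuristic of this kind could be
sketched as below. This is not the CI's actual code: the function name, the
"<test-name> <expected-seconds>" input format, and the choice to hand each
test to the least-loaded thread are all assumptions made for the example.

```shell
# Sketch of "slowest first" scheduling (longest-processing-time-first).
# stdin:  lines of "<test-name> <expected-seconds>" (hypothetical format)
# stdout: lines of "<thread-id> <test-name>", in assignment order
assign_slowest_first() {
	local nthreads=$1
	# Sort by expected runtime, longest first; break ties by name.
	sort -k2,2nr -k1,1 | awk -v n="$nthreads" '
	BEGIN { for (i = 0; i < n; i++) busy[i] = 0 }
	{
		# Pick the least-loaded thread; ties go to the lowest id.
		best = 0
		for (i = 1; i < n; i++)
			if (busy[i] < busy[best]) best = i
		busy[best] += $2
		print best, $1
	}'
}
```

Under these assumptions the slowest test always lands on thread 0, since all
threads are empty when it is assigned and ties go to the lowest id.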

But putting the ports down seems like a good cleanup regardless.
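
As a rough aid for spotting this kind of leftover state, a helper along the
following lines could list interfaces that are still administratively up
between tests. The helper name is hypothetical and nothing like it is part of
the selftests; it only parses the one-line-per-device text that
`ip -o link show` prints and looks for the UP flag.

```shell
# Print interfaces whose flags include UP, given `ip -o link show` output
# on stdin. The loopback device is skipped. Helper name is hypothetical.
ports_left_up() {
	awk '$2 != "lo:" && $3 ~ /[<,]UP[,>]/ {
		name = $2
		sub(/:$/, "", name)      # drop the trailing colon
		sub(/@.*$/, "", name)    # drop the "@peer" suffix on veth devices
		print name
	}'
}
```

It would be used as `ip -o link show | ports_left_up`, for instance right
after a test's cleanup has run.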
