Message-ID: <Za8439kp8oPxwb7M@Laptop-X1>
Date: Tue, 23 Jan 2024 11:56:15 +0800
From: Hangbin Liu <liuhangbin@...il.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
Benjamin Poirier <bpoirier@...dia.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [TEST] bond_options.sh looks flaky
On Mon, Jan 22, 2024 at 03:25:57PM -0800, Jay Vosburgh wrote:
> Jakub Kicinski <kuba@...nel.org> wrote:
>
> >Hi folks,
> >
> >looks like tools/testing/selftests/drivers/net/bonding/bond_options.sh
> >is a bit flaky. This error:
> >
> ># TEST: prio (balance-alb arp_ip_target primary_reselect 1) [FAIL]
> ># Current active slave is eth2 but not eth1
> >
> >https://netdev-2.bots.linux.dev/vmksft-bonding/results/432442/7-bond-options-sh
> >
> >was gone on the next run, even tho the only difference between
> >the content of the tree was:
> >
> >$ git diff net-next-2024-01-22--18-00..net-next-2024-01-22--21-00 --stat
> > Documentation/devicetree/bindings/net/adi,adin.yaml | 7 ++-----
> > drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
> > drivers/net/phy/adin.c | 2 --
> > 3 files changed, 3 insertions(+), 8 deletions(-)
> >
> >So definitely nothing of relevance..
> >
> >Any ideas?
>
> I think I see a couple of things in the test logic:
>
> 1) in bond_options.sh:
>
> prio_arp()
> {
> 	local primary_reselect
> 	local mode=$1
>
> 	for primary_reselect in 0 1 2; do
> 		prio_test "mode active-backup arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect"
> 		log_test "prio" "$mode arp_ip_target primary_reselect $primary_reselect"
> 	done
> }
> }
>
> The above appears to always test with "mode active-backup"
> regardless of what $mode contains, but logs that $mode was tested. The
> same is true for the prio_ns test that is just after prio_arp in
> bond_options.sh.
Ah, yes. I will post a fix for this issue.
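A minimal sketch of the point 1 change, assuming prio_test and log_test keep the calling convention shown in the excerpt above: the loop interpolates ${mode} instead of hard-coding active-backup, so the logged mode is the one actually configured. The two stubs are hypothetical stand-ins so the snippet runs outside the selftest harness.

```shell
#!/bin/bash

# Hypothetical stubs standing in for the selftest helpers; they only
# record/print their arguments so the snippet can run on its own.
prio_test()
{
	last_param="$1"
}

log_test()
{
	echo "TEST: $1 ($2) -> $last_param"
}

prio_arp()
{
	local primary_reselect
	local mode=$1

	for primary_reselect in 0 1 2; do
		# Use the caller's mode rather than a fixed "active-backup".
		prio_test "mode ${mode} arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect"
		log_test "prio" "$mode arp_ip_target primary_reselect $primary_reselect"
	done
}
```

Per point 2 below, this only makes sense for callers that pass a mode which actually works with the ARP monitor.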
>
> 2) The balance-alb and balance-tlb modes don't work with the ARP
> monitor. If the prio_arp or prio_ns tests were actually testing the
> stated $mode with arp_interval, it should never succeed.
Hmm, I forgot why I put prio_arp/prio_ns in the per-mode for loop but only
used active-backup for testing... But this is definitely a waste of time.
I will run them only for active-backup testing.
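A rough sketch of that plan: the miimon variant stays in the per-mode loop, while the ARP and NS variants run once, for active-backup only. The function names follow those used in this thread, but their exact shape in bond_options.sh is an assumption, and the stubs here only record what would run.

```shell
#!/bin/bash

# Hypothetical stubs: record which test variant would run for which mode.
ran=()
prio_miimon() { ran+=("miimon:$1"); }
prio_arp()    { ran+=("arp:$1"); }
prio_ns()     { ran+=("ns:$1"); }

prio()
{
	local mode modes="active-backup balance-tlb balance-alb"

	# miimon link monitoring is exercised for every mode under test.
	for mode in $modes; do
		prio_miimon "$mode"
	done

	# The ARP/NS monitor variants are only run for active-backup.
	prio_arp "active-backup"
	prio_ns "active-backup"
}
```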
>
> 3) I'm not sure why this test fails, but the prior test that claims to
> be active-backup does not, even though both appear to be actually
> testing active-backup. The log entries for the actual "prio
> (active-backup arp_ip_target primary_reselect 1)" test start at time
> 281.913374, and differ from the failing test starting at 715.597039.
From the passing log:
[ 505.516927] br0: port 2(s1) entered disabled state
[ 505.773009] bond0: (slave eth1): link status definitely down, disabling slave
[ 505.773593] bond0: (slave eth2): making interface the new active one
While in the failing log:
[ 723.603062] br0: port 4(s2) entered disabled state
[ 723.868750] bond0: (slave eth2): link status definitely down, disabling slave
[ 723.869104] bond0: (slave eth1): making interface the new active one
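One way to line up the passing and failing runs is to pull only the active-slave transitions out of the kernel log. A small sketch with a hypothetical function name, assuming the message format shown in the excerpts above:

```shell
#!/bin/bash

# Hypothetical helper: read kernel log lines on stdin and print, in order,
# each slave that bond0 reported as its new active interface.
bond_active_changes()
{
	sed -n 's/.*bond0: (slave \([^)]*\)): making interface the new active one.*/\1/p'
}
```

Used as `dmesg | bond_active_changes`; on the two excerpts above it would print eth2 for the passing run and eth1 for the failing one.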
It looks like the wrong active link was set: it should be eth1, but it was
eth2. So the later link operation set eth2's link down. Not sure why eth2 was
set as the active interface. I need to print the log immediately when check_err fails.
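A minimal sketch of dumping state at the moment of failure rather than only at the end of the test. check_err here is a simplified stand-in that assumes the lib.sh behaviour of keeping the first non-zero error in RET and its message in retmsg; check_err_dump is a hypothetical name, and the diagnostic command is whatever the caller passes in.

```shell
#!/bin/bash

RET=0
retmsg=

# Simplified stand-in for check_err (assumed lib.sh semantics): remember
# the first failure and its message.
check_err()
{
	local err=$1
	local msg=$2

	if [[ $RET -eq 0 && $err -ne 0 ]]; then
		RET=$err
		retmsg=$msg
	fi
}

# Hypothetical wrapper: record the result and, on failure, print the
# message plus the output of the given diagnostic command right away,
# while the bond state still reflects the failing step.
check_err_dump()
{
	local err=$1
	local msg=$2
	shift 2

	check_err "$err" "$msg"
	if [[ $err -ne 0 ]]; then
		echo "FAIL: $msg" >&2
		"$@" >&2
	fi
}
```

The dump command could be something like `ip -d link show bond0`, run in whichever namespace holds the bond.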
Thanks
Hangbin