Message-ID: <Zahtwx2NmxOyko4p@d3>
Date: Wed, 17 Jan 2024 19:16:03 -0500
From: Benjamin Poirier <bpoirier@...dia.com>
To: Hangbin Liu <liuhangbin@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
	Jay Vosburgh <jay.vosburgh@...onical.com>, netdev@...r.kernel.org,
	Andy Gospodarek <andy@...yhouse.net>, Shuah Khan <shuah@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Jonathan Toppins <jon.toppins+linux@...il.com>,
	Nikolay Aleksandrov <razor@...ckwall.org>,
	Michal Kubiak <michal.kubiak@...el.com>,
	linux-kselftest@...r.kernel.org
Subject: Re: [PATCH net] selftests: bonding: Add more missing config options

On 2024-01-17 11:15 +0800, Hangbin Liu wrote:
> On Tue, Jan 16, 2024 at 02:47:46PM -0500, Benjamin Poirier wrote:
> > On 2024-01-16 11:29 -0800, Jakub Kicinski wrote:
> > > On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote:
> > > > real    13m35.065s
> > > > user    0m1.657s
> > > > sys     0m27.918s
> > > > 
> > > > The test is not cpu bound; as Jay pointed out, it spends most of its
> > > > time sleeping.
> > > 
> > > Ugh, so it does multiple iterations of 118 sec?
> > 
> > There are other test functions in the script which include a lot of
> > sleeping.
> 
> The arp_validate_test needs to check the mii_status, which involves a lot
> of sleeping. Maybe we can use busywait to save time.
> 
> > 
> > > Could you send a patch to bump the timeout to 900 or 1200 in this case?
> > 
> > Sure but I'd like to give a chance for Hangbin to reply first. Would the
> > test be just as good if it was shortened by removing some cases or
> > reducing the time intervals? Or is increasing the timeout the best
> > approach?
> 
> The purpose of grat_arp is to test commit 9949e2efb54e ("bonding: fix
> send_peer_notif overflow"). Since send_peer_notif was defined as a u8,
> to overflow it we need
> 
> send_peer_notif = num_peer_notif * peer_notif_delay = num_grat_arp * peer_notify_delay / miimon > 255
>   (kernel)           (kernel parameter)                   (user parameter)
> 
> e.g. 30 (num_grat_arp) * 1000 (peer_notify_delay) / 100 (miimon) > 255.
> 
> This needs 30s to finish sending the garp messages. To save testing time,
> the only way is to reduce the miimon value. Something like
> 30 (num_grat_arp) * 500 (peer_notify_delay) / 50 (miimon) > 255.
> 
> To save more time, we can remove the num_grat_arp 50 test case. The patch
> would look like
> 
> diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
> index c54d1697f439..20c4d862c436 100755
> --- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh
> +++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
> @@ -277,7 +277,7 @@ garp_test()
>         ip -n ${s_ns} link set ${active_slave} down
> 
>         exp_num=$(echo "${param}" | cut -f6 -d ' ')
> -       sleep $((exp_num + 2))
> +       sleep $((exp_num / 2 + 2))
> 
>         active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
> 
> @@ -296,8 +296,8 @@ garp_test()
>  num_grat_arp()
>  {
>         local val
> -       for val in 10 20 30 50; do
> -               garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"
> +       for val in 10 20 30; do
> +               garp_test "mode active-backup miimon 50 num_grat_arp $val peer_notify_delay 500"
>                 log_test "num_grat_arp" "active-backup miimon num_grat_arp $val"
>         done
>  }
> 
> With this we can save 100s.
> 

Thanks for looking into it. This change got the runtime down from 13m35s
to 12m7s on the same system I used to test yesterday. That's a start but
since it's still well above the current timeout of 120s, I sent a patch
to increase the timeout to 1200s.
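
For reference, the arithmetic quoted above can be sketched in a few lines of
shell. This is only an illustration: the helper names (peer_notif_count,
overflows_u8, total_sleep) are hypothetical and not part of bond_options.sh,
and total_sleep counts only the explicit sleep in garp_test, not setup,
teardown or the other test functions.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of bond_options.sh.

# Counter value the kernel must hold:
# num_grat_arp * peer_notify_delay / miimon
peer_notif_count()
{
	local num_grat_arp=$1 peer_notify_delay=$2 miimon=$3
	echo $(( num_grat_arp * peer_notify_delay / miimon ))
}

# Succeeds when the counter value would not fit in a u8 (max 255).
overflows_u8()
{
	[ "$(peer_notif_count "$@")" -gt 255 ]
}

# Sum of the "sleep $((exp_num / divisor + 2))" calls over all loop values.
# First argument is the divisor, the rest are the num_grat_arp values.
total_sleep()
{
	local divisor=$1; shift
	local val total=0
	for val in "$@"; do
		total=$(( total + val / divisor + 2 ))
	done
	echo "$total"
}

# Parameter sets: num_grat_arp peer_notify_delay miimon
for params in "30 1000 100" "30 500 50" "20 500 50"; do
	# Intentionally unquoted so the three numbers split into arguments.
	if overflows_u8 $params; then
		echo "$params -> $(peer_notif_count $params) (overflows u8)"
	else
		echo "$params -> $(peer_notif_count $params) (fits in u8)"
	fi
done

echo "sleep before: $(total_sleep 1 10 20 30 50)s"	# 118
echo "sleep after:  $(total_sleep 2 10 20 30)s"		# 36
```

Under these assumptions the explicit sleeps drop from 118s to 36s, which is
in the same range as the 88s difference measured above, and the 30 case still
pushes the counter past 255 with the smaller miimon value.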
