Message-ID: <ZSEHcoakN1FeL6ZM@xsang-OptiPlex-9020>
Date: Sat, 7 Oct 2023 15:23:30 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Ido Schimmel <idosch@...sch.org>
CC: Sriram Yagnaraman <sriram.yagnaraman@....tech>, "oe-lkp@...ts.linux.dev"
	<oe-lkp@...ts.linux.dev>, "lkp@...el.com" <lkp@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "David S.
 Miller" <davem@...emloft.net>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, <oliver.sang@...el.com>
Subject: Re: [linus:master] [selftests]  8ae9efb859:
 kernel-selftests.net.fib_tests.sh.fail

Hi Ido Schimmel,

On Sun, Oct 01, 2023 at 05:50:20PM +0300, Ido Schimmel wrote:
> On Mon, Sep 25, 2023 at 06:18:34PM +0000, Sriram Yagnaraman wrote:
> > CC: Ido, who helped a lot with writing these tests.
> > 
> > > -----Original Message-----
> > > From: kernel test robot <oliver.sang@...el.com>
> > > Sent: Tuesday, 19 September 2023 10:32
> > > To: Sriram Yagnaraman <sriram.yagnaraman@....tech>
> > > Cc: oe-lkp@...ts.linux.dev; lkp@...el.com; linux-kernel@...r.kernel.org; David
> > > S. Miller <davem@...emloft.net>; netdev@...r.kernel.org;
> > > oliver.sang@...el.com
> > > Subject: [linus:master] [selftests] 8ae9efb859: kernel-
> > > selftests.net.fib_tests.sh.fail
> > > 
> > > 
> > > hi, Sriram Yagnaraman,
> > > 
> > > we noticed two new added tests failed in our test environment.
> > > want to consult with you what's the dependency and requirement to run
> > > them?
> > > Thanks a lot!
> > 
> > Sorry for the delayed response. I will look at this and get back.
> > I am not an expert with lkp-tests but will try to set it up on my local environment and reproduce the problem.
> > 
> > > 
> > > Hello,
> > > 
> > > kernel test robot noticed "kernel-selftests.net.fib_tests.sh.fail" on:
> > > 
> > > commit: 8ae9efb859c05a54ac92b3336c6ca0597c9c8cdb ("selftests: fib_tests:
> > > Add multipath list receive tests")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > > in testcase: kernel-selftests
> > > version: kernel-selftests-x86_64-60acb023-1_20230329
> > > with following parameters:
> > > 
> > > 	group: net
> > > 
> > > 
> > > 
> > > compiler: gcc-12
> > > test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @
> > > 3.00GHz (Cascade Lake) with 32G memory
> > > 
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > 
> > > 
> > > 
> > > 
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of the
> > > same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > > | Closes:
> > > | https://lore.kernel.org/oe-lkp/202309191658.c00d8b8-oliver.sang@intel.
> > > | com
> > > 
> > > 
> > > 
> > > # timeout set to 1500
> > > # selftests: net: fib_tests.sh
> > > #
> > > # Single path route test
> > > #     Start point
> > > #     TEST: IPv4 fibmatch                                                 [ OK ]
> > > #     TEST: IPv6 fibmatch                                                 [ OK ]
> > > #     Nexthop device deleted
> > > #     TEST: IPv4 fibmatch - no route                                      [ OK ]
> > > #     TEST: IPv6 fibmatch - no route                                      [ OK ]
> > > 
> > > ...
> > > 
> > > #
> > > # Fib6 garbage collection test
> > > #     TEST: ipv6 route garbage collection                                 [ OK ]
> > > #
> > > # IPv4 multipath list receive tests
> > > #     TEST: Multipath route hit ratio (.06)                               [FAIL]
> > > #
> > > # IPv6 multipath list receive tests
> > > #     TEST: Multipath route hit ratio (.10)                               [FAIL]
> 
> I found two possible problems. The first is that in the IPv4 case we
> might get more trace point hits than packets (ratio higher than 1)
> because of the additional FIB lookups for source validation. Fixed by
> disabling source validation:
> 
> diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
> index e7d2a530618a..66d0db7a2614 100755
> --- a/tools/testing/selftests/net/fib_tests.sh
> +++ b/tools/testing/selftests/net/fib_tests.sh
> @@ -2437,6 +2437,9 @@ ipv4_mpath_list_test()
>         run_cmd "ip -n ns2 route add 203.0.113.0/24
>                 nexthop via 172.16.201.2 nexthop via 172.16.202.2"
>         run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.fib_multipath_hash_policy=1"
> +       run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.veth2.rp_filter=0"
> +       run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=0"
> +       run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.default.rp_filter=0"
>         set +e
>  
>         local dmac=$(ip -n ns2 -j link show dev veth2 | jq -r '.[]["address"]')
> 
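
A note on why the hunk above clears both conf.all and conf.veth2: for source validation the kernel uses the larger of the two values, so zeroing only one of them can leave reverse-path filtering active. The sketch below is only an illustration of that rule; effective_rp_filter is a hypothetical helper and is not part of fib_tests.sh.

```shell
#!/bin/bash
# Hypothetical helper (not in fib_tests.sh): given the conf.all value and the
# per-interface value of rp_filter, print the mode the kernel applies to that
# interface, which is the larger of the two.
effective_rp_filter()
{
	local all=$1
	local dev=$2

	echo $(( all > dev ? all : dev ))
}

# conf.all left at 2 (loose mode) still wins over a per-interface 0.
effective_rp_filter 2 0
# Both cleared, as in the proposed patch: source validation is off.
effective_rp_filter 0 0
```

On a live setup the two inputs could be read with "ip netns exec ns2 sysctl -n net.ipv4.conf.all.rp_filter" and the matching conf.veth2 key. The conf.default value only seeds interfaces created afterwards.
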
> The second problem (which I believe is the one you encountered) is that
> we might miss certain trace point hits if they happen from the ksoftirqd
> task instead of the mausezahn task. Fixed by:
> 
> @@ -2449,7 +2452,7 @@ ipv4_mpath_list_test()
>         # words, the FIB lookup tracepoint needs to be triggered for every
>         # packet.
>         local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
> -       run_cmd "perf stat -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
> +       run_cmd "perf stat -a -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
>         local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
>         local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
>         list_rcv_eval $tmp_file $diff
> @@ -2494,7 +2497,7 @@ ipv6_mpath_list_test()
>         # words, the FIB lookup tracepoint needs to be triggered for every
>         # packet.
>         local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
> -       run_cmd "perf stat -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
> +       run_cmd "perf stat -a -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
>         local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
>         local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
>         list_rcv_eval $tmp_file $diff
> 
> Ran both tests in a loop:
> 
> # for i in $(seq 1 20); do ./fib_tests.sh -t ipv4_mpath_list; done
> # for i in $(seq 1 20); do ./fib_tests.sh -t ipv6_mpath_list; done
> 
> And verified that the results are stable. Also verified that the tests
> reliably fail when reverting both fixes:
> 
> 8423be8926aa ipv6: ignore dst hint for multipath routes
> 6ac66cb03ae3 ipv4: ignore dst hint for multipath routes
> 
> Can you please test with the proposed modifications?

We applied the above patches on top of 8ae9efb859, and both tests now pass:

# IPv4 multipath list receive tests
#     TEST: Multipath route hit ratio (.99)                               [ OK ]
#
# IPv6 multipath list receive tests
#     TEST: Multipath route hit ratio (1.00)                              [ OK ]
#
# Tests passed: 225
# Tests failed:   0
ok 17 selftests: net: fib_tests.sh
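
For anyone reading the ratios above: the number in parentheses is the count of FIB lookup tracepoint hits divided by the number of packets received on veth2. The sketch below shows that arithmetic with awk. hit_ratio and ratio_ok are hypothetical names, the 0.95 pass threshold is an assumption, and the real list_rcv_eval in fib_tests.sh may differ (its output prints ".99" where this prints "0.99").

```shell
#!/bin/bash
# Hypothetical helper: ratio of tracepoint hits to received packets,
# printed with two decimals. Fails if the packet count is not positive.
hit_ratio()
{
	local hits=$1
	local pkts=$2

	awk -v h="$hits" -v p="$pkts" \
		'BEGIN { if (p <= 0) exit 1; printf "%.2f\n", h / p }'
}

# Hypothetical check: succeed when the ratio reaches the minimum.
# The 0.95 default is an assumed threshold, not taken from this thread.
ratio_ok()
{
	local ratio=$1
	local min=${2:-0.95}

	awk -v r="$ratio" -v m="$min" 'BEGIN { exit !(r >= m) }'
}

# Example with made-up counts: 99 hits for 100 packets.
ratio=$(hit_ratio 99 100)
if ratio_ok "$ratio"; then
	echo "Multipath route hit ratio ($ratio) [ OK ]"
else
	echo "Multipath route hit ratio ($ratio) [FAIL]"
fi
```

With per-task counting (no -a), lookups done from ksoftirqd are not attributed to the traffic generator, which would explain ratios as low as the .06 and .10 in the original report.
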


Tested-by: kernel test robot <oliver.sang@...el.com>


> 
> Thanks
> 