lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d337ab36-0fa1-c438-f15e-65f20045aa81@alu.unizg.hr>
Date:   Sat, 22 Jul 2023 02:54:44 +0200
From:   Mirsad Todorovac <mirsad.todorovac@....unizg.hr>
To:     Petr Machata <petrm@...dia.com>
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Shuah Khan <shuah@...nel.org>,
        linux-kselftest@...r.kernel.org, Ido Schimmel <idosch@...dia.com>
Subject: Re: [PROBLEM] [ADDITIONAL DIAG] seltests: net/forwarding/sch_ets.sh
 [HANG]

On 7/22/23 02:31, Mirsad Todorovac wrote:
> On 7/20/23 18:25, Mirsad Todorovac wrote:
>> On 7/20/23 18:07, Mirsad Todorovac wrote:
>>> On 7/20/23 11:43, Petr Machata wrote:
>>>>
>>>> Mirsad Todorovac <mirsad.todorovac@....unizg.hr> writes:
>>>>
>>>>> Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
>>>>> stop at the exact same line on both boxes (os I reckon it is more than an
>>>>> accident):
>>>>>
>>>>> # selftests: net/forwarding: sch_ets.sh
>>>>> # TEST: ping vlan 10                                                  [ OK ]
>>>>> # TEST: ping vlan 11                                                  [ OK ]
>>>>> # TEST: ping vlan 12                                                  [ OK ]
>>>>> # Running in priomap mode
>>>>> # Testing ets bands 3 strict 3, streams 0 1
>>>>> # TEST: band 0                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 3 strict 3, streams 1 2
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 2                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>>>> # TEST: band 0                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>>>> # TEST: bands 1:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>>>> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.00 Measured ratio .99
>>>>> # TEST: bands 0:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.00 Measured ratio 1.00
>>>>> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.42 Measured ratio 1.42
>>>>> # TEST: bands 0:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>>>> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.60 Measured ratio 1.59
>>>>> # TEST: bands 0:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>>>> # Testing ets bands 2 quanta 5000 2500, streams 0 1
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>>>> # Running in classifier mode
>>>>> # Testing ets bands 3 strict 3, streams 0 1
>>>>> # TEST: band 0                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 3 strict 3, streams 1 2
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 2                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>>>>
>>>>> I tried to run 'set -x' enabled version standalone, but that one finished
>>>>> correctly (?).
>>>>>
>>>>> It could be something previous scripts left, but right now I don't have a clue.
>>>>> I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
>>>>
>>>> If you run it standalone without set -x, does it finish as well?
>>>
>>> Added that. Yes, standlone run finishes correctly, with or without 'set -x':
>>>
>>> root@...iant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding# ./sch_ets.sh
>>> TEST: ping vlan 10                                                  [ OK ]
>>> TEST: ping vlan 11                                                  [ OK ]
>>> TEST: ping vlan 12                                                  [ OK ]
>>> Running in priomap mode
>>> Testing ets bands 3 strict 3, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 strict 3, streams 1 2
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 2                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>> TEST: bands 1:2                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 1.99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio 1.00
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio .99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.42 Measured ratio 1.42
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.60 Measured ratio 1.59
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 2 quanta 5000 2500, streams 0 1
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 1.99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Running in classifier mode
>>> Testing ets bands 3 strict 3, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 strict 3, streams 1 2
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 2                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>> TEST: bands 1:2                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 1.99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio .99
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio .99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.42 Measured ratio 1.42
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.60 Measured ratio 1.60
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 2 quanta 5000 2500, streams 0 1
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 2.00
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> root@...iant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding#
>>>
>>>> That would imply that the reproducer needs to include the previous tests as
>>>> well.
>>>
>>> This is entirely possible, as timeouts and CTRL+C events do not seem to be caught
>>> and the cleanup is not done ...
>>>
>>> sch_ets_core.sh:    trap cleanup EXIT
>>>
>>> Some tests time out even after settings:timeout=240, so IMHO this should be taken into account.
>>>
>>> Best regards,
>>> Mirsad Todorovac
>>>
>>>> It looks like the test is stuck in ets_test_mixed in classifier_mode.
>>>> A way to run just this test would be:
>>>>
>>>>      TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh
>>>>
>>>> Looking at the code, the only place that I can see that waits on
>>>> anything is the "{ kill %% && wait %%; } 2>/dev/null" line in
>>>> stop_traffic() (in lib.sh). Maybe something like this would let
>>>> us see if that's the case:
>>>>
>>>> modified   tools/testing/selftests/net/forwarding/lib.sh
>>>> @@ -1468,8 +1470,10 @@ start_tcp_traffic()
>>>>   stop_traffic()
>>>>   {
>>>> +    echo killing MZ
>>>>       # Suppress noise from killing mausezahn.
>>>>       { kill %% && wait %%; } 2>/dev/null
>>>> +    echo killed MZ
>>>>   }
>>
>> FYI, this is the [incomplete] list of the tests that time out even after
>> being assigned long timeout of 240s instead of default.
>>
>> marvin@...iant:~/linux/kernel/linux_torvalds$ grep TIMEOUT ../kselftest-6.5-rc2-net-forwarding-8.log
>> not ok 13 selftests: net/forwarding: custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 18 selftests: net/forwarding: gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 19 selftests: net/forwarding: gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
>> not ok 21 selftests: net/forwarding: gre_multipath_nh_res.sh # TIMEOUT 240 seconds
>> not ok 22 selftests: net/forwarding: gre_multipath_nh.sh # TIMEOUT 240 seconds
>> not ok 27 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 34 selftests: net/forwarding: ip6gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
>> not ok 58 selftests: net/forwarding: no_forwarding.sh # TIMEOUT 240 seconds
>> not ok 67 selftests: net/forwarding: router_mpath_nh_res.sh # TIMEOUT 240 seconds
>> not ok 68 selftests: net/forwarding: router_mpath_nh.sh # TIMEOUT 240 seconds
>> not ok 70 selftests: net/forwarding: router_multipath.sh # TIMEOUT 240 seconds
>> marvin@...iant:~/linux/kernel/linux_torvalds$
> 
> The test seem to be stuck despite adding SIGTERM cleanup to all tests
> that timed out:
> 
> # selftests: net/forwarding: sch_ets.sh
> # TEST: ping vlan 10                                                  [ OK ]
> # TEST: ping vlan 11                                                  [ OK ]
> # TEST: ping vlan 12                                                  [ OK ]
> # Running in priomap mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
> # TEST: band 0                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
> # TEST: bands 1:2                                                     [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 2.00
> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio .99
> # TEST: bands 0:2                                                     [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio .99
> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 1.42 Measured ratio 1.42
> # TEST: bands 0:2                                                     [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 1.60 Measured ratio 1.59
> # TEST: bands 0:2                                                     [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 2 quanta 5000 2500, streams 0 1
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 1.99
> # Running in classifier mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
> 
> What is happening is a stuck mousezahn loop:
> 
> root      558266    2732  0 01:17 pts/2    00:00:23 mausezahn veth0.10 -p 8000 -A 192.0.2.1 -B 192.0.2.2 -c 0 -a own -b ba:33:37:81:dc:5
> root      558273    2732  0 01:17 pts/2    00:00:23 mausezahn veth0.11 -p 8000 -A 192.0.2.17 -B 192.0.2.18 -c 0 -a own -b ba:33:37:81:dc
> 
> $ sudo strace -p 558266
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> 
> I am not sure if this will ever complete, or is it a runaway loop?
> 
> Probably it is stuck, for nothing was written to the test log for 71 minutes.
> 
> I hope we are coming closer to the cause.
> 
> Previously I didn't figure out what hanged.
> 
> FWIW, I will post the patches for the scripts that timed out for your review.

Hi,

More homework:

I can trace the infinite mausezahn mode to here:

lib.sh
1420 __start_traffic()
1421 {
1422         local pktsize=$1; shift
1423         local proto=$1; shift
1424         local h_in=$1; shift    # Where the traffic egresses the host
1425         local sip=$1; shift
1426         local dip=$1; shift
1427         local dmac=$1; shift
1428
1429         $MZ $h_in -p $pktsize -A $sip -B $dip -c 0 \
1430                 -a own -b $dmac -t "$proto" -q "$@" &
1431         sleep 1
1432 }

but I don't know why in this test it runs away for "veth0.10" in an infinite loop
and is obviously nowhere killed for hours.

I seem to have no new ideas now so I guess I should take a break.

Kind regards,
Mirsad

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ