lists.openwall.net: Open Source and information security mailing list archives
 
Message-ID: <35329166-56a7-a57e-666e-6a5e6616ac4d@tessares.net>
Date: Thu, 13 Jul 2023 15:59:27 +0200
From: Matthieu Baerts <matthieu.baerts@...sares.net>
To: Pedro Tammela <pctammela@...atatu.com>,
 Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
 Jiri Pirko <jiri@...nulli.us>
Cc: netdev <netdev@...r.kernel.org>, Anders Roxell
 <anders.roxell@...aro.org>, Davide Caratti <dcaratti@...hat.com>
Subject: Re: TC: selftests: current timeout (45s) is too low

Hi Pedro,

On 12/07/2023 19:12, Pedro Tammela wrote:
> On 12/07/2023 11:43, Matthieu Baerts wrote:
>> Hi Pedro,
>>
>> On 12/07/2023 15:43, Pedro Tammela wrote:
>>> I have been involved in tdc for a while now, here are my comments.
>>
>> Thank you for your reply!
>>
>>> On 12/07/2023 06:47, Matthieu Baerts wrote:
>>>> Hi Jamal, Cong, Jiri,
>>>>
>>>> When looking for something else [1] in LKFT reports [2], I noticed that
>>>> the TC selftest ended with a timeout error:
>>>>
>>>>     not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds
>>>>
>>>> The timeout has been introduced 3 years ago:
>>>>
>>>>     852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test")
>>>>
>>>> Recently, a new option has been introduced to override the value when
>>>> executing the code:
>>>>
>>>>     f6a01213e3f8 ("selftests: allow runners to override the timeout")
>>>>
>>>> But I guess it is still better to set a higher default value for TC
>>>> tests. This is easy to fix by simply adding "timeout=<seconds>" in a
>>>> "settings" file in 'tc-testing' directory, e.g.
>>>>
>>>>     echo timeout=1200 > tools/testing/selftests/tc-testing/settings
>>>>
>>>> I'm sending this email instead of a patch because I don't know which
>>>> value makes sense. I guess you know how long the tests can take in a
>>>> (very) slow environment and you might want to avoid this timeout error.
>>>
>>> I believe a timeout between 5 and 10 minutes should cover the entire
>>> suite.
>>
>> Thank you for your feedback.
>> If we want to be on the safe side, I guess it is better to put 10
>> minutes or even 15, no?
> 
> Sure, makes sense.
> If someone complains we can lower it.
> 
>>
>>>> I also noticed most of the tests were skipped [2], probably because
>>>> something is missing in the test environment? Do not hesitate to
>>>> contact the lkft team [3], that's certainly easy to fix and it would
>>>> increase the TC test coverage when they are validating all the
>>>> different kernel versions :)
>>>
>>> From the logs, it seems like the kernel image is missing the 'ct'
>>> action, and possibly other actions/tc components too, so it looks
>>> like a kernel config issue.
>>
>> According to [1], the kconfig is generated by merging these files:
>>
>>    defconfig, systemd.config [2], tools/testing/selftests/kexec/config,
>> tools/testing/selftests/net/config,
>> tools/testing/selftests/net/mptcp/config,
>> tools/testing/selftests/net/hsr/config,
>> tools/testing/selftests/net/forwarding/config,
>> tools/testing/selftests/tc-testing/config
>>
>> You can see the final .config file in [3].
>>
>> I can see "CONFIG_NET_ACT_CTINFO(=m)" but not "CONFIG_NET_ACT_CT" while
>> they are both in the tc-testing/config file. Maybe a conflict with another
>> selftest config?
>>
>> I don't see any mention of "NET_ACT_CT" in the build logs [4].
> 
> NET_ACT_CT depends on an option that is not set in the final
> config (CONFIG_NF_FLOW_TABLE).
> 
> Perhaps this could fix?
> diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
> index 6e73b09c20c8..d1ad29040c02 100644
> --- a/tools/testing/selftests/tc-testing/config
> +++ b/tools/testing/selftests/tc-testing/config
> @@ -5,6 +5,7 @@ CONFIG_NF_CONNTRACK=m
>  CONFIG_NF_CONNTRACK_MARK=y
>  CONFIG_NF_CONNTRACK_ZONES=y
>  CONFIG_NF_CONNTRACK_LABELS=y
> +CONFIG_NF_FLOW_TABLE=m
>  CONFIG_NF_NAT=m
>  CONFIG_NETFILTER_XT_TARGET_LOG=m

Yes it does!

I got access to TuxSuite to reproduce the issues and test the suggested
fixes. The i386 build job is visible in [1] (kconfig in [2]) and the
test job in [3] (logs in [4]).

[1] https://tuxapi.tuxsuite.com/v1/groups/community/projects/matthieu.baerts/builds/2SW6Vk3VYTGyW90OBecA3knJFIz
[2] https://storage.tuxsuite.com/public/community/matthieu.baerts/builds/2SW6Vk3VYTGyW90OBecA3knJFIz/config
[3] https://tuxapi.tuxsuite.com/v1/groups/community/projects/matthieu.baerts/tests/2SWB6sYne9afpOxqp3CNE5BxAn8
[4] https://tuxapi.tuxsuite.com/v1/groups/community/projects/matthieu.baerts/tests/2SWB6sYne9afpOxqp3CNE5BxAn8/logs?format=html


Note that the TC tests were executed in less than 3 minutes, so 15
minutes seems more than enough! (I don't know how "fast" this
environment is.)
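For reference, here is a rough Python sketch of how a per-directory "timeout=<seconds>" line could be resolved against the 45-second default and a runner-supplied override. It is only a model under assumptions: the precedence order (override, then settings file, then default) and "the last timeout= line wins" are my reading of the behaviour, not a copy of runner.sh, and resolve_timeout() is a hypothetical helper.

```python
DEFAULT_TIMEOUT = 45  # seconds; the kselftest default since 852c8cbf34d3


def resolve_timeout(settings_text, override=None):
    """Return the per-test timeout in seconds (hypothetical helper).

    Assumed precedence: an explicit runner override, then a
    "timeout=<seconds>" line from the test directory's "settings"
    file (the last such line wins), then the 45 s default.
    """
    if override is not None:
        return int(override)
    timeout = DEFAULT_TIMEOUT
    for line in settings_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "timeout":
            timeout = int(value.strip())
    return timeout


# No settings file, the proposed settings file, and an override on top of it:
print(resolve_timeout(""))                              # 45
print(resolve_timeout("timeout=1200\n"))                # 1200
print(resolve_timeout("timeout=1200\n", override=300))  # 300
```

With the run above finishing in under 3 minutes (under 180 s), a 15-minute (900 s) value would leave at least 5x headroom.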

We can see that all tests have been executed except one:

> # ok 495 6bda - Add tunnel_key action with nofrag option # skipped - probe command: test skipped.

Maybe something else is missing?

Other than that, 6 tests have failed:

- Add skbedit action with valid mark and mask with invalid format

> # not ok 284 bc15 - Add skbedit action with valid mark and mask with invalid format
> # 	Command exited with 0, expected 255

- Add ct action triggering DNAT tuple conflict

> # not ok 373 3992 - Add ct action triggering DNAT tuple conflict
> # 	Could not match regex pattern. Verify command output:
> # cat: /proc/net/nf_conntrack: No such file or directory

- Add xt action with log-prefix

> # not ok 408 2029 - Add xt action with log-prefix
> # 	Could not match regex pattern. Verify command output:
> # total acts 1
> # 
> # 	action order 0: tablename: mangle  hook: NF_IP_POST_ROUTING
> # 	target  LOG level warn prefix \"PONG\"
> # 	index 100 ref 1 bind 0
> # 	not_in_hw

- Replace xt action log-prefix

> # not ok 409 3562 - Replace xt action log-prefix
> # 	Could not match regex pattern. Verify command output:
> # total acts 0
> # 
> # 	action order 1: tablename: mangle  hook: NF_IP_POST_ROUTING
> # 	target  LOG level warn prefix \"WIN\"
> # 	index 1 ref 1 bind 0
> # 	not_in_hw

- Delete xt action with invalid index

> # not ok 411 5169 - Delete xt action with invalid index
> # 	Could not match regex pattern. Verify command output:
> # total acts 0
> # 
> # 	action order 1: tablename: mangle  hook: NF_IP_POST_ROUTING
> # 	target  LOG level warn prefix \"PONG\"
> # 	index 1000 ref 1 bind 0
> # 	not_in_hw

- Add xt action with duplicate index

> # not ok 414 8437 - Add xt action with duplicate index
> # 	Could not match regex pattern. Verify command output:
> # total acts 0
> # 
> # 	action order 1: tablename: mangle  hook: NF_IP_POST_ROUTING
> # 	target  LOG level warn prefix \"PONG\"
> # 	index 101 ref 1 bind 0
> # 	not_in_hw
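To triage a run like this one, a small script can pull the failed and skipped tdc results out of the logs. A minimal Python sketch, assuming every result line has the shape quoted above ("# ok|not ok <num> <id> - <name>", optionally followed by "# skipped ..."); summarize() and RESULT_RE are hypothetical names, not part of tdc.

```python
import re

# Assumed shape of one tdc result line inside the kselftest log, e.g.
#   "# not ok 284 bc15 - Add skbedit action with valid mark and ..."
#   "# ok 495 6bda - Add tunnel_key action ... # skipped - probe command: ..."
RESULT_RE = re.compile(
    r"^(?:#\s*)?(?P<status>ok|not ok)\s+(?P<num>\d+)\s+(?P<id>\S+)\s+-\s+"
    r"(?P<name>.*?)(?:\s+#\s*(?P<skip>skipped.*))?$"
)


def summarize(log_text):
    """Return (failed, skipped) lists of (number, test id, name) tuples."""
    failed, skipped = [], []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if not match:
            continue  # diagnostics, command output, kselftest headers...
        entry = (int(match["num"]), match["id"], match["name"])
        if match["status"] == "not ok":
            failed.append(entry)
        elif match["skip"]:
            skipped.append(entry)
    return failed, skipped


sample = """\
# ok 495 6bda - Add tunnel_key action with nofrag option # skipped - probe command: test skipped.
# not ok 284 bc15 - Add skbedit action with valid mark and mask with invalid format
# \tCommand exited with 0, expected 255
"""
failed, skipped = summarize(sample)
print(failed)   # [(284, 'bc15', 'Add skbedit action with valid mark and mask with invalid format')]
print(skipped)  # [(495, '6bda', 'Add tunnel_key action with nofrag option')]
```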

I can see that at least the "CONFIG_NF_CONNTRACK_PROCFS" kconfig option is
needed as well for test 373 (adding it seems to help: [5]).
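To catch kconfig options that get silently dropped while merging config fragments (like CONFIG_NET_ACT_CT here), one could compare each fragment with the final .config. A rough Python sketch: parse_config() and dropped_options() are hypothetical helpers, the sample strings are illustrative rather than the real LKFT files, and the check only reports what is missing, not which dependency (e.g. CONFIG_NF_FLOW_TABLE) caused it.

```python
import re

# "CONFIG_FOO=y", "CONFIG_FOO=m", "CONFIG_FOO=123", 'CONFIG_FOO="str"'.
# Lines such as "# CONFIG_FOO is not set" deliberately do not match.
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_config(text):
    """Map each explicitly set CONFIG_* symbol to its value string."""
    values = {}
    for line in text.splitlines():
        match = SET_RE.match(line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


def dropped_options(fragment_text, final_text):
    """Return (name, requested, actual) for every fragment option that did
    not end up with the requested value in the final .config.

    Note: an option requested as =m that ends up =y is reported too,
    even though that is usually harmless.
    """
    final = parse_config(final_text)
    return [
        (name, wanted, final.get(name))
        for name, wanted in parse_config(fragment_text).items()
        if final.get(name) != wanted
    ]


# Illustrative inputs only, not the real LKFT files:
fragment = "CONFIG_NET_ACT_CT=m\nCONFIG_NET_ACT_CTINFO=m\nCONFIG_NF_CONNTRACK_PROCFS=y\n"
final = "CONFIG_NET_ACT_CTINFO=m\n# CONFIG_NF_CONNTRACK_PROCFS is not set\n"
for name, wanted, actual in dropped_options(fragment, final):
    print(f"{name}: requested {wanted}, got {actual}")
# CONFIG_NET_ACT_CT: requested m, got None
# CONFIG_NF_CONNTRACK_PROCFS: requested y, got None
```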

I'm not sure about the 5 others: I don't know what these tests are doing,
I came here by accident, and I don't think I'm the most appropriate person
to fix them. Do you know if someone could look at the 5 other errors? :)

I can send patches to fix the timeout and the two missing kconfig options if you want.

Cheers,
Matt

[5] https://tuxapi.tuxsuite.com/v1/groups/community/projects/matthieu.baerts/tests/2SWHb7PJfqkUX1m8rLu3GXbsHE0/logs?format=html
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
