lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 21 Dec 2023 15:35:26 +0200
From: Vladimir Oltean <vladimir.oltean@....com>
To: "Abdul Rahim, Faizal" <faizal.abdul.rahim@...ux.intel.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@...el.com>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
	"David S . Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 net 0/4] qbv cycle time extension/truncation

(sorry, I started writing this email yesterday, I noticed the
conversation continued with Paolo)

On Wed, Dec 20, 2023 at 11:25:09AM +0800, Abdul Rahim, Faizal wrote:
> Hi Vladimir,
> 
> No worries, I truly appreciate the time you took to review and reply.
> 
> What prompted this in general is related to my project requirement to enable
> software QBV cycle time extension, so there's a validation team that created
> test cases to properly validate cycle time extension. Then I noticed the
> code doesn't handle truncation properly also, since it's the same code area,
> I just fixed it together.

We tend to do patch triage between 'net' and 'net-next' based on the
balance between the urgency/impact of the fix and its complexity.

While it's undoubtable that there are issues with taprio's handling of
dynamic schedules, you've mentioned yourself that you only hit those
issues as part of some new development work - they weren't noticed by
end users. And fixing them is not quite trivial, there are also FIXMEs
in taprio which suggest so. I'm worried that the fixes may also impact
the code from stable trees in unforeseen ways.

So I would recommend moving the development of these fixes to 'net-next',
if possible.

> Each time before sending the patch for upstream review, I normally will run
> our test cases that only validates cycle time extension. For truncation, I
> modify the test cases on my own and put logs to check if the
> cycle_time_correction negative value is within the correct range. I probably
> should have mentioned sooner that I have tested this myself, sorry about
> that.
> 
> Example of the test I run for cycle time extension:
> 1) 2 boards connected back-to-back with i226 NIC. Board A as sender, Board B
> as receiver
> 2) Time is sync between 2 boards with phc2sys and ptp4l
> 3) Run GCL1 on Board A with cycle time extension enabled:
>     tc qdisc replace dev $INTERFACE parent root handle 100 taprio \
>     num_tc 4 \
>     map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>     queues 1@0 1@1 1@2 1@3 \
>     base-time 0 \
>     cycle-time-extension 1000000 \
>     sched-entry S 09 500000 \
>     sched-entry S 0a 500000 \
>     clockid CLOCK_TAI

Why do you need PTP sync? Cannot this test run between 2 veth ports?

> 4) capture tcp dump on Board B
> 5) Send packets from Board A to Board B with 200us interval via UDP Tai

What is udp_tai? This program?
https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f

> 6) When packets reached Board B, trigger GCL2 to Board A:
>    CYCLETIME=1000000
>    APPLYTIME=1000000000 # 1s
>    CURRENT=$(date +%s%N)
>    BASE=$(( (CURRENT + APPLYTIME + (2*CYCLETIME)) - ((CURRENT + APPLYTIME)
>          % CYCLETIME) + ((CYCLETIME*3)/5) ))
>     tc qdisc replace dev $INTERFACE parent root handle 100 taprio \
>     num_tc 4 \
>     map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>     queues 1@0 1@1 1@2 1@3 \
>     base-time $BASE \
>     cycle-time-extension 1000000 \
>     sched-entry S oc 500000 \
>     sched-entry S 08 500000 \
>     clockid CLOCK_TAI
> 7) Analyze tcp dump data on Board B using wireshark, will observe packets
> receive pattern changed.
> 
> Note that I've hidden "Best Effort (default) 7001 → 7001" data from the
> wireshark log so that it's easier to see the pattern.
> 
>      TIMESTAMP               PRIORITY             PRIORITY    NOTES
> 
> 1702896645.925014509	Critical Applications	7004 → 7004   GCL1
> 1702896645.925014893	Critical Applications	7004 → 7004   GCL1
> 1702896645.925514454	Excellent Effort	7003 → 7003   GCL1
> 1702896645.925514835	Excellent Effort	7003 → 7003   GCL1
> 1702896645.926014371	Critical Applications	7004 → 7004   GCL1
> 1702896645.926014755	Critical Applications	7004 → 7004   GCL1
> 1702896645.926514620	Excellent Effort	7003 → 7003   GCL1
> 1702896645.926515004	Excellent Effort	7003 → 7003   GCL1
> 1702896645.927014408	Critical Applications	7004 → 7004   GCL1
> 1702896645.927014792	Critical Applications	7004 → 7004   GCL1
> 1702896645.927514789	Excellent Effort	7003 → 7003   GCL1
> 1702896645.927515173	Excellent Effort	7003 → 7003   GCL1
> 1702896645.928168304	Excellent Effort	7003 → 7003   Extended
> 1702896645.928368780	Excellent Effort	7003 → 7003   Extended
> 1702896645.928569406	Excellent Effort	7003 → 7003   Extended
> 1702896645.929614835	Background	        7002 → 7002   GCL2
> 1702896645.929615219	Background	        7002 → 7002   GCL2
> 1702896645.930614643	Background	        7002 → 7002   GCL2
> 1702896645.930615027	Background	        7002 → 7002   GCL2
> 1702896645.931614604	Background	        7002 → 7002   GCL2
> 1702896645.931614991	Background	        7002 → 7002   GCL2
> 
> The extended packets only will happen if cycle_time and interval fields
> are updated using cycle_time_correction. Without that patch, the extended
> packets are not received.
> 
> 
> As for the negative truncation case, I just make the interval quite long,
> and experimented with GCL2 base-time value so that it hits the "next entry"
> in advance_sched(). Then I checked my logs in get_cycle_time_correction() to
> see the truncation case and its values.
> 
> Based on your feedback of the test required, I think that my existing
> truncation test is not enough, but the extension test case part should be
> good right ?
> 
> Do let me know then, I'm more than willing to do more test for the
> truncation case as per your suggestion, well basically, anything to help
> speed up the patches series review process :)
> 
> 
> Appreciate your suggestion and help a lot, thank you.

Do you think you could automate a test suite which only measures software
TX timestamps and works on veth?

I prepared this very small patch set just to give you a head start
(the skeleton). You'll still have to add the logic for individual tests.
https://lore.kernel.org/netdev/20231221132521.2314811-1-vladimir.oltean@nxp.com/
I'm terribly sorry, but this is the most I can do due to my current lack
of spare time, unfortunately.

If you've never run kselftests before, you'll need some kernel options
to enable VRF support. From my notes I have this list below, but there
may be more missing options.

CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_NET_L3_MASTER_DEV=y
CONFIG_NET_VRF=y

Let me know if you face any trouble or if I can help in some way.
Thanks for doing this.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ