linux-kernel - [PATCH net 0/4] Fix Felix DSA taprio gates after clock jump

Message-ID: <20250426144859.3128352-1-vladimir.oltean@nxp.com>
Date: Sat, 26 Apr 2025 17:48:54 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: netdev@...r.kernel.org
Cc: Claudiu Manoil <claudiu.manoil@....com>,
	Alexandre Belloni <alexandre.belloni@...tlin.com>,
	UNGLinuxDriver@...rochip.com,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	Shuah Khan <shuah@...nel.org>,
	Richie Pearn <richard.pearn@....com>,
	Xiaoliang Yang <xiaoliang.yang_1@....com>,
	linux-kernel@...r.kernel.org,
	linux-kselftest@...r.kernel.org
Subject: [PATCH net 0/4] Fix Felix DSA taprio gates after clock jump

Richie Pearn presented a reproducible situation where traffic would get
blocked on the NXP LS1028A switch if a certain taprio schedule was
applied and the PTP clock was then stepped. A clock step is expected
during initial synchronization, but it can also occur at runtime, for
example when transitioning from one grandmaster to another.

The issue is completely described in patch 1/4, which also contains
the fix, but it has left me with some doubts regarding the need for
vsc9959_tas_clock_adjust() in general.

In order to prove to myself that vsc9959_tas_clock_adjust() is needed in
general, I have written a selftest for the tc-taprio data path in patch
4/4. On the LS1028A, we can clearly see the following failures without
that function:

INFO: Forcing a backward clock jump
TEST: ping                                                          [FAIL]
INFO: Setting up taprio after PTP
TEST: In band with gate                                             [FAIL]
        Reception of 100 packets failed
TEST: Out of band with gate                                         [FAIL]
        Reception of 100 packets failed
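
As a rough illustration of why a schedule anchored to an absolute base
time is sensitive to clock steps, the sketch below computes the next
cycle start for a given clock reading. It is not the driver's code: the
helper name next_cycle_start and the numbers are hypothetical, and it
only assumes the usual taprio convention that a base time in the past is
advanced by whole cycles until it lies in the future.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only (not taken from the driver).
# Prints the earliest cycle start that is strictly after "now".
# All values are in nanoseconds.
next_cycle_start() {
	local base_time=$1 cycle_time=$2 now=$3

	if (( base_time > now )); then
		# Base time is still in the future: the schedule starts there.
		echo "$base_time"
		return
	fi

	# Count the whole cycles elapsed since base_time, then step to
	# the start of the next one.
	local elapsed_cycles=$(( (now - base_time) / cycle_time ))
	echo $(( base_time + (elapsed_cycles + 1) * cycle_time ))
}

# Assumed example values: 1 ms cycle, base-time 0.
next_cycle_start 0 1000000 2500000	# clock reads 2.5 ms, prints 3000000
next_cycle_start 0 1000000 500000	# clock stepped back to 0.5 ms, prints 1000000
```

In this toy example, a start time computed before the backward step
(3 ms) no longer matches the one computed after it (1 ms), which is the
kind of mismatch a clock adjustment hook has to correct.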

As for testing my fix from patch 1/4, that was quite a bit more complex
to do automatically. In fact, I couldn't find any other schedule that
would fail to be updated by vsc9959_tas_clock_adjust() as cleanly as
the schedule from Richie, so I've added that specific schedule as the
test_clock_jump_backward() test.

The test ordering is also (unfortunately) deliberate. Running the
selftest to the end dirties the GCL (gate control list) RAM, and when
running test_clock_jump_backward() once again, the GCL entries won't be
all zeroes as they were the first time around. They will contain bits
and pieces of old schedules, which makes the failure very hard to
reproduce.

Thus, test_clock_jump_backward() is the first in the test suite, and
without patch 1/4, it is only supposed to fail the _first_ time when
running after a clean boot.

Vladimir Oltean (4):
  net: dsa: felix: fix broken taprio gate states after clock jump
  selftests: net: tsn_lib: create common helper for counting received
    packets
  selftests: net: tsn_lib: add window_size argument to isochron_do()
  selftests: net: tc_taprio: new test

 drivers/net/dsa/ocelot/felix_vsc9959.c        |   5 +-
 .../selftests/drivers/net/dsa/tc_taprio.sh    |   1 +
 .../selftests/drivers/net/ocelot/psfp.sh      |   8 +-
 .../selftests/net/forwarding/tc_taprio.sh     | 421 ++++++++++++++++++
 .../selftests/net/forwarding/tsn_lib.sh       |  26 ++
 5 files changed, 454 insertions(+), 7 deletions(-)
 create mode 120000 tools/testing/selftests/drivers/net/dsa/tc_taprio.sh
 create mode 100755 tools/testing/selftests/net/forwarding/tc_taprio.sh

-- 
2.43.0