Message-Id: <D682I0FBTZYR.RBPXND1NUZFR@kernel.org>
Date: Tue, 10 Dec 2024 14:50:33 +0100
From: "Michael Walle" <mwalle@...nel.org>
To: "Vladimir Oltean" <vladimir.oltean@....com>, <netdev@...r.kernel.org>
Cc: "David S. Miller" <davem@...emloft.net>, "Eric Dumazet"
 <edumazet@...gle.com>, "Jakub Kicinski" <kuba@...nel.org>, "Paolo Abeni"
 <pabeni@...hat.com>, "Andrew Lunn" <andrew@...n.ch>, "Claudiu Manoil"
 <claudiu.manoil@....com>, "Alexandre Belloni"
 <alexandre.belloni@...tlin.com>, <UNGLinuxDriver@...rochip.com>, "Xiaoliang
 Yang" <xiaoliang.yang_1@....com>, "Yangbo Lu" <yangbo.lu@....com>, "Radu
 Bulie" <radu-andrei.bulie@....com>
Subject: Re: [PATCH net] net: dsa: felix: fix stuck CPU-injected packets
 with short taprio windows

On Tue Dec 10, 2024 at 2:26 PM CET, Vladimir Oltean wrote:
> With this port schedule:
>
> tc qdisc replace dev $send_if parent root handle 100 taprio \
> 	num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	map 0 1 2 3 4 5 6 7 \
> 	base-time 0 cycle-time 10000 \
> 	sched-entry S 01 1250 \
> 	sched-entry S 02 1250 \
> 	sched-entry S 04 1250 \
> 	sched-entry S 08 1250 \
> 	sched-entry S 10 1250 \
> 	sched-entry S 20 1250 \
> 	sched-entry S 40 1250 \
> 	sched-entry S 80 1250 \
> 	flags 2
>
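
For readers following along: the schedule above is eight back-to-back
1250 ns slots, each opening the gate of exactly one traffic class. A minimal
Python sketch that decodes the gate masks and checks that the entries add up
to the cycle time. The helper name open_tcs is hypothetical and is not part
of tc or the driver; the values are copied from the sched-entry lines.

```python
# Gate masks (hex) and interval lengths (ns) copied from the sched-entry lines.
SCHED_ENTRIES = [
    (0x01, 1250), (0x02, 1250), (0x04, 1250), (0x08, 1250),
    (0x10, 1250), (0x20, 1250), (0x40, 1250), (0x80, 1250),
]
CYCLE_TIME_NS = 10000


def open_tcs(gate_mask, num_tc=8):
    """Return the traffic classes whose gate bit is set in gate_mask."""
    return [tc for tc in range(num_tc) if gate_mask & (1 << tc)]


# Sum of all slot durations; it should match the configured cycle-time.
total_ns = sum(interval for _, interval in SCHED_ENTRIES)

for mask, interval in SCHED_ENTRIES:
    print(f"mask 0x{mask:02x}: TCs {open_tcs(mask)} open for {interval} ns")
print(f"sum of intervals: {total_ns} ns (cycle-time {CYCLE_TIME_NS} ns)")
```

So every traffic class gets a single 1250 ns window per 10 us cycle, which is
the "min gate length" the driver reports further down.
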
> ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
>
> increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> ptp4l[4134.168]: port 2: send peer delay response failed
>
> It turns out that the driver can't take their TX timestamps because it
> can't transmit them in the first place. And there's nothing special
> about the Pdelay_Resp packets - they're just regular 68 byte packets.
> But with this taprio configuration, the switch would refuse to send even
> the ETH_ZLEN minimum packet size.
>
> This should definitely not have been the case. When applying the taprio
> config, the driver prints:
>
> mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
>
> and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
> without problems. Yet it was not.
>
> For the forwarding path, the configuration is fine, yet packets injected
> from Linux get stuck with this schedule no matter what.
>
> The first hint that the static guard bands are the cause of the problem
> is that reverting Michael Walle's commit 297c4de6f780 ("net: dsa: felix:
> re-enable TAS guard band mode") made things work. It must be that the
> guard bands are calculated incorrectly.
>
> I remembered that there is a magic constant in the driver, set to 33 ns
> for no logical reason other than experimentation, which says "never let
> the static guard bands get so large as to leave less than this amount of
> remaining space in the time slot, because the queue system will refuse
> to schedule packets otherwise, and they will get stuck". I had a hunch
> that my previous experimentally-determined value was only good for
> packets coming from the forwarding path, and that the CPU injection path
> needed more.
>
> I came to the new value of 35 ns through binary search, after seeing
> that with 544 ns (the bit time required to send the Pdelay_Resp packet
> at gigabit) it works. Again, this is purely experimental, there's no
> logic and the manual doesn't say anything.
>
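
The 544 ns figure quoted above is simply the payload bit time of a 68 byte
frame at gigabit speed. A small sketch of that arithmetic, with a
hypothetical helper name, ignoring preamble, FCS and inter-frame gap:

```python
def bit_time_ns(octets, speed_mbps):
    """Time in ns needed to serialize `octets` bytes at `speed_mbps`."""
    # 8 bits per octet; at N Mbps one bit takes 1000 / N nanoseconds.
    return octets * 8 * 1000 // speed_mbps


pdelay_resp_ns = bit_time_ns(68, 1000)
print(pdelay_resp_ns)
```

For 68 octets at 1000 Mbps this gives 544 ns, matching the upper bound the
binary search started from.
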
> The new driver prints for this schedule look like this:
>
> mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
> mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
>
> So yes, the maximum MTU is now even smaller by 1 byte than before.
> This is maybe counter-intuitive, but makes more sense with a diagram of
> one time slot.
>
> Before:
>
>  Gate open                                   Gate close
>  |                                                    |
>  v           1250 ns total time slot duration         v
>  <---------------------------------------------------->
>  <----><---------------------------------------------->
>   33 ns            1217 ns static guard band
>   useful
>
> After:
>
>  Gate open                                   Gate close
>  |                                                    |
>  v           1250 ns total time slot duration         v
>  <---------------------------------------------------->
>  <-----><--------------------------------------------->
>   35 ns            1215 ns static guard band
>   useful
>
> The static guard band implemented by this switch hardware directly
> determines the maximum allowable MTU for that traffic class. The larger
> it is, the earlier the switch will stop scheduling frames for
> transmission, because otherwise they might overrun the gate close time
> (and avoiding that is the entire purpose of Michael's patch).
> So, we now have guard bands smaller by 2 ns, thus, in this particular
> case, we lose a byte of the maximum MTU.
>
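
The one byte difference can be reproduced with a back-of-the-envelope
calculation. This is a reconstruction from the numbers printed above, not a
copy of the driver code: it assumes the frame size limit is the 1250 ns slot
minus the reserved 33 ns or 35 ns, converted to octets at 8 ns per octet with
rounding down, minus 20 octets of L1 overhead (preamble, SFD and inter-frame
gap, with the FCS counted as part of the frame). The function name and the
overhead constant are assumptions.

```python
PSEC_PER_NSEC = 1000
# Assumption: 8 octets preamble + SFD and 12 octets inter-frame gap.
L1_OVERHEAD_OCTETS = 20


def max_frame_octets(gate_len_ns, reserved_ns, speed_mbps=1000):
    """Largest frame (including FCS) that fits in the usable part of a slot."""
    picos_per_byte = 8 * 1_000_000 // speed_mbps  # 8000 ps at 1000 Mbps
    remaining_ps = (gate_len_ns - reserved_ns) * PSEC_PER_NSEC
    octets = remaining_ps // picos_per_byte  # round down to whole octets
    return octets - L1_OVERHEAD_OCTETS


before = max_frame_octets(1250, 33)
after = max_frame_octets(1250, 35)
print(before, after)
```

Under these assumptions the result is 132 octets with 33 ns reserved and
131 octets with 35 ns reserved, which agrees with the two sets of driver
prints: 1217 ns holds 152 whole octets, 1215 ns only 151.
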
> Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>

Makes sense:

Reviewed-by: Michael Walle <mwalle@...nel.org>

-michael
