Date:   Sat,  3 Sep 2022 00:57:00 +0300
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     netdev@...r.kernel.org
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Xiaoliang Yang <xiaoliang.yang_1@....com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        UNGLinuxDriver@...rochip.com, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Michael Walle <michael@...le.cc>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Maxim Kochetkov <fido_max@...ox.ru>,
        Colin Foster <colin.foster@...advantage.com>,
        Richie Pearn <richard.pearn@....com>,
        linux-kernel@...r.kernel.org
Subject: [PATCH net 1/3] net: dsa: felix: allow small tc-taprio windows to send at least some packets

The blamed commit broke tc-taprio schedules such as this one:

tc qdisc replace dev $swp1 root taprio \
	num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 \
	sched-entry S 0x7f 990000 \
	sched-entry S 0x80  10000 \
	flags 0x2

because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.

Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
there are two cases to distinguish, depending on kernel version. Prior
to commit 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with
tc-taprio instead of hanging the port"), the driver did not touch
QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.

1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
  1518, and at gigabit this introduces a static guard band (independent
  of packet sizes) of 12144 ns. But this is larger than the time window
  itself, of 10000 ns. So, the queue system never considers a frame with
  TC 7 as eligible for transmission, since the gate practically never
  opens, and these frames are forever stuck in the TX queues and hang
  the port.

2 (after vsc9959_tas_guard_bands_update): We make an effort to set
  QSYS_QMAXSDU_CFG_7 to 1230 bytes, and this enables oversized frame
  dropping for everything larger than that. But QSYS_QMAXSDU_CFG_7 plays
  2 roles. One is oversized frame dropping, the other is the per-tc
  static guard band. When we calculated QSYS_QMAXSDU_CFG_7 to be 1230,
  we considered no guard band at all, and the entire time window
  available for transmission, which is not the case. The larger
  QSYS_QMAXSDU_CFG_7 is, the larger the static guard band for the tc is,
  too.
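
The arithmetic behind both cases can be sketched as follows. This is a
minimal illustration rather than driver code: the helper names
static_guard_band_ns() and usable_window_ns() are hypothetical, and the
guard band is assumed to be simply the max SDU multiplied by the
per-octet transmission time (8 ns at gigabit), as in the figures above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: static guard band implied by a max SDU setting,
 * assuming guard band = max SDU * time per octet.
 */
uint64_t static_guard_band_ns(uint32_t max_sdu_octets, uint32_t ns_per_octet)
{
	assert(ns_per_octet > 0);
	return (uint64_t)max_sdu_octets * ns_per_octet;
}

/* Hypothetical helper: time left in a gate window once the static guard
 * band has been carved out of its end; 0 if the guard band covers the
 * whole window.
 */
uint64_t usable_window_ns(uint64_t window_ns, uint64_t guard_band_ns)
{
	return window_ns > guard_band_ns ? window_ns - guard_band_ns : 0;
}
```

With the 10000 ns window of TC 7, a max SDU of 1518 leaves no usable
time at all (case 1), while a max SDU of 1230 leaves only 160 ns
(case 2).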

In both cases, frames of any size (even 60 bytes sans FCS) are stuck
on egress rather than being considered for scheduling on TC 7, even if
they fit. This is because the static guard band is way too large.
Considering the current situation, with vsc9959_tas_guard_bands_update(),
frames between 60 octets and 1230 octets in size are not eligible for
oversized dropping (because they are smaller than QSYS_QMAXSDU_CFG_7),
but won't be considered as eligible for scheduling either, because what
remains of min_gate_len[7] (10000 ns) after subtracting the guard band
determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per octet == 9840 ns)
is only 160 ns, which is smaller than their transmit time.
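
To make the eligibility argument concrete, here is a rough model of the
check, not the hardware's actual algorithm. The helper names are
hypothetical, and the transmit time is assumed to cover the L2 frame
plus 4 octets of FCS and 20 octets of L1 overhead (preamble/SFD and
IPG); whether the queue system accounts for it exactly this way is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FCS_OCTETS		4U
/* Assumption: 8 octets of preamble/SFD plus 12 octets of IPG */
#define L1_OVERHEAD_OCTETS	20U

/* Hypothetical helper: on-wire time of a frame given its L2 size sans FCS */
uint64_t frame_tx_time_ns(uint32_t l2_len_sans_fcs, uint32_t ns_per_octet)
{
	assert(ns_per_octet > 0);
	return (uint64_t)(l2_len_sans_fcs + FCS_OCTETS + L1_OVERHEAD_OCTETS) *
	       ns_per_octet;
}

/* Hypothetical helper: a frame is eligible only if it fits in what is
 * left of the window after subtracting the static guard band.
 */
bool frame_is_eligible(uint32_t l2_len_sans_fcs, uint64_t window_ns,
		       uint64_t guard_band_ns, uint32_t ns_per_octet)
{
	uint64_t usable_ns = window_ns > guard_band_ns ?
			     window_ns - guard_band_ns : 0;

	return frame_tx_time_ns(l2_len_sans_fcs, ns_per_octet) <= usable_ns;
}
```

Under these assumptions, even a minimum-sized 60 octet frame needs
672 ns at gigabit, which does not fit in the 160 ns that remain.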

One solution, which is quite outrageous, is to limit the minimum valid
gate interval accepted through tc-taprio, such that intervals, when
converted into L1 frame bit times, are never smaller than twice the
MTU of the interface. However, the tc-taprio UAPI operates in ns, and
the link speed can change at runtime (down to 10 Mbps, where the
transmission time of 1 octet is 800 ns). Since the max MTU is around
9000, we'd have to limit the tc-taprio intervals to be no smaller than
14.4 ms (2 * 9000 octets * 800 ns) on the premise that the link may
renegotiate to 10 Mbps. That is astonishingly limiting for real use
cases, where the entire *cycle* (and here we're talking about just a
single interval) must be 100 us or lower.
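
The lower bound that this rejected approach would impose is easy to
work out. The sketch below uses a hypothetical helper,
min_taprio_interval_ns(), and takes 9000 octets as the MTU, following
the approximate figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest gate interval that still spans twice
 * the MTU at the given per-octet transmission time.
 */
uint64_t min_taprio_interval_ns(uint32_t mtu_octets, uint32_t ns_per_octet)
{
	assert(ns_per_octet > 0);
	return 2ULL * mtu_octets * ns_per_octet;
}
```

At 10 Mbps (800 ns per octet) this yields 14400000 ns, i.e. 14.4 ms.
Even at gigabit (8 ns per octet) it yields 144000 ns, which already
exceeds a 100 us cycle.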

The solution is to modify vsc9959_tas_guard_bands_update() to take into
account that the static per-tc guard bands consume time out of our time
window too, not just packet transmission. The unknown which needs to be
determined is the max admissible frame size. Both the useful bit time
and the guard band size will depend on this unknown variable, so
dividing the available 10000 ns into 2 halves sounds like the ideal
strategy. In this case, we will program QSYS_QMAXSDU_CFG_7 with a
maximum frame length (and guard band size) of 605 octets (this includes
FCS but not IPG and preamble/SFD). With this value, everything of L2
size 601 (sans FCS) and higher is considered as oversized, and the guard
band is low enough (605 + HSCH_MISC.FRM_ADJ, at 1Gbps => 5000 ns) in
order to not disturb the scheduling of any frame smaller than L2 size 601.
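
As a standalone sketch of the split-window calculation (not the driver
function itself), the figures above can be reproduced as follows. The
helper names are hypothetical, and the 20 octet adjustment is an
assumption inferred from the numbers in this message (625 - 605, taken
to match HSCH_MISC.FRM_ADJ). The special case of a completely closed
gate is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PSEC_PER_NSEC		1000ULL
/* Assumption: L1 overhead not counted in QSYS_QMAXSDU_CFG_*, inferred
 * from the figures in the message.
 */
#define L1_OVERHEAD_OCTETS	20U

/* Hypothetical helper: max SDU when half of the gate window is reserved
 * for the per-tc static guard band.
 */
uint32_t split_window_max_sdu(uint64_t min_gate_len_ns, uint64_t picos_per_byte)
{
	uint64_t max_sdu;

	assert(picos_per_byte > 0);
	max_sdu = (min_gate_len_ns * PSEC_PER_NSEC) / (2 * picos_per_byte);
	/* Account for L1 overhead without letting max_sdu reach 0 */
	if (max_sdu > L1_OVERHEAD_OCTETS)
		max_sdu -= L1_OVERHEAD_OCTETS;

	return (uint32_t)max_sdu;
}

/* Hypothetical helper: guard band resulting from that max SDU */
uint64_t split_window_guard_band_ns(uint32_t max_sdu, uint64_t picos_per_byte)
{
	return ((uint64_t)max_sdu + L1_OVERHEAD_OCTETS) * picos_per_byte /
	       PSEC_PER_NSEC;
}
```

For the 10000 ns window at gigabit (8000 ps per byte), this gives a max
SDU of 605 octets and a guard band of 5000 ns, i.e. exactly half of the
window.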

Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
---
 drivers/net/dsa/ocelot/felix_vsc9959.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 1cdce8a98d1d..6fa4e0161b34 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1599,9 +1599,10 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
 		u32 max_sdu;
 
 		if (min_gate_len[tc] == U64_MAX /* Gate always open */ ||
-		    min_gate_len[tc] * PSEC_PER_NSEC > needed_bit_time_ps) {
+		    min_gate_len[tc] * PSEC_PER_NSEC > 2 * needed_bit_time_ps) {
 			/* Setting QMAXSDU_CFG to 0 disables oversized frame
-			 * dropping.
+			 * dropping and leaves just the port-based static
+			 * guard band.
 			 */
 			max_sdu = 0;
 			dev_dbg(ocelot->dev,
@@ -1612,9 +1613,17 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
 			/* If traffic class doesn't support a full MTU sized
 			 * frame, make sure to enable oversize frame dropping
 			 * for frames larger than the smallest that would fit.
+			 *
+			 * However, the exact same register, QSYS_QMAXSDU_CFG_*,
+			 * controls not only oversized frame dropping, but also
+			 * per-tc static guard band lengths. Therefore, the max
+			 * SDU supported by this tc is determined by splitting
+			 * its time window into 2: one for the useful traffic
+			 * and one for the guard band. Both halves have the
+			 * length equal to one max sized packet.
 			 */
 			max_sdu = div_u64(min_gate_len[tc] * PSEC_PER_NSEC,
-					  picos_per_byte);
+					  2 * picos_per_byte);
 			/* A TC gate may be completely closed, which is a
 			 * special case where all packets are oversized.
 			 * Any limit smaller than 64 octets accomplishes this
-- 
2.34.1
