linux-kernel - Re: [PATCH net 1/3] net: dsa: felix: allow small tc-taprio windows to send at least some packets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220905090014.w557vqbhb5oefupi@skbuf>
Date:   Mon, 5 Sep 2022 09:00:16 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Michael Walle <michael@...le.cc>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Xiaoliang Yang <xiaoliang.yang_1@....com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        "UNGLinuxDriver@...rochip.com" <UNGLinuxDriver@...rochip.com>,
        Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Maxim Kochetkov <fido_max@...ox.ru>,
        Colin Foster <colin.foster@...advantage.com>,
        Richie Pearn <richard.pearn@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net 1/3] net: dsa: felix: allow small tc-taprio windows to
 send at least some packets

On Mon, Sep 05, 2022 at 09:29:44AM +0200, Michael Walle wrote:
> > 1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
> >   1518, and at gigabit this introduces a static guard band (independent
> >   of packet sizes) of 12144 ns. But this is larger than the time window
> >   itself, of 10000 ns. So, the queue system never considers a frame with
> >   TC 7 as eligible for transmission, since the gate practically never
> >   opens, and these frames are forever stuck in the TX queues and hang
> >   the port.
> 
> IIRC we deliberately ignored that problem back then, because we couldn't
> set the maxsdu.

I don't remember exactly why that is. It seems stupid to ignore a
condition that leads to the port hanging. I think part of the problem
was that I didn't have a test setup at the time the guard band patches
were proposed.

> > The solution is to modify vsc9959_tas_guard_bands_update() to take into
> > account that the static per-tc guard bands consume time out of our time
> > window too, not just packet transmission. The unknown which needs to be
> > determined is the max admissible frame size. Both the useful bit time
> > and the guard band size will depend on this unknown variable, so
> > dividing the available 10000 ns into 2 halves sounds like the ideal
> > strategy. In this case, we will program QSYS_QMAXSDU_CFG_7 with a
> > maximum frame length (and guard band size) of 605 octets (this includes
> > FCS but not IPG and preamble/SFD). With this value, everything of L2
> > size 601 (sans FCS) and higher is considered as oversized, and the guard
> > band is low enough (605 + HSCH_MISC.FRM_ADJ, at 1Gbps => 5000 ns) in
> > order to not disturb the scheduling of any frame smaller than L2 size
> > 601.
> 
> So one drawback with this is that you limit the maxsdu to match a
> frame half of the gate open time, right?

Yes.

> The switch just schedule the *start* event of the frame. So even if
> the guard band takes 99% of the gate open time, it should be able
> to send a frame regardless of it's length during the first 1% of
> the period (and it doesn't limit the maxsdu by half). IIRC the guard
> band is exactly for that, that is that you don't know the frame
> length and you can still schedule the frame. I know of switches
> which don't use a guard band but know the exact length and the
> closing time of the queue and deduce by that if the frame can
> still be queued.
> 
> Actually, I'd expect it to work after your vsc9959_tas_guard_bands_update.
> Hmm.
> 
> To quote from you above:
> > min_gate_len[7] (10000 ns) - the guard band determined by
> > QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per octet == 9840 ns) is smaller
> > than their transmit time.
> 
> Are you sure this is the case? There should be 160ns time to
> schedule the start of the frame. Maybe the 160ns is just too
> small.

Yes, I'm absolutely sure that any packet gets dropped on egress with a
10 us window, and I can see from my explanation why that is not obvious.
The reason is because the guard band for tc 7 is not only determined by
QSYS_QMAXSDU_CFG_7, but also by adding the L1 overhead configured
through HSCH_MISC.FRM_ADJ (default 20 decimal).

So from the remaining 160 ns, we also lose 20 * 8 = 160 ns to the L1
overhead, and that's why the switch doesn't schedule anything.

In fact now I finally understand the private message that Xiaoliang sent
to me, where he said that he can make things work by making HSCH_MISC.FRM_ADJ
smaller than the default of 20. I initially didn't understand why you'd
want to do that.

The problem with HSCH_MISC.FRM_ADJ is that it's global to the switch,
and it's also used for some other shaper computations, so altering it is
not such a great idea.

But you (and Xiaoliang) do raise a valid point that the switch doesn't
need a full window size of open gate to schedule a full window size
worth of packet. So cutting the available window size in half is a bit
drastic. I'll think a bit more whether there is any smarter adjustment I
can do to ensure that any window, after trimming the extended static
guard band, still has 32 ns (IIRC, that's the minimum required) of time.
That should still ensure we don't have overruns. If you have any idea,
shoot.