Date:   Mon, 2 Oct 2017 22:52:19 +0000
From:   Rodney Cummings <rodney.cummings@...com>
To:     Levi Pearson <levipearson@...il.com>
CC:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Henrik Austad <henrik@...tad.us>,
        "richardcochran@...il.com" <richardcochran@...il.com>,
        "jesus.sanchez-palencia@...el.com" <jesus.sanchez-palencia@...el.com>,
        "andre.guedes@...el.com" <andre.guedes@...el.com>
Subject: RE: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for
 traffic shapers

Thanks Levi,

Great discussion. It sounds like we're aligned.

Qbv is essential in switches. My concern was the end-station side only.

You're absolutely right that i210 LaunchTime doesn't do everything that is needed 
for scheduling. It still requires quite a bit of frame-juggling to ensure that
transmit descriptors are ordered by time, and that the control system's cyclic
traffic is repeated as expected. Like you, I'm not sure where that code is best
located in the long run.
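
To make the frame-juggling concrete, here is a toy sketch of what a multi-stream
talker has to do on its side: compute each stream's launch time as base + k * period
and always hand the kernel the earliest one next, so the i210 descriptors stay in
time order. send_at() is only a placeholder for whatever ends up attaching the
launch time to the frame (for example an SCM_TXTIME-style cmsg, as discussed in
this thread); this is an illustration, not a proposal.

#include <stdint.h>
#include <stdio.h>

struct cyclic_stream {
	const char *name;
	uint64_t period_ns;   /* control cycle, e.g. 250000 ns */
	uint64_t next_ns;     /* next launch time, seeded from the schedule */
};

/* Placeholder: build the frame and hand it to the kernel together with
 * s->next_ns as its launch time. */
static void send_at(struct cyclic_stream *s)
{
	printf("%s launches at %llu ns\n", s->name,
	       (unsigned long long)s->next_ns);
	s->next_ns += s->period_ns;
}

int main(void)
{
	struct cyclic_stream streams[2] = {
		{ "stream A", 250000, 100000 },
		{ "stream B", 250000, 150000 },
	};

	/* The "juggling": always submit the frame with the earliest launch
	 * time next, across all streams of this talker. */
	for (int i = 0; i < 8; i++) {
		struct cyclic_stream *s =
			streams[0].next_ns <= streams[1].next_ns ?
			&streams[0] : &streams[1];
		send_at(s);
	}
	return 0;
}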

For a hardware design that connects an FPGA to one of the internal ports of a
Qbv-capable switch, the preferred scheduling can be implemented in the FPGA itself.
The FPGA IP essentially implements a cyclic schedule of transmit descriptors,
each with its own buffer that can be written atomically by application code.
For a control system (i.e. industrial field-level or automotive control),
each buffer (stream) can hold a single frame. If you take a look at the
"Network Interface" specs for FlexRay in automotive, they went so far as to
mandate that design for a NIC. The IP for that sort of thing
doesn't take a lot of gates or memory, so I can see why FlexRay went that way.
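
To make that concrete, here is roughly the sort of thing I picture the FPGA
exposing. The layout and field names below are invented purely for illustration;
they are not taken from FlexRay or from any product:

/* Illustration only: a cyclic transmit schedule as the FPGA IP might
 * expose it.  One slot per stream, one frame per slot.  The application
 * rewrites a slot's buffer and bumps the sequence word last, so the
 * hardware transmits either the old frame or the new one, never a mix.
 */
#include <stdint.h>

#define MAX_FRAME_LEN 1522

struct tx_slot {
	uint32_t launch_offset_ns;     /* launch offset within the cycle */
	uint16_t frame_len;
	volatile uint32_t seq;         /* bumped by software after a rewrite */
	uint8_t  frame[MAX_FRAME_LEN];
};

struct tx_schedule {
	uint32_t cycle_ns;             /* e.g. 250000 for a 250 us cycle */
	uint32_t num_slots;
	struct tx_slot slot[];         /* walked by hardware in launch order */
};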

That sort of design was never going to happen as part of Qbv, because the switch
vendors are worried about things like 64-port switches, where replicating per-stream
transmit buffers doesn't scale. Also, if we treat Qbv
as applying to switches only, FlexRay-style IP isn't needed. For the switch,
we only need it to "get out of the way", and Qbv is fine for that.

I may be naive and idealistic, but in the long run I'm still hopeful that an
Ethernet NIC vendor will implement FlexRay-style scheduling for end stations.
We need a NIC trend-setter to do it as a differentiator, and the rest will 
probably follow along. That would completely eliminate the frame-juggling code
in Linux. We'll need to provide a way to open a socket for a specific
stream, but that doesn't seem too challenging.
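
For what it's worth, today that could probably be approximated with existing
socket APIs, along these lines: a raw packet socket per stream, with SO_PRIORITY
steering the stream's frames into the right traffic class. This is a stand-in
sketch, not a proposal for the eventual per-stream interface:

/* Sketch: one socket per stream, steered into a traffic class with
 * SO_PRIORITY, which an mqprio-style qdisc can map to a hardware queue.
 * Error handling omitted for brevity. */
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/if.h>

static int open_stream_socket(const char *ifname, int prio)
{
	int fd = socket(AF_PACKET, SOCK_RAW, 0);
	struct sockaddr_ll addr = {
		.sll_family  = AF_PACKET,
		.sll_ifindex = if_nametoindex(ifname),
	};

	setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	return fd;
}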

Rodney




> -----Original Message-----
> From: Levi Pearson [mailto:levipearson@...il.com]
> Sent: Monday, October 2, 2017 4:49 PM
> To: Rodney Cummings <rodney.cummings@...com>
> Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>; Vinicius
> Costa Gomes <vinicius.gomes@...el.com>; Henrik Austad <henrik@...tad.us>;
> richardcochran@...il.com; jesus.sanchez-palencia@...el.com;
> andre.guedes@...el.com
> Subject: Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for
> traffic shapers
> 
> Hi Rodney,
> 
> On Mon, Oct 2, 2017 at 1:40 PM, Rodney Cummings <rodney.cummings@...com>
> wrote:
> 
> > It's a shame that someone built such hardware. Speaking as a manufacturer
> > of daisy-chainable products for industrial/automotive applications, I
> > wouldn't use that hardware in my products. The whole point of scheduling
> > is to obtain close-to-optimal network determinism. If the hardware
> > doesn't schedule properly, the product is probably better off using CBS.
> 
> I should note that the hardware I'm using was not designed to be a TSN
> endpoint. It just happens to include hardware Qbv support, which makes it
> an interesting and inexpensive platform on which to do Qbv experiments.
> Nevertheless, I believe the right combination of driver support for
> the hardware shaping and appropriate software in the network stack
> would allow it to schedule things fairly precisely in the
> application-centric manner you've described. I will ultimately defer
> to your judgement on that, though, and spend some time with the Qcc
> draft to see where I may be wrong.
> 
> > It would be great if all of the user-level application code was scheduled
> > as tightly as the network, but the reality is that we aren't there yet.
> > Also, even if the end-station has a separately timed RT Linux thread for
> > each stream, the order of those threads can vary. Therefore, when the
> > end-station (SoC) has more than one talker, the frames for each talker
> > can be written into the Qbv queue at any time, in any order.
> > There is no scheduling of frames (i.e. streams)... only the queue.
> 
> Frames are presented to *the kernel* in any order. The qdisc
> infrastructure provides mechanisms by which other scheduling policy
> can be added between presentation to the kernel and presentation to
> hardware queues.
> 
> Even with i210-style LaunchTime functionality, this ordering must still
> be provided. The hardware cannot re-order time-scheduled frames, and
> reviewers of the SO_TXTIME proposal rejected the idea of having the
> driver try to re-order scheduled frames already submitted to the
> hardware-visible descriptor chains, and rightly so I think.
> 
> So far, I have thought of 3 places this ordering constraint might be
> provided:
> 
> 1. Built into each driver with offload support for SO_TXTIME
> 2. A qdisc module that will re-order scheduled skbuffs within a certain
> window
> 3. Left to userspace to coordinate
> 
> I think 1 is particularly bad due to duplication of code. 3 is the
> easy default from the point of view of the kernel, but not attractive
> from the point of view of precise scheduling and low latency to
> egress. Unless someone has a better idea, I think 2 would be the most
> appropriate choice. I can think of a number of ways one might be
> designed, but I have not thought them through fully yet.
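> 
> To make option 2 a bit more concrete, the enqueue side might look something
> like the sketch below. It assumes the launch time travels in skb->tstamp,
> glosses over locking, the watchdog, and the dequeue path, and none of it
> exists today:
> 
> #include <linux/rbtree.h>
> #include <linux/skbuff.h>
> #include <net/sch_generic.h>
> 
> struct txwindow_sched {
> 	struct rb_root head;	/* skbs sorted by launch time */
> };
> 
> /* Keep time-scheduled skbs sorted so they reach the driver (and any
>  * LaunchTime hardware) in launch-time order. */
> static int txwindow_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> 			    struct sk_buff **to_free)
> {
> 	struct txwindow_sched *q = qdisc_priv(sch);
> 	struct rb_node **p = &q->head.rb_node, *parent = NULL;
> 	ktime_t txtime = skb->tstamp;	/* assumed to carry the launch time */
> 
> 	while (*p) {
> 		struct sk_buff *e;
> 
> 		parent = *p;
> 		e = rb_entry(parent, struct sk_buff, rbnode);
> 		if (ktime_before(txtime, e->tstamp))
> 			p = &parent->rb_left;
> 		else
> 			p = &parent->rb_right;
> 	}
> 	rb_link_node(&skb->rbnode, parent, p);
> 	rb_insert_color(&skb->rbnode, &q->head);
> 	qdisc_qstats_backlog_inc(sch, skb);
> 	sch->q.qlen++;
> 	return NET_XMIT_SUCCESS;
> }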
> 
> > From the perspective of the CNC, that means that a Qbv-only end-station is
> > sloppy. In contrast, an end-station using per-stream scheduling
> > (e.g. i210 with LaunchTime) is precise, because each frame has a known time.
> > It's true that P802.1Qcc supports both, so the CNC might support both.
> > Nevertheless, the CNC is likely to compute a network-wide schedule
> > that optimizes for the per-stream end-stations, and locates the windows
> > for the Qbv-only end-stations in a slower interval.
> 
> Out-of-order submission of frames to LaunchTime hardware means that a
> frame enqueued later but scheduled earlier cannot transmit until the
> later-scheduled frame completes. So if two applications don't always
> submit their frames to the kernel in the correct order during a cycle,
> enqueuing frames immediately with LaunchTime information results in
> even *worse* sloppiness. If your queue is simply
> gated at the time of the first scheduled launch, the frame that should
> have been first will only be late by the length of the queued-first
> frame. If they both have precise launch times, the frame that should
> have been first will have to wait until the normal launch time of the
> later-scheduled frame plus the time it takes for it to egress!
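> 
> (To put rough numbers on it: say frame A is scheduled for T+100 us and
> frame B for T+150 us, both around 1500 bytes, so roughly 12 us on the
> wire at 1 Gb/s, and the applications happen to enqueue B before A. With
> a queue that simply opens at T+100 us, B egresses at T+100 us and A at
> about T+112 us, so A is ~12 us late. With per-frame launch times and no
> re-ordering, B is held until T+150 us, finishes around T+162 us, and
> only then can A go: ~62 us late.)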
> 
> In either case, re-ordering is required if it is not controlled for in
> userspace. In other words, I am not disagreeing with you about the
> insufficiency of a Qbv shaper for an endpoint! I am just pointing out
> that neither hardware LaunchTime support nor hardware Qbv window
> support is sufficient by itself. Nor, for that matter, is the CBS
> offload support sufficient to meet multi-stream requirements from SRP.
> All of these provide a low-layer mechanism by which appropriate
> scheduling of a multi-stream or multi-application endpoint can be
> enhanced, but none of them provide the complete story by themselves.
> 
> > I realize that we are only doing CBS for now. I guess my main point is that
> > if/when we add scheduling, it cannot be limited to Qbv. Per-stream scheduling
> > is essential, because that is the assumption for most CNC designs that
> > I'm aware of.
> 
> I understand and completely agree. I have believed that per-stream
> shaping/scheduling in some form is essential from the beginning.
> Likewise, CBS is not sufficient in itself for multi-stream systems in
> which multiple applications provide streams scheduled for the same
> traffic class. But CBS and SO_TXTIME are good first steps; they
> provide hooks for drivers to finally support the mechanisms provided
> by hardware, and also in the case of SO_TXTIME a hook on which further
> time-sensitive enhancements to the network stack can be hung, such as
> qdiscs that can store and release frames based on scheduled launch
> times. When these are in place, I think stream-based scheduling via
> qdisc would be a reasonable next step; perhaps a common design could
> be found to work both for media streams over CBS and industrial
> streams scheduled by launch time.
> 
> 
> Levi
