netdev - Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc

Message-ID: <alpine.DEB.2.21.1803280808490.3247@nanos.tec.linutronix.de>
Date:   Wed, 28 Mar 2018 09:48:05 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Jesus Sanchez-Palencia <jesus.sanchez-palencia@...el.com>
cc:     netdev@...r.kernel.org, jhs@...atatu.com, xiyou.wangcong@...il.com,
        jiri@...nulli.us, vinicius.gomes@...el.com,
        richardcochran@...il.com, anna-maria@...utronix.de,
        henrik@...tad.us, John Stultz <john.stultz@...aro.org>,
        levi.pearson@...man.com, edumazet@...gle.com, willemb@...gle.com,
        mlichvar@...hat.com
Subject: Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc

Jesus,

On Tue, 27 Mar 2018, Jesus Sanchez-Palencia wrote:
> On 03/25/2018 04:46 AM, Thomas Gleixner wrote:
> >   This is missing right now and you want to get that right from the very
> >   beginning. Duct taping it on the interface later on is a bad idea.
> 
> Agreed that this is needed. On the SO_TXTIME + tbs proposal, I believe it's been
> covered by the (per-packet) SCM_DROP_IF_LATE. Do you think we need a different
> mechanism for expressing that?

Uuurgh. No. DROP_IF_LATE is just crap to be honest.

There are two modes:

      1) Send at the given TX time (Explicit mode)

      2) Send before given TX time (Deadline mode)

There is no need to specify 'drop if late' simply because if the message is
handed in past the given TX time, it's too late by definition. What you are
trying to implement is a hybrid of TSN and general purpose (not time aware)
networking in one go. And you do that because your overall design is not
looking at the big picture. You designed from a given use case assumption
and tried to fit other things into it with duct tape.
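
To make the two modes concrete, here is a rough sketch and nothing more. The
names TxMode and earliest_send are made up for illustration and are not part
of any proposed ABI. It only shows that lateness falls out of the TX time
itself, so no extra per-packet flag is needed:

```python
from enum import Enum


class TxMode(Enum):
    EXPLICIT = 1  # send at the given TX time
    DEADLINE = 2  # send before the given TX time


def earliest_send(mode, txtime_ns, now_ns):
    """Return the earliest time the packet may go out, or None if it is late.

    All times are nanoseconds on the same clock (assumed to be TAI here).
    """
    # Handed in past the TX time: too late by definition, in either mode.
    if now_ns > txtime_ns:
        return None
    # Explicit mode sends exactly at the TX time; deadline mode may send
    # at any point from now up to the TX time.
    return txtime_ns if mode is TxMode.EXPLICIT else now_ns
```

With txtime 1000 and now 400, explicit mode yields 1000 and deadline mode
yields 400. With now 1200 both modes yield None.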

> >   So you really want a way for the application to query the timing
> >   constraints and perhaps other properties of the channel it connects
> >   to. And you want that now before the first application starts to use the
> >   new ABI. If the application developer does not use it, you still have to
> >   fix the application, but you have to fix it because the developer was a
> >   lazy bastard and not because the design was bad. That's a major
> >   difference.
> 
> Ok, this is something that we have considered in the past, but then the feedback
> here drove us onto a different direction. The overall input we got here was that
> applications would have to be adjusted or that userspace would have to handle
> the coordination between applications somehow (e.g.: a daemon could be developed
> separately to accommodate the fully dynamic use-cases, etc).

The only thing which will happen is that you get applications which require
to control the full interface themselves because they are so important and
the only ones which get it right. Good luck with fixing them up.

That extra daemon, if it ever surfaces, will be just a PITA. Think about
20kHz control loops. Do you really want queueing, locking, several context
switches and priority configuration nightmares in such a scenario?
Definitely not! You want a fast channel directly to the root qdisc which
takes care of getting it out at the right point, which might be immediate
handover if the adapter supports hw scheduling.

> This is a new requirement for the entire discussion.
> 
> If I'm not missing anything, however, underutilization of the time slots is only
> a problem:
> 
> 1) for the fully dynamic use-cases and;
> 2) because now you are designing applications in terms of time slices, right?

No. It's a general problem. I'm not designing applications in terms of time
slices. Time slices are a fundamental property of TSN. Whether you use them
for explicit scheduling or bandwidth reservation or make them flat does not
matter.

The application does not necessarily need to know about the time
constraints at all. But if it wants to use timed scheduling then it had
better know about them.

> We have not thought of making any of the proposed qdiscs capable of (optionally)
> adjusting the "time slices", but mainly because this is not a problem we had
> here before. Our assumption was that per-port Tx schedules would only be used
> for static systems. In other words, no, we didn't think that re-balancing the
> slots was a requirement, not even for 'taprio'.

Sigh. Utilization is not something entirely new in the network space. I'm
not saying that this needs to be implemented right away, but designing it
in a way which forces underutilization is just wrong.

> > Coming back to the overall scheme. If you start upfront with a time slice
> > manager which is designed to:
> >
> >   - Handle multiple channels
> >
> >   - Expose the time constraints, properties per channel
> >
> > then you can fit all kind of use cases, whether designed by committee or
> > not. You can configure that thing per node or network wide. It does not
> > make a difference. The only difference are the resulting constraints.
> 
>
> Ok, and I believe the above was covered by what we had proposed before, unless
> what you meant by time constraints is beyond the configured port schedule.
>
> Are you suggesting that we'll need to have a kernel entity that is not only
> aware of the current traffic classes 'schedule', but also of the resources that
> are still available for new streams to be accommodated into the classes? Putting
> it differently, is the TAS you envision just an entity that runs a schedule, or
> is it a time-aware 'orchestrator'?

In the first place it's something which runs a defined schedule.

The accommodation for new streams is required, but not necessarily at the
root qdisc level. That might be a qdisc feeding into it.

Assume you have a bandwidth reservation, aka time slot, for audio. If your
audio related qdisc does deadline scheduling then you can add new streams
to it up to the point where it's no longer able to fit them.
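
As a rough illustration of "add streams until they no longer fit", assume
each stream declares how much wire time it needs per slot. That accounting
model and the SlotAdmission name are assumptions for this sketch, not
something from the patch set:

```python
class SlotAdmission:
    """Admission control for one time slot shared by several streams."""

    def __init__(self, slot_len_ns):
        self.slot_len_ns = slot_len_ns
        self.reserved = {}  # stream id -> wire time needed per slot (ns)

    def used_ns(self):
        return sum(self.reserved.values())

    def spare_ns(self):
        # Time the root qdisc could hand to other traffic later on.
        return self.slot_len_ns - self.used_ns()

    def add_stream(self, stream_id, tx_ns_per_slot):
        """Reserve time for a new stream; return False if it does not fit."""
        if tx_ns_per_slot <= 0:
            raise ValueError("stream must need a positive amount of time")
        if stream_id in self.reserved:
            raise KeyError("stream already admitted")
        if tx_ns_per_slot > self.spare_ns():
            return False
        self.reserved[stream_id] = tx_ns_per_slot
        return True
```

With a 500ns slot, two streams of 200ns each are admitted and a third one
of 200ns is refused, which leaves 100ns of the slot unused.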

The only thing which might be needed at the root qdisc is the ability to
utilize unused time slots for other purposes, but that's not required to be
there in the first place as long as it's designed in a way that it can be
added later on.

> > So lets look once more at the picture in an abstract way:
> >
> >      	       [ NIC ]
> > 	          |
> > 	 [ Time slice manager ]
> > 	    |           |
> >          [ Ch 0 ] ... [ Ch N ]
> >
> > So you have a bunch of properties here:
> >
> > 1) Number of Channels ranging from 1 to N
> >
> > 2) Start point, slice period and slice length per channel
> 
> Ok, so we agree that a TAS entity is needed. Assuming that channels are traffic
> classes, do you have something else in mind other than a new root qdisc?

Whatever you call it, the important point is that it is the gate keeper to
the network adapter and there is no way around it. It fully controls the
timed schedule, however simple or complex it may be.

> > 3) Queueing modes assigned per channel. Again that might be anything from
> >    'feed through' over FIFO, PRIO to more complex things like EDF.
> >
> >    The queueing mode can also influence properties like the meaning of the
> >    TX time, i.e. strict or deadline.
> 
> 
> Ok, but how are the queueing modes assigned / configured per channel?
> 
> Just to make sure we re-visit some ideas from the past:
> 
> * TAS:
> 
>    The idea we are currently exploring is to add a "time-aware", priority based
>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
> 
>    $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4      \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
> 	   queues 0 1 2 3                                              \
>      	   sched-file gates.sched [base-time <interval>]               \
>            [cycle-time <interval>] [extension-time <interval>]
> 
>    <file> is multi-line, with each line being of the following format:
>    <cmd> <gate mask> <interval in nanoseconds>
> 
>    Qbv only defines one <cmd>: "S" for 'SetGates'
> 
>    For example:
> 
>    S 0x01 300
>    S 0x03 500
> 
>    This means that there are two intervals, the first will have the gate
>    for traffic class 0 open for 300 nanoseconds, the second will have
>    both traffic classes open for 500 nanoseconds.

To accommodate stuff like control systems you also need a base line, which
is not expressed as an interval. Otherwise you can't schedule network wide
explicit plans. That's either an absolute network-time (TAI) time stamp or
an offset to a well defined network-time (TAI) time stamp, e.g. start of
epoch or something else which is agreed on. The actual schedule then fast
forwards past now (TAI) and sets up the slots from there. That makes node
hotplug possible as well.
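
The fast forward is simple arithmetic. The sketch below assumes the
"<cmd> <gate mask> <interval>" line format quoted above and a cycle which
is just the sum of the intervals. The function names are made up for
illustration:

```python
def parse_sched(text):
    """Parse 'S <gate mask> <interval ns>' lines into (mask, interval) pairs."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        cmd, mask, interval = fields
        if cmd != "S":
            raise ValueError("only the SetGates command 'S' is known")
        entries.append((int(mask, 16), int(interval)))
    return entries


def first_cycle_start(base_time_ns, cycle_time_ns, now_ns):
    """Return the first cycle boundary on the base time grid after now."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    if now_ns < base_time_ns:
        return base_time_ns
    cycles = (now_ns - base_time_ns) // cycle_time_ns + 1
    return base_time_ns + cycles * cycle_time_ns


def slot_windows(start_ns, entries):
    """Lay out one cycle as (start, end, gate mask) tuples from start_ns."""
    windows = []
    t = start_ns
    for mask, interval in entries:
        windows.append((t, t + interval, mask))
        t += interval
    return windows


entries = parse_sched("S 0x01 300\nS 0x03 500")
cycle_ns = sum(interval for _, interval in entries)
start_ns = first_cycle_start(0, cycle_ns, 1700)
windows = slot_windows(start_ns, entries)
```

Every node which agrees on the base time and the cycle time lands on the
same grid no matter when it joins: a node starting at 1700 and one starting
at 2399 both pick 2400 as their first cycle start.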

Btw, it's not only control systems. Think about complex multi source A/V
streams. They are reality in recording and live mixing and looking at the
timing constraints of such scenarios, collision avoidance is key there. So
you want to be able to do network wide traffic orchestration.

> It would handle multiple channels and expose their constraints / properties.
> Each channel also becomes a traffic class, so other qdiscs can be attached to
> them separately.

Right.

> So, in summary, because our entire design is based on qdisc interfaces, what we
> had proposed was a root qdisc (the time slice manager, as you put) that allows
> for other qdiscs to be attached to each channel. The inner qdiscs define the
> queueing modes for each channel, and tbs is just one of those modes. I
> understand now that you want to allow for fully dynamic use-cases to be
> supported as well, which we hadn't covered with our TAS proposal before because
> we hadn't envisioned it being used for these systems' design.

Yes, you have the root qdisc, which is in charge of the overall scheduling
plan. How simple or complex that plan is does not matter. It exposes traffic
classes which have properties defined by the configuration.

The qdiscs which are attached to those traffic classes can be anything
including:

 - Simple feed through (Applications are aware of the time constraints and
   set the exact schedule). qdisc has admission control.

 - Deadline aware qdisc to handle e.g. A/V streams. Applications are aware
   of time constraints and provide the packet deadline. qdisc has admission
   control. This can be a simple first come, first served scheduler or
   something like EDF which allows optimized utilization. The qdisc sets
   the TX time depending on the deadline and feeds into the root.

 - FIFO/PRIO/XXX for general traffic. Applications do not know anything
   about timing constraints. These qdiscs obviously have neither admission
   control nor do they set a TX time.  The root qdisc just pulls from there
   when the assigned time slot is due or if it (optionally) decides to use
   underutilized time slots from other classes.

 - .... Add your favourite scheduling mode(s).
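
For the deadline aware case, an EDF style queue can be sketched with a
heap. This is only an illustration: the EdfQueue name is made up, and the
policy of using the dequeue time as TX time is a simplifying assumption. A
real qdisc would derive the TX time from the slot layout of its traffic
class:

```python
import heapq
import itertools


class EdfQueue:
    """Earliest deadline first queue feeding a root scheduler."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps equal deadlines in FIFO order

    def enqueue(self, deadline_ns, packet, now_ns):
        """Queue a packet; refuse it if its deadline has already passed."""
        if deadline_ns < now_ns:
            return False
        heapq.heappush(self._heap, (deadline_ns, next(self._seq), packet))
        return True

    def dequeue(self, now_ns):
        """Return (packet, txtime_ns) for the most urgent packet, or None."""
        while self._heap:
            deadline_ns, _, packet = heapq.heappop(self._heap)
            if deadline_ns >= now_ns:
                # Assumed policy: transmit right away, which is at or
                # before the deadline.
                return packet, now_ns
            # The deadline expired while the packet was queued: drop it.
        return None
```

Packets queued with deadlines 900 and 500 come out in the order 500, 900,
and a packet which is already past its deadline when handed in is refused.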

Thanks,

	tglx