Message-ID: <7c3f5a9f-cc16-8483-cb77-b5548d46cd5b@intel.com>
Date: Thu, 22 Mar 2018 13:29:00 -0700
From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: netdev@...r.kernel.org, jhs@...atatu.com, xiyou.wangcong@...il.com,
jiri@...nulli.us, vinicius.gomes@...el.com,
richardcochran@...il.com, anna-maria@...utronix.de,
henrik@...tad.us, john.stultz@...aro.org, levi.pearson@...man.com,
edumazet@...gle.com, willemb@...gle.com, mlichvar@...hat.com
Subject: Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
Hi Thomas,
On 03/21/2018 06:46 AM, Thomas Gleixner wrote:
> On Tue, 6 Mar 2018, Jesus Sanchez-Palencia wrote:
>> +struct tbs_sched_data {
>> + bool sorting;
>> + int clockid;
>> + int queue;
>> + s32 delta; /* in ns */
>> + ktime_t last; /* The txtime of the last skb sent to the netdevice. */
>> + struct rb_root head;
>
> Hmm. You are reimplementing timerqueue open coded. Have you checked whether
> you could reuse the timerqueue implementation?
>
> That requires to add a timerqueue node to struct skbuff
>
> @@ -671,7 +671,8 @@ struct sk_buff {
> unsigned long dev_scratch;
> };
> };
> - struct rb_node rbnode; /* used in netem & tcp stack */
> + struct rb_node rbnode; /* used in netem & tcp stack */
> + struct timerqueue_node tqnode;
> };
> struct sock *sk;
>
> Then you can use timerqueue_head in your scheduler data and all the open
> coded rbtree handling goes away.
Yes, you are right. We actually looked into that for the first prototype of this
qdisc, but we weren't sure about adding the timerqueue node to the sk_buff's
union and whether it would impact the other users of that union. Looking at it
again now, it seems fine.
We'll fix it for the next version, thanks.
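For what it's worth, a rough sketch of how that could end up looking (the field
and function names below are illustrative, not the final patch):

```c
/* Sketch only: tbs_sched_data with the open-coded rbtree replaced by a
 * timerqueue_head, as suggested. Assumes the timerqueue_node has been
 * added to the sk_buff union as 'tqnode'.
 */
struct tbs_sched_data {
	bool sorting;
	int clockid;
	int queue;
	s32 delta;			/* in ns */
	ktime_t last;			/* txtime of the last skb sent */
	struct timerqueue_head tqhead;	/* replaces struct rb_root head */
};

/* Enqueue then boils down to keying the node by txtime and adding it;
 * timerqueue keeps a cached pointer to the earliest-expiring entry.
 */
static void tbs_tq_enqueue(struct tbs_sched_data *q, struct sk_buff *skb)
{
	skb->tqnode.expires = skb->tstamp;
	timerqueue_add(&q->tqhead, &skb->tqnode);
}
```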
>
>> +static bool is_packet_valid(struct Qdisc *sch, struct sk_buff *nskb)
>> +{
>> + struct tbs_sched_data *q = qdisc_priv(sch);
>> + ktime_t txtime = nskb->tstamp;
>> + struct sock *sk = nskb->sk;
>> + ktime_t now;
>> +
>> + if (sk && !sock_flag(sk, SOCK_TXTIME))
>> + return false;
>> +
>> + /* We don't perform crosstimestamping.
>> + * Drop if packet's clockid differs from qdisc's.
>> + */
>> + if (nskb->txtime_clockid != q->clockid)
>> + return false;
>> +
>> + now = get_time_by_clockid(q->clockid);
>
> If you store the time getter function pointer in tbs_sched_data then you
> avoid the lookup and just can do
>
> now = q->get_time();
>
> That applies to lots of other places.
Good idea, thanks. Will fix.
>> +
>> +static struct sk_buff *tbs_peek_timesortedlist(struct Qdisc *sch)
>> +{
>> + struct tbs_sched_data *q = qdisc_priv(sch);
>> + struct rb_node *p;
>> +
>> + p = rb_first(&q->head);
>
> timerqueue gives you direct access to the first expiring entry w/o walking
> the rbtree. So that would become:
>
> p = timerqueue_getnext(&q->tqhead);
> return p ? rb_to_skb(p) : NULL;
OK.
(...)
>> +static struct sk_buff *tbs_dequeue_scheduledfifo(struct Qdisc *sch)
>> +{
>> + struct tbs_sched_data *q = qdisc_priv(sch);
>> + struct sk_buff *skb = tbs_peek(sch);
>> + ktime_t now, next;
>> +
>> + if (!skb)
>> + return NULL;
>> +
>> + now = get_time_by_clockid(q->clockid);
>> +
>> + /* Drop if packet has expired while in queue and the drop_if_late
>> + * flag is set.
>> + */
>> + if (skb->tc_drop_if_late && ktime_before(skb->tstamp, now)) {
>> + struct sk_buff *to_free = NULL;
>> +
>> + qdisc_queue_drop_head(sch, &to_free);
>> + kfree_skb_list(to_free);
>> + qdisc_qstats_overlimit(sch);
>> +
>> + skb = NULL;
>> + goto out;
>
> Instead of going out immediately you should check the next skb whether its
> due for sending already.
We wanted to have a baseline before starting with the optimizations, so we left
this for a later patchset. It was one of the open items we had listed on the v2
cover letter, IIRC, but we'll look into it.
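Just to make sure we understood the suggestion, something along these lines
(sketch only, reusing the names from this RFC, not an actual patch):

```c
/* Sketch: keep draining expired drop_if_late packets instead of
 * returning after the first drop, so a single dequeue call can still
 * hand out the next packet that is already inside its tx window.
 */
static struct sk_buff *tbs_dequeue_drain(struct Qdisc *sch)
{
	struct tbs_sched_data *q = qdisc_priv(sch);
	struct sk_buff *skb;
	ktime_t now;

	while ((skb = tbs_peek(sch)) != NULL) {
		now = get_time_by_clockid(q->clockid);

		/* Expired and flagged: drop it, look at the next one. */
		if (skb->tc_drop_if_late && ktime_before(skb->tstamp, now)) {
			struct sk_buff *to_free = NULL;

			qdisc_queue_drop_head(sch, &to_free);
			kfree_skb_list(to_free);
			qdisc_qstats_overlimit(sch);
			continue;
		}

		/* Dequeue only inside [txtime - delta, txtime]. */
		if (ktime_after(now, ktime_sub_ns(skb->tstamp, q->delta)))
			timesortedlist_erase(sch, skb, false);
		else
			skb = NULL;
		break;
	}

	reset_watchdog(sch);
	return skb;
}
```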
(...)
>> + }
>> +
>> + next = ktime_sub_ns(skb->tstamp, q->delta);
>> +
>> + /* Dequeue only if now is within the [txtime - delta, txtime] range. */
>> + if (ktime_after(now, next))
>> + timesortedlist_erase(sch, skb, false);
>> + else
>> + skb = NULL;
>> +
>> +out:
>> + /* Now we may need to re-arm the qdisc watchdog for the next packet. */
>> + reset_watchdog(sch);
>> +
>> + return skb;
>> +}
>> +
>> +static inline void setup_queueing_mode(struct tbs_sched_data *q)
>> +{
>> + if (q->sorting) {
>> + q->enqueue = tbs_enqueue_timesortedlist;
>> + q->dequeue = tbs_dequeue_timesortedlist;
>> + q->peek = tbs_peek_timesortedlist;
>> + } else {
>> + q->enqueue = tbs_enqueue_scheduledfifo;
>> + q->dequeue = tbs_dequeue_scheduledfifo;
>> + q->peek = qdisc_peek_head;
>
> I don't see the point of these two modes and all the duplicated code it
> involves.
>
> FIFO mode limits usage to a single thread which has to guarantee that the
> packets are queued in time order.
>
> If you look at the use cases of TDM in various fields then FIFO mode is
> pretty much useless. In industrial/automotive fieldbus applications the
> various time slices are filled by different threads or even processes.
>
> Sure, the rbtree queue/dequeue has overhead compared to a simple linked
> list, but you pay for that with more indirections and lots of mostly
> duplicated code. And in the worst case one of these code pathes is going to
> be rarely used and prone to bitrot.
Our initial version (on RFC v2) performed the sorting for all modes. After all
the feedback we got, we decided to make it optional and provide FIFO modes as
well. For the SW fallback we need the scheduled FIFO, and for "pure" hw offload
we need the "raw" FIFO.
This was a way to accommodate all the use cases without imposing too much of a
burden on anyone, regardless of their application's segment (e.g. industrial,
pro a/v, automotive, etc).
Having the sorting always enabled requires that a valid static clockid be passed
to the qdisc. For the hw offload mode, that means the PHC and one of the system
clocks must be synchronized, since hrtimers do not support dynamic clocks. Not
all systems do that, or want to, and given that we do not want to perform
crosstimestamping between the packets' clock reference and the qdisc's, the only
solution for those systems would be using the raw hw offload mode.
Thanks,
Jesus