netdev - Re: [RFC PATCH net] net/sched: taprio: account for L1 overhead when calculating transmit time

Date:   Thu, 5 May 2022 19:22:08 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Vinicius Costa Gomes <vinicius.gomes@...el.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Paolo Abeni <pabeni@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Kurt Kanzenbach <kurt@...utronix.de>,
        Yannick Vignon <yannick.vignon@....com>,
        Michael Walle <michael@...le.cc>
Subject: Re: [RFC PATCH net] net/sched: taprio: account for L1 overhead when
 calculating transmit time

On Thu, May 05, 2022 at 10:25:44AM -0700, Vinicius Costa Gomes wrote:
> Hi Vladimir,
> 
> Vladimir Oltean <vladimir.oltean@....com> writes:
> 
> > The taprio scheduler underestimates the packet transmission time, which
> > means that packets can be scheduled for transmission in time slots in
> > which they are never going to fit.
> >
> > When this function was added in commit 4cfd5779bd6e ("taprio: Add
> > support for txtime-assist mode"), the only implication was that
> > time-triggered packets would overrun their time slot and eat into the
> > next one, because with txtime-assist there isn't really any emulation
> > of a "gate close" event that would stop a packet from being transmitted.
> >
> > However, commit b5b73b26b3ca ("taprio: Fix allowing too small
> > intervals") started using this function too, in all modes of operation
> > (software, txtime-assist and full offload). So we now accept time slots
> > that we know we will never be able to fulfill.
> >
> > It's difficult to say which issue is more pressing. I'd say both are
> > visible with testing, even though the second would be more obvious
> > because of its black-and-white result (trying to send small packets in
> > an insufficiently large window blocks the queue).
> >
> > Issue found through code inspection; the code was not even
> > compile-tested.
> >
> > The L1 overhead chosen here is an approximation, because various network
> > equipment has a configurable inter-frame gap (IFG); however, I don't
> > think Linux is aware of this.
> 
> When testing CBS, I remember using tc-stab
> 
> https://man7.org/linux/man-pages/man8/tc-stab.8.html
> 
> to set the 'overhead' to some value.
> 
> That value should be used in the calculation.
> 
> I agree that it's not ideal; in an ideal world we would have a way to
> retrieve the link overhead from the netdevice. But I would think that it
> gets complicated really quickly when using netdevices that are not
> Ethernet-based.

Interesting. Because the majority of length_to_duration() calls take
qdisc_pkt_len(skb) as an argument, a user-supplied overhead is already
taken into account. The exception is the bare ETH_ZLEN. For that, we'd have
to change the prototype of __qdisc_calculate_pkt_len() to take a length
instead of an skb and return the computed length as an int, and change
qdisc_calculate_pkt_len() like this:

static inline void qdisc_calculate_pkt_len(struct sk_buff *skb,
					   const struct Qdisc *sch)
{
#ifdef CONFIG_NET_SCHED
	struct qdisc_size_table *stab = rcu_dereference_bh(sch->stab);

	if (stab)
		qdisc_skb_cb(skb)->pkt_len = __qdisc_calculate_pkt_len(skb->len, stab);
#endif
}

Then we would use __qdisc_calculate_pkt_len(ETH_ZLEN, rtnl_dereference(q->root->stab)),
after checking that the dereferenced stab is not NULL, as qdisc_calculate_pkt_len() does.
Again, completely untested.

Also, should the dependency on tc-stab for correct operation, at least in
txtime-assist mode, be mentioned in the man page? I don't think it's
completely obvious.