Message-ID: <da5d1a0c-ee26-9663-6092-e07b12692e31@intel.com>
Date: Mon, 9 Jul 2018 08:24:21 -0700
From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@...el.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: netdev@...r.kernel.org, tglx@...utronix.de,
jan.altenberg@...utronix.de, vinicius.gomes@...el.com,
kurt.kanzenbach@...utronix.de, henrik@...tad.us,
richardcochran@...il.com, ilias.apalodimas@...aro.org,
ivan.khoronzhuk@...aro.org, mlichvar@...hat.com,
willemb@...gle.com, jhs@...atatu.com, xiyou.wangcong@...il.com,
jiri@...nulli.us, eric.dumazet@...il.com,
jeffrey.t.kirsher@...el.com
Subject: Re: [PATCH v2 net-next 00/14] Scheduled packet Transmission: ETF
Hi Stephen,
On 07/06/2018 02:38 PM, Stephen Hemminger wrote:
> On Tue, 3 Jul 2018 15:42:46 -0700
> Jesus Sanchez-Palencia <jesus.sanchez-palencia@...el.com> wrote:
>
>> Changes since v1:
>> - moved struct sock_txtime from socket.h to uapi net_tstamp.h;
>> - sk_clockid was changed from u16 to u8;
>> - sk_txtime_flags was changed from u16 to a u8 bit field in struct sock;
>> - the socket option flags are now validated in sock_setsockopt();
>> - added SO_EE_ORIGIN_TXTIME;
>> - sockc.transmit_time is now initialized from all IPv4 Tx paths;
>> - added support for the IPv6 Tx path;
>>
>>
>> Overview
>> ========
>>
>> This work consists of a set of kernel interfaces that can be used by
>> applications that require (time-based) Scheduled Tx of packets.
> >> It comprises 3 new kernel components:
>>
>> - SO_TXTIME: socket option + cmsg programming interfaces.
>>
> >> - etf: the "earliest txtime first" qdisc, which provides per-queue
> >> TxTime-based scheduling. This has been renamed from 'tbs' to
> >> 'etf' to better describe its functionality.
> >>
> >> - taprio: the "time-aware priority scheduler" qdisc, which provides
> >> per-port Time-Aware scheduling.
>>
> >> This patchset provides the first 2 components, which have been
> >> developed for longer. The taprio qdisc will be shared separately as an
> >> RFC shortly.
>>
>> Note that this series is a follow up of the "Time based packet
>> transmission" RFCv3 [1].
>>
>>
>>
>> etf (formerly known as 'tbs')
>> =============================
>>
> >> For applications/systems for which the concept of time slices isn't precise
> >> enough, the etf qdisc allows applications to control the instant when
> >> a packet should leave the network controller. When used in conjunction
> >> with taprio, it can also be used when the application needs to
> >> control, with stronger guarantees, the offset into each time slice at
> >> which a packet will be sent. Another use case for etf is when only a
> >> small number of applications on a system are time sensitive; it can
> >> then be used with a more traditional root qdisc (like mqprio).
>>
> >> The etf qdisc buffers packets until a configurable time before their
> >> deadline (Tx time). It uses an rbtree internally, so the buffered
> >> packets are always ordered by their txtime (deadline) and are dequeued
> >> earliest txtime first.
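The earliest-txtime-first ordering can be sketched in C. As a simplification, this hypothetical sketch swaps the rbtree for a fixed-size binary min-heap keyed on txtime (both yield the same dequeue order); the names `struct pkt`, `struct etf_queue`, `etf_enqueue` and `etf_dequeue` are illustrative and not taken from sch_etf.c, and timers, delta and drops are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet record: only the fields the ordering needs. */
struct pkt {
    uint64_t txtime; /* absolute Tx time in ns on the socket's clock */
    int id;          /* illustrative tag so the caller can identify packets */
};

#define QCAP 64

/* Binary min-heap keyed on txtime: a stand-in for the rbtree in sch_etf. */
struct etf_queue {
    struct pkt heap[QCAP];
    size_t len;
};

static void swap_pkt(struct pkt *a, struct pkt *b)
{
    struct pkt t = *a;
    *a = *b;
    *b = t;
}

/* Insert p, keeping the earliest txtime at heap[0]. Returns -1 if full. */
int etf_enqueue(struct etf_queue *q, struct pkt p)
{
    if (q->len == QCAP)
        return -1;
    size_t i = q->len++;
    q->heap[i] = p;
    /* Sift up while the parent has a later txtime. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (q->heap[parent].txtime <= q->heap[i].txtime)
            break;
        swap_pkt(&q->heap[parent], &q->heap[i]);
        i = parent;
    }
    return 0;
}

/* Remove the packet with the earliest txtime. Returns -1 if empty. */
int etf_dequeue(struct etf_queue *q, struct pkt *out)
{
    if (q->len == 0)
        return -1;
    *out = q->heap[0];
    q->heap[0] = q->heap[--q->len];
    /* Sift down: move the new root below any child with an earlier txtime. */
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < q->len && q->heap[l].txtime < q->heap[m].txtime)
            m = l;
        if (r < q->len && q->heap[r].txtime < q->heap[m].txtime)
            m = r;
        if (m == i)
            break;
        swap_pkt(&q->heap[i], &q->heap[m]);
        i = m;
    }
    return 0;
}
```

Enqueuing packets with txtimes 500, 100 and 300 and then dequeuing them returns them in the order 100, 300, 500, regardless of arrival order.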
>>
> >> It relies on the SO_TXTIME APIs for receiving the per-packet timestamp
> >> (txtime) as well as the per-socket config flags: the clockid to be
> >> used as a reference, whether txtime for that socket is interpreted in
> >> deadline or strict mode, and whether packet drops should be reported on
> >> the socket's error queue.
>>
> >> The qdisc will drop any packet whose Tx time is in the past, or that
> >> expires while waiting to be dequeued. Drops can be reported as errors
> >> back to userspace through the socket's error queue.
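Decoding a reported drop might look like the sketch below. It assumes `<linux/errqueue.h>` provides `SO_EE_ORIGIN_TXTIME` and the `SO_EE_CODE_TXTIME_*` codes added by this series, and that, as we understand the upstream etf code, the dropped packet's 64-bit txtime is split across `ee_data` (high 32 bits) and `ee_info` (low 32 bits). `decode_txtime_drop` is a hypothetical helper; fetching the `struct sock_extended_err` itself with `recvmsg()` and `MSG_ERRQUEUE` is left out.

```c
#include <stdint.h>
#include <time.h>              /* struct timespec, used by <linux/errqueue.h> */
#include <linux/errqueue.h>    /* struct sock_extended_err, SO_EE_* */

/* Hypothetical helper: if err was produced by etf (origin TXTIME), store the
 * dropped packet's txtime and whether the drop was a missed deadline, and
 * return 1. Any other origin (e.g. a Tx timestamp report) returns 0. */
int decode_txtime_drop(const struct sock_extended_err *err,
                       uint64_t *txtime_ns, int *missed)
{
    if (err->ee_origin != SO_EE_ORIGIN_TXTIME)
        return 0;
    /* Reassemble the txtime: high 32 bits in ee_data, low 32 in ee_info. */
    *txtime_ns = ((uint64_t)err->ee_data << 32) | err->ee_info;
    /* SO_EE_CODE_TXTIME_MISSED: expired while queued.
     * SO_EE_CODE_TXTIME_INVALID_PARAM: rejected when enqueued. */
    *missed = (err->ee_code == SO_EE_CODE_TXTIME_MISSED);
    return 1;
}
```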
>>
>> Example configuration:
>>
>> $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 200000 \
>> clockid CLOCK_TAI
>>
> >> Here, the qdisc will use HW offload for txtime control.
>> Packets will be dequeued by the qdisc "delta" (200000) nanoseconds before
>> their transmission time. Because this will be using HW offload and
>> since dynamic clocks are not supported by hrtimers, the system clock
>> and the PHC clock must be synchronized for this mode to behave as expected.
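Each packet's txtime travels as an `SCM_TXTIME` control message carrying a 64-bit timestamp in nanoseconds on the socket's reference clock (CLOCK_TAI in the tc example above). A minimal sketch; `set_txtime_cmsg` is a hypothetical helper, and the `SO_TXTIME`/`SCM_TXTIME` fallbacks assume the asm-generic socket option numbers.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_TXTIME
#define SO_TXTIME 61            /* asm-generic value; some arches differ */
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME    /* the cmsg type reuses the option number */
#endif

/* Hypothetical helper: attach a single SCM_TXTIME cmsg carrying txtime_ns
 * to msg, using a caller-provided buffer aligned for struct cmsghdr.
 * Returns 0, or -1 if buf is too small. */
int set_txtime_cmsg(struct msghdr *msg, void *buf, size_t buflen,
                    uint64_t txtime_ns)
{
    struct cmsghdr *cm;

    if (buflen < CMSG_SPACE(sizeof(txtime_ns)))
        return -1;
    memset(buf, 0, buflen);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));
    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_TXTIME;
    cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
    /* CMSG_DATA may be unaligned for a u64, so copy instead of casting. */
    memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
    return 0;
}
```

The filled-in `msghdr` is then passed to `sendmsg()`. With `delta 200000` as configured above, etf would hand such a packet to the driver roughly 200 us before the txtime carried in the cmsg.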
>>
> >> A more complete example, with instructions on how to test it, can be
> >> found here:
>>
>> https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [2]
>>
>>
> >> Note that we haven't modified the qdisc to use a timerqueue because the
> >> required modification would have increased the number of cachelines of an sk_buff.
>>
>>
>>
>> This series is also hosted on github and can be found at [3].
>> The companion iproute2 patches can be found at [4].
>>
>>
>> [1] https://patchwork.ozlabs.org/cover/882342/
>>
> >> [2] GitHub doesn't make it clear, but the gist can be cloned like this:
>> $ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f scheduled-tx-tests
>>
>> [3] https://github.com/jeez/linux/tree/etf-v2
>>
>> [4] https://github.com/jeez/iproute2/tree/etf-v2
>>
>>
>>
>> Jesus Sanchez-Palencia (10):
>> net: Clear skb->tstamp only on the forwarding path
>> net: ipv4: Hook into time based transmission
>> net: ipv6: Hook into time based transmission
>> net/sched: Add HW offloading capability to ETF
>> igb: Refactor igb_configure_cbs()
>> igb: Only change Tx arbitration when CBS is on
>> igb: Refactor igb_offload_cbs()
>> igb: Only call skb_tx_timestamp after descriptors are ready
>> igb: Add support for ETF offload
>> net/sched: Make etf report drops on error_queue
>>
>> Richard Cochran (2):
>> net: Add a new socket option for a future transmit time.
>> net: packet: Hook into time based transmission.
>>
>> Vinicius Costa Gomes (2):
>> net/sched: Allow creating a Qdisc watchdog with other clocks
>> net/sched: Introduce the ETF Qdisc
>>
>> arch/alpha/include/uapi/asm/socket.h | 3 +
>> arch/ia64/include/uapi/asm/socket.h | 3 +
>> arch/mips/include/uapi/asm/socket.h | 3 +
>> arch/parisc/include/uapi/asm/socket.h | 3 +
>> arch/s390/include/uapi/asm/socket.h | 3 +
>> arch/sparc/include/uapi/asm/socket.h | 3 +
>> arch/xtensa/include/uapi/asm/socket.h | 3 +
>> .../net/ethernet/intel/igb/e1000_defines.h | 16 +
>> drivers/net/ethernet/intel/igb/igb.h | 1 +
>> drivers/net/ethernet/intel/igb/igb_main.c | 256 ++++++---
>> include/linux/netdevice.h | 1 +
>> include/net/inet_sock.h | 1 +
>> include/net/pkt_sched.h | 7 +
>> include/net/sock.h | 11 +
>> include/uapi/asm-generic/socket.h | 3 +
>> include/uapi/linux/errqueue.h | 4 +
>> include/uapi/linux/net_tstamp.h | 18 +
>> include/uapi/linux/pkt_sched.h | 18 +
>> net/core/skbuff.c | 2 +-
>> net/core/sock.c | 39 ++
>> net/ipv4/icmp.c | 2 +
>> net/ipv4/ip_output.c | 3 +
>> net/ipv4/ping.c | 1 +
>> net/ipv4/raw.c | 2 +
>> net/ipv4/udp.c | 1 +
>> net/ipv6/ip6_output.c | 11 +-
>> net/ipv6/raw.c | 7 +-
>> net/ipv6/udp.c | 1 +
>> net/packet/af_packet.c | 6 +
>> net/sched/Kconfig | 11 +
>> net/sched/Makefile | 1 +
>> net/sched/sch_api.c | 11 +-
>> net/sched/sch_etf.c | 484 ++++++++++++++++++
>> 33 files changed, 864 insertions(+), 75 deletions(-)
>> create mode 100644 net/sched/sch_etf.c
>>
>
> Why support different clockids in the API?
> I think the clock used in API should be either nanoseconds or USER_HZ (i.e. 100)
> and the kernel components should use ktime. If you need to translate that to some
> other value in the hardware driver, then let the device driver do it.
>
> Exposing multiple choices in the userspace API leads to more error paths and does
> not provide direct benefits.
The kernel components already use ktime_t. The clockid_t here is to define the
time source (i.e. which clock the ktime value must be read from) and not the
unit of time.
I hope that clarifies.
Regards,
Jesus