Date:   Mon, 9 Jul 2018 08:24:21 -0700
From:   Jesus Sanchez-Palencia <jesus.sanchez-palencia@...el.com>
To:     Stephen Hemminger <stephen@...workplumber.org>
Cc:     netdev@...r.kernel.org, tglx@...utronix.de,
        jan.altenberg@...utronix.de, vinicius.gomes@...el.com,
        kurt.kanzenbach@...utronix.de, henrik@...tad.us,
        richardcochran@...il.com, ilias.apalodimas@...aro.org,
        ivan.khoronzhuk@...aro.org, mlichvar@...hat.com,
        willemb@...gle.com, jhs@...atatu.com, xiyou.wangcong@...il.com,
        jiri@...nulli.us, eric.dumazet@...il.com,
        jeffrey.t.kirsher@...el.com
Subject: Re: [PATCH v2 net-next 00/14] Scheduled packet Transmission: ETF

Hi Stephen,


On 07/06/2018 02:38 PM, Stephen Hemminger wrote:
> On Tue,  3 Jul 2018 15:42:46 -0700
> Jesus Sanchez-Palencia <jesus.sanchez-palencia@...el.com> wrote:
> 
>> Changes since v1:
>>   - moved struct sock_txtime from socket.h to uapi net_tstamp.h;
>>   - sk_clockid was changed from u16 to u8;
>>   - sk_txtime_flags was changed from u16 to a u8 bit field in struct sock;
>>   - the socket option flags are now validated in sock_setsockopt();
>>   - added SO_EE_ORIGIN_TXTIME;
>>   - sockc.transmit_time is now initialized from all IPv4 Tx paths;
>>   - added support for the IPv6 Tx path;
>>
>>
>> Overview
>> ========
>>
>> This work consists of a set of kernel interfaces that can be used by
>> applications that require (time-based) Scheduled Tx of packets.
>> It comprises 3 new kernel components:
>>
>>   - SO_TXTIME: socket option + cmsg programming interfaces.
>>
>>   - etf: the "earliest txtime first" qdisc, that provides per-queue
>> 	 TxTime-based scheduling. This has been renamed from 'tbs' to
>> 	 'etf' to better describe its functionality.
>>
>>   - taprio: the "time-aware priority scheduler" qdisc, that provides
>> 	    per-port Time-Aware scheduling;
>>
>> This patchset provides the first 2 components, which have been in
>> development for longer. The taprio qdisc will be shared as an RFC
>> separately (shortly).
>>
>> Note that this series is a follow up of the "Time based packet
>> transmission" RFCv3 [1].
>>
>>
>>
>> etf (formerly known as 'tbs')
>> =============================
>>
>> For applications/systems for which the concept of time slices isn't
>> precise enough, the etf qdisc allows applications to control the instant
>> when a packet should leave the network controller. When used in
>> conjunction with taprio, it can also be used when the application needs
>> to control, with a stronger guarantee, the offset into each time slice at
>> which a packet will be sent. Another use case for etf is when only a small
>> number of applications on a system are time sensitive, so it can be used
>> with a more traditional root qdisc (like mqprio).
>>
>> The etf qdisc is designed so it buffers packets until a configurable
>> time before their deadline (Tx time). The qdisc uses a rbtree internally
>> so the buffered packets are always 'ordered' by their txtime (deadline)
>> and will be dequeued following the earliest txtime first.
>>
>> It relies on the SO_TXTIME API set for receiving the per-packet timestamp
>> (txtime) as well as the config flags for each socket: the clockid to be
>> used as a reference, if the expected mode of txtime for that socket is
>> deadline or strict mode, and if packet drops should be reported on the
>> socket's error queue or not.
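
To make the two SO_TXTIME pieces concrete, here is a minimal userspace sketch of how the per-socket config and the per-packet txtime could be packed. The numeric constants and the struct layout (a 32-bit clockid followed by 32-bit flags) are assumptions taken from later mainline headers on asm-generic architectures, not from this thread, and the helper names are hypothetical.

```python
import socket
import struct

# Assumed values from later mainline Linux headers (asm-generic).
# They are not stated in this thread and differ on some architectures.
SO_TXTIME = 61
SCM_TXTIME = SO_TXTIME
CLOCK_TAI = 11
SOF_TXTIME_DEADLINE_MODE = 1 << 0
SOF_TXTIME_REPORT_ERRORS = 1 << 1


def pack_sock_txtime(clockid, flags):
    """Pack the per-socket config: signed 32-bit clockid, then 32-bit flags."""
    return struct.pack("=iI", clockid, flags)


def txtime_cmsg(txtime_ns):
    """Build the ancillary-data tuple carrying one packet's txtime (u64, ns)."""
    return (socket.SOL_SOCKET, SCM_TXTIME, struct.pack("=Q", txtime_ns))


def send_at(sock, payload, addr, txtime_ns):
    """Hypothetical helper: send one datagram with a txtime attached."""
    return sock.sendmsg([payload], [txtime_cmsg(txtime_ns)], 0, addr)
```

A socket would be configured once with something like `sock.setsockopt(socket.SOL_SOCKET, SO_TXTIME, pack_sock_txtime(CLOCK_TAI, SOF_TXTIME_REPORT_ERRORS))`, which only succeeds on a kernel that carries this series.
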
>>
>> The qdisc will drop any packet with a Tx time in the past, or one that
>> expires while waiting to be dequeued. Drops can be reported as errors
>> back to userspace through the socket's error queue.
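
The buffering, ordering and drop rules described above can be sketched as a toy userspace model. This is not the sch_etf.c code: a binary heap stands in for the kernel rbtree, time is passed in explicitly instead of being read from a clock, and the class and method names are hypothetical.

```python
import heapq
import itertools


class EtfModel:
    """Toy model of the etf rules described above (not the kernel code)."""

    def __init__(self, delta_ns):
        self.delta_ns = delta_ns
        self._heap = []                # stand-in for the kernel rbtree
        self._seq = itertools.count()  # tie-breaker for equal txtimes
        self.dropped = []

    def enqueue(self, now_ns, txtime_ns, pkt):
        # A txtime that is already in the past is dropped at enqueue.
        if txtime_ns < now_ns:
            self.dropped.append(pkt)
            return False
        heapq.heappush(self._heap, (txtime_ns, next(self._seq), pkt))
        return True

    def dequeue(self, now_ns):
        # Discard packets that expired while they were buffered.
        while self._heap and self._heap[0][0] < now_ns:
            self.dropped.append(heapq.heappop(self._heap)[2])
        # Release the earliest packet once now >= txtime - delta.
        if self._heap and now_ns >= self._heap[0][0] - self.delta_ns:
            return heapq.heappop(self._heap)[2]
        return None
```

With `delta_ns=200000`, a packet whose txtime is 500000 becomes eligible for dequeue at time 300000, and a packet still queued after its txtime has passed ends up in `dropped`.
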
>>
>> Example configuration:
>>
>> $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 200000 \
>>             clockid CLOCK_TAI
>>
>> Here, the Qdisc will use HW offload for the txtime control.
>> Packets will be dequeued by the qdisc "delta" (200000) nanoseconds before
>> their transmission time. Because this will be using HW offload and
>> since dynamic clocks are not supported by hrtimers, the system clock
>> and the PHC clock must be synchronized for this mode to behave as expected.
>>
>> A more complete example, with instructions on how to test it, can be
>> found here:
>>
>> https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [2]
>>
>>
>> Note that we haven't modified the qdisc to use a timerqueue because the
>> modification needed increased the number of cachelines of an sk_buff.
>>
>>
>>
>> This series is also hosted on github and can be found at [3].
>> The companion iproute2 patches can be found at [4].
>>
>>
>> [1] https://patchwork.ozlabs.org/cover/882342/
>>
>> [2] github doesn't make it clear, but the gist can be cloned like this:
>> $ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f scheduled-tx-tests
>>
>> [3] https://github.com/jeez/linux/tree/etf-v2
>>
>> [4] https://github.com/jeez/iproute2/tree/etf-v2
>>
>>
>>
>> Jesus Sanchez-Palencia (10):
>>   net: Clear skb->tstamp only on the forwarding path
>>   net: ipv4: Hook into time based transmission
>>   net: ipv6: Hook into time based transmission
>>   net/sched: Add HW offloading capability to ETF
>>   igb: Refactor igb_configure_cbs()
>>   igb: Only change Tx arbitration when CBS is on
>>   igb: Refactor igb_offload_cbs()
>>   igb: Only call skb_tx_timestamp after descriptors are ready
>>   igb: Add support for ETF offload
>>   net/sched: Make etf report drops on error_queue
>>
>> Richard Cochran (2):
>>   net: Add a new socket option for a future transmit time.
>>   net: packet: Hook into time based transmission.
>>
>> Vinicius Costa Gomes (2):
>>   net/sched: Allow creating a Qdisc watchdog with other clocks
>>   net/sched: Introduce the ETF Qdisc
>>
>>  arch/alpha/include/uapi/asm/socket.h          |   3 +
>>  arch/ia64/include/uapi/asm/socket.h           |   3 +
>>  arch/mips/include/uapi/asm/socket.h           |   3 +
>>  arch/parisc/include/uapi/asm/socket.h         |   3 +
>>  arch/s390/include/uapi/asm/socket.h           |   3 +
>>  arch/sparc/include/uapi/asm/socket.h          |   3 +
>>  arch/xtensa/include/uapi/asm/socket.h         |   3 +
>>  .../net/ethernet/intel/igb/e1000_defines.h    |  16 +
>>  drivers/net/ethernet/intel/igb/igb.h          |   1 +
>>  drivers/net/ethernet/intel/igb/igb_main.c     | 256 ++++++---
>>  include/linux/netdevice.h                     |   1 +
>>  include/net/inet_sock.h                       |   1 +
>>  include/net/pkt_sched.h                       |   7 +
>>  include/net/sock.h                            |  11 +
>>  include/uapi/asm-generic/socket.h             |   3 +
>>  include/uapi/linux/errqueue.h                 |   4 +
>>  include/uapi/linux/net_tstamp.h               |  18 +
>>  include/uapi/linux/pkt_sched.h                |  18 +
>>  net/core/skbuff.c                             |   2 +-
>>  net/core/sock.c                               |  39 ++
>>  net/ipv4/icmp.c                               |   2 +
>>  net/ipv4/ip_output.c                          |   3 +
>>  net/ipv4/ping.c                               |   1 +
>>  net/ipv4/raw.c                                |   2 +
>>  net/ipv4/udp.c                                |   1 +
>>  net/ipv6/ip6_output.c                         |  11 +-
>>  net/ipv6/raw.c                                |   7 +-
>>  net/ipv6/udp.c                                |   1 +
>>  net/packet/af_packet.c                        |   6 +
>>  net/sched/Kconfig                             |  11 +
>>  net/sched/Makefile                            |   1 +
>>  net/sched/sch_api.c                           |  11 +-
>>  net/sched/sch_etf.c                           | 484 ++++++++++++++++++
>>  33 files changed, 864 insertions(+), 75 deletions(-)
>>  create mode 100644 net/sched/sch_etf.c
>>
> 
> Why support different clockid's in the API? 
> I think the clock used in API should be either nanoseconds or USER_HZ (ie 100)
> and the kernel components should use ktime. If you need to translate that to some
> other value in the hardware driver,  then let the device driver do it.
> 
> Exposing multiple choices in the userspace API leads to more error paths and
> does not provide direct benefits.


The kernel components already use ktime_t. The clockid_t here defines the time
source (i.e. which clock the ktime must be read from), not the unit of time.
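
As a small userspace illustration of that distinction (a sketch, not part of the patchset): reading two different clock ids returns values in the same unit, nanoseconds, but from different time sources. CLOCK_REALTIME and CLOCK_MONOTONIC are used here only because they are widely available; CLOCK_TAI is another id of the same kind. The helper name is hypothetical.

```python
import time


def read_clock_ns(clockid):
    """Read the given clock; the result is in nanoseconds for every clockid."""
    return time.clock_gettime_ns(clockid)


# Same unit (ns), different time sources: wall-clock time since the epoch
# versus a monotonic count from an unspecified starting point.
realtime_ns = read_clock_ns(time.CLOCK_REALTIME)
monotonic_ns = read_clock_ns(time.CLOCK_MONOTONIC)
print("CLOCK_REALTIME :", realtime_ns)
print("CLOCK_MONOTONIC:", monotonic_ns)
```
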

I hope that clarifies.

Regards,
Jesus
