Message-ID: <VI1PR10MB24469F42655B66B16DF25B6DABCA0@VI1PR10MB2446.EURPRD10.PROD.OUTLOOK.COM>
Date: Fri, 11 Dec 2020 14:44:21 +0000
From: "Geva, Erez" <erez.geva.ext@...mens.com>
To: Vinicius Costa Gomes <vinicius.gomes@...el.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
CC: Network Development <netdev@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
Arnd Bergmann <arnd@...db.de>,
Cong Wang <xiyou.wangcong@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Jakub Kicinski <kuba@...nel.org>,
Jamal Hadi Salim <jhs@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>,
Alexei Starovoitov <ast@...nel.org>,
Colin Ian King <colin.king@...onical.com>,
Daniel Borkmann <daniel@...earbox.net>,
Eric Dumazet <edumazet@...gle.com>,
Eyal Birger <eyal.birger@...il.com>,
"Gustavo A . R . Silva" <gustavoars@...nel.org>,
Jakub Sitnicki <jakub@...udflare.com>,
John Ogness <john.ogness@...utronix.de>,
Jon Rosen <jrosen@...co.com>,
Kees Cook <keescook@...omium.org>,
Marc Kleine-Budde <mkl@...gutronix.de>,
Martin KaFai Lau <kafai@...com>,
Matthieu Baerts <matthieu.baerts@...sares.net>,
Andrei Vagin <avagin@...il.com>,
Dmitry Safonov <0x7f454c46@...il.com>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
Ingo Molnar <mingo@...nel.org>,
John Stultz <john.stultz@...aro.org>,
Miaohe Lin <linmiaohe@...wei.com>,
Michal Kubecek <mkubecek@...e.cz>,
Or Cohen <orcohen@...oaltonetworks.com>,
Oleg Nesterov <oleg@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Richard Cochran <richardcochran@...il.com>,
Stefan Schmidt <stefan@...enfreihafen.org>,
Xie He <xie.he.0141@...il.com>,
Stephen Boyd <sboyd@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vladis Dronov <vdronov@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>,
Vedang Patel <vedang.patel@...el.com>,
"Sudler, Simon" <simon.sudler@...mens.com>,
"Meisinger, Andreas" <andreas.meisinger@...mens.com>,
"henning.schild@...mens.com" <henning.schild@...mens.com>,
"jan.kiszka@...mens.com" <jan.kiszka@...mens.com>,
"Zirkler, Andreas" <andreas.zirkler@...mens.com>
Subject: Re: [PATCH 1/3] Add TX sending hardware timestamp.
On 11/12/2020 01:27, Vinicius Costa Gomes wrote:
> Willem de Bruijn <willemdebruijn.kernel@...il.com> writes:
>
>>>> If I understand correctly, you are trying to achieve a single delivery time.
>>>> The need for two separate timestamps passed along is only because the
>>>> kernel is unable to do the time base conversion.
>>>
>>> Yes, a correct point.
>>>
>>>>
>>>> Else, ETF could program the qdisc watchdog in system time and later,
>>>> on dequeue, convert skb->tstamp to the h/w time base before
>>>> passing it to the device.
>>>
>>> Or skb->tstamp is the HW time-stamp and ETF converts it to the system clock base.
>>>
>>>>
>>>> It's still not entirely clear to me why the packet has to be held by
>>>> ETF initially first, if it is held until delivery time by hardware
>>>> later. But more on that below.
>>>
>>> Let us plot a simple scenario.
>>> App A sends a packet with time-stamp 100.
>>> Afterwards, a second packet arrives from App B with time-stamp 90.
>>> Without ETF, the second packet has to wait until the interface hardware sends the first packet at 100,
>>> making the second packet late by 10 + the first packet's send time.
>>> Obviously, other "normal" packets are sent to the non-ETF queue, so they do not block ETF packets.
>>> The ETF delta is a barrier: the application has to send the packet before it to ensure the packet is not tossed.
>>
>> Got it. The assumption here is that devices are FIFO. That is not
>> necessarily the case, but I do not know whether it is in practice,
>> e.g., on the i210.
>
> On the i210 and i225, that's indeed the case, i.e. only the launch time
> of the packet at the front of the queue is considered.
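The scenario quoted above can be sketched as a small model. This is only an illustration under stated assumptions, not driver or qdisc code: the names `struct pkt`, `fifo_worst_lateness` and `sorted_worst_lateness` are hypothetical, and the per-packet send time `tx_time` is an assumed constant. The device model honours only the launch time of the packet at the head of the queue.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet record: sending application and requested launch time. */
struct pkt {
    const char *app;
    long launch;
};

/* Order packets by ascending launch time, as a time-sorted qdisc would. */
static int by_launch(const void *a, const void *b)
{
    const struct pkt *x = a, *y = b;
    return (x->launch > y->launch) - (x->launch < y->launch);
}

/*
 * FIFO device model: only the launch time of the head packet is considered.
 * Returns the worst lateness (actual start minus requested launch time).
 */
long fifo_worst_lateness(const struct pkt *q, size_t n, long tx_time)
{
    long now = 0, worst = 0;

    for (size_t i = 0; i < n; i++) {
        long start = q[i].launch > now ? q[i].launch : now;
        long late = start - q[i].launch;

        if (late > worst)
            worst = late;
        now = start + tx_time; /* the wire is busy until the packet is sent */
    }
    return worst;
}

/* Same device, but the queue is sorted by launch time before it is handed over. */
long sorted_worst_lateness(const struct pkt *q, size_t n, long tx_time)
{
    struct pkt *copy = malloc(n * sizeof(*copy));
    long worst;

    if (!copy)
        return -1;
    memcpy(copy, q, n * sizeof(*copy));
    qsort(copy, n, sizeof(*copy), by_launch);
    worst = fifo_worst_lateness(copy, n, tx_time);
    free(copy);
    return worst;
}
```

With the queue { A at 100, B at 90 } in arrival order and an assumed send time of 1, the FIFO model reports a worst lateness of 11 (10 + the first packet's send time), while the sorted variant reports 0.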
>
> [...]
>
>>>>>>>> It only requires that pacing qdiscs, both sch_etf and sch_fq,
>>>>>>>> optionally skip queuing in their .enqueue callback and instead allow
>>>>>>>> the skb to pass to the device driver as is, with skb->tstamp set. Only
>>>>>>>> to devices that advertise support for h/w pacing offload.
>>>>>>>>
>>>>>>> I did not use "Fair Queue traffic policing".
>>>>>>> As for ETF, it is all about ordering packets from different applications.
>>>>>>> How can we achieve it while skipping queuing?
>>>>>>> Could you elaborate on this point?
>>>>>>
>>>>>> The qdisc can only defer pacing to hardware if hardware can ensure the
>>>>>> same invariants on ordering, of course.
>>>>>
>>>>> Yes, this is why we suggest that ETF order packets using the hardware time-stamp
>>>>> and pass the packet on based on system time.
>>>>> So ETF queries the system clock only and not the PHC.
>>>>
>>>> On which note: with this patch set all applications have to agree to
>>>> use h/w time base in etf_enqueue_timesortedlist. In practice that
>>>> makes this h/w mode a qdisc used by a single process?
>>>
>>> A single process theoretically does not need ETF; it can just set skb->tstamp and use a pass-through queue.
>>> However the only way now to set TC_SETUP_QDISC_ETF in the driver is using ETF.
>>
>> Yes, and I'd like to eventually get rid of this constraint.
>>
>
> I'm interested in these kind of ideas :-)
>
> What would be your end goal? Something like:
> - Any application is able to set SO_TXTIME;
> - We would have a best effort support for scheduling packets based on
> their transmission time enabled by default;
> - If the hardware supports, there would be a "offload" flag that could
> be enabled;
>
> More or less this?
Activating SO_TXTIME is what causes the SKB to enter the matching ETF qdisc.
If no ETF qdisc is set, or if the SO_TXTIME clock ID is not TAI, the SKB passes directly to the driver.
So an application can use SO_TXTIME as is and set skb->tstamp.
There is no need to change anything in SO_TXTIME.
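For reference, a minimal sketch of how an application uses SO_TXTIME today, assuming Linux headers that provide struct sock_txtime (4.19 or later). The helper names enable_txtime() and set_txtime_cmsg() are made up for this example, and the fallback value 61 for SO_TXTIME is the asm-generic one, which differs on some architectures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>   /* struct sock_txtime */

#ifndef SO_TXTIME
#define SO_TXTIME 61            /* asm-generic value; assumption for old headers */
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif
#ifndef CLOCK_TAI
#define CLOCK_TAI 11
#endif

/*
 * Enable SO_TXTIME on a socket with CLOCK_TAI as the clock ID.
 * May fail with EPERM without CAP_NET_ADMIN, depending on the kernel version.
 */
int enable_txtime(int fd)
{
    struct sock_txtime cfg = { .clockid = CLOCK_TAI, .flags = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
}

/*
 * Attach the launch time (in nanoseconds) to a message as an SCM_TXTIME
 * control message; the kernel copies it into skb->tstamp on sendmsg().
 * ctrl must be suitably aligned and at least CMSG_SPACE(sizeof(uint64_t)) bytes.
 */
void set_txtime_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                     uint64_t txtime_ns)
{
    struct cmsghdr *cm;

    assert(ctrl_len >= CMSG_SPACE(sizeof(txtime_ns)));
    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

    cm = CMSG_FIRSTHDR(msg);
    assert(cm != NULL);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_TXTIME;
    cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
    memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
}
```

The application calls enable_txtime() once per socket, then set_txtime_cmsg() on each msghdr before sendmsg(). Nothing in this path depends on which qdisc is installed.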
As for setting TC_SETUP_QDISC_ETF on a driver queue,
we can add a netlink message for it using the netlink protocol.
What about the other TC_SETUP_QDISC_XXX offloads, such as CBS?
>
>
> Cheers.
>