Message-Id: <20210430142635.3791-1-alexandr.lobakin@intel.com>
Date: Fri, 30 Apr 2021 16:26:35 +0200
From: Alexander Lobakin <alexandr.lobakin@...el.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Alexander Lobakin <alexandr.lobakin@...el.com>,
"Ong, Boon Leong" <boon.leong.ong@...el.com>,
"Joseph, Jithu" <jithu.joseph@...el.com>,
"Desouza, Ederson" <ederson.desouza@...el.com>,
"Song, Yoong Siang" <yoong.siang.song@...el.com>,
Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
"Gomes, Vinicius" <vinicius.gomes@...el.com>,
"Machnikowski, Maciej" <maciej.machnikowski@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
David Ahern <dsahern@...il.com>,
"Karlsson, Magnus" <magnus.karlsson@...el.com>,
Saeed Mahameed <saeed@...nel.org>,
Björn Töpel <bjorn.topel@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Jens Steen Krogh <jskro@...tas.com>,
Joao Pedro Barros Silva <jopbs@...tas.com>,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: PTP RX & TX time-stamp and TX Time in XDP ZC socket
From: Jesper Dangaard Brouer <brouer@...hat.com>
Date: Fri, 23 Apr 2021 18:37:31 +0200
> Cc netdev, as I think we should get upstream feedback as early as
> possible. (Maybe Alexei will critique my idea of storing a btf_id in the
> struct?)
>
>
> On Thu, 22 Apr 2021 07:34:23 +0000
> "Ong, Boon Leong" <boon.leong.ong@...el.com> wrote:
>
> > >> Now that the stmmac driver has XDP ZC support, we would like
> > >> to know if there is any on-going POC or discussion on the XDP ZC
> > >> socket for adding the below:
> > >>
> > >> 1) PTP RX & TX time-stamp
> > >> 2) Per-packet TX transmit time (similar to SO_TXTIME)
> > >
> > > Well, this is actually perfect timing! (pun intended)
> > >
> > > I'm actually going to work on adding this to XDP. I was reading the
> > > igc driver and the i225 SW datasheet last night, trying to figure out
> > > a design based on what the hardware can do. My design ideas obviously
> > > involve BTF, but a lot of pieces are still missing, such as an XDP TX
> > > hook.
> >
> > Currently, we are using a non-standard/inelegant approach for
> > internal real-time KPI measurements, as follows:
> >
> > 1) The TX time is stored in a newly introduced 64-bit timestamp field in the XDP descriptor.
>
> Did you create a separate XDP descriptor?
> If so, what memory is backing it?
>
> My idea[1] is to use the meta-data area (xdp_buff->data_meta), which is
> located in front of the packet headers. Or the area at the top of the
> "packet" memory, which is already used by struct xdp_frame, except that
> zero-copy AF_XDP doesn't have the xdp_frame. Due to the AF_XDP limits,
> I'm leaning towards using the xdp_buff->data_meta area.
>
> [1] https://people.netfilter.org/hawk/presentations/KernelRecipes2019/xdp-netstack-concert.pdf
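>
> To make this concrete: a BPF program can already reserve and fill the
> data_meta area today via the bpf_xdp_adjust_meta() helper. A minimal
> (untested) sketch, with a made-up single-member layout:
>
> #include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
>
> struct rx_meta {
>         __u64 rx_timestamp;
> };
>
> SEC("xdp")
> int xdp_meta_example(struct xdp_md *ctx)
> {
>         struct rx_meta *meta;
>         void *data;
>
>         /* Grow the meta-data area in front of the packet data */
>         if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
>                 return XDP_ABORTED;
>
>         data = (void *)(long)ctx->data;
>         meta = (void *)(long)ctx->data_meta;
>         if ((void *)(meta + 1) > data)  /* verifier-mandated check */
>                 return XDP_ABORTED;
>
>         /* SW time as a stand-in for where a HW RX time would live */
>         meta->rx_timestamp = bpf_ktime_get_ns();
>
>         return XDP_PASS;
> }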
>
> I should mention that I want a generic solution (based on BTF) that can
> support many types of hardware hints: the existing RX-hash, VLAN,
> checksum, mark and timestamps, plus newer HW hints that the netstack
> doesn't support yet; e.g. I know mlx5 can assign unique (64-bit)
> flow-marks.
>
> I should also mention that I want the solution to work for (struct)
> xdp_frame packets that get redirected from RX to TX, and to work when/if
> an xdp_frame gets converted to an SKB (which happens for veth and cpumap):
> the RX-hash, VLAN, checksum, mark and timestamp should then be
> transferred to the SKB.
Hi, just to let you know:
We at Intel are currently working on flexible XDP hints that include
both generic hints (i.e. ones that every single driver/HW provides) and
custom hints/metadata, and we are planning to publish a first RFC soon.
Feel free to join if you wish; we could cooperate and work together.
> > 2) RX T/S is stored in the meta-data of the RX frame.
>
> Yes, I also want to store the RX-timestamp in the meta-data area. This
> means the RX-timestamp is stored, memory-wise, just before the packet
> headers start.
>
> For AF_XDP, how does the userspace program know that this info is stored
> in this area? As you know, it might be only some packets that contain
> the timestamp; e.g. for some NICs it is only the PTP packets.
>
> I've discussed this with the OVS VMware people before (they requested
> RX-hash), and in that discussion Björn came up with the idea that the
> last 32 bits in front of the packet data (the "-32 bit") could contain
> the BTF-id number, meaning the last u32 member of the metadata struct is
> btf_id (example below).
>
> struct my_metadata {
>         u64 rx_timestamp;       /* HW RX timestamp, nanoseconds */
>         u32 rx_hash32;          /* HW RX flow hash */
>         u32 btf_id;             /* BTF-id of this struct; last member */
> };
>
> With the btf_id present, the memory layout basically becomes
> self-describing. I guess we still need a single bit in the AF_XDP
> RX-descriptor telling us that the meta-data area is populated, or perhaps
> we should store the btf_id in the AF_XDP RX-descriptor itself?
>
> The same goes for xdp_frame: should it store the btf_id, or have a single
> bit that says the btf_id is located in the data_meta area?
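>
> To sketch how AF_XDP userspace could consume this (the
> XDP_DESC_HAS_META flag in xdp_desc->options is purely hypothetical,
> nothing like it exists today):
>
> #include <linux/if_xdp.h>       /* struct xdp_desc */
> #include <stdint.h>
> #include <string.h>
>
> /* Hypothetical options bit saying the meta-data area is populated */
> #define XDP_DESC_HAS_META       (1 << 0)
>
> static void handle_rx_desc(void *umem_area, const struct xdp_desc *desc)
> {
>         uint8_t *pkt = (uint8_t *)umem_area + desc->addr;
>         uint32_t btf_id;
>
>         if (desc->options & XDP_DESC_HAS_META) {
>                 /* Last u32 in front of the packet data is the btf_id */
>                 memcpy(&btf_id, pkt - sizeof(btf_id), sizeof(btf_id));
>                 /* Look the id up in the driver's exported BTF to learn
>                  * the full layout, then read e.g. the RX timestamp.
>                  */
>         }
>
>         /* ... process packet data at pkt, length desc->len ... */
> }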
>
> > 3) The TX T/S is simply trace_printk'ed out, as the XDP TX hook is
> > missing, like you pointed out.
>
> Again, I want to use BTF to describe the TX-timestamp features a driver
> supports. Like Saeed did for RX, the driver should export (a number of)
> BTF-ids that it supports.
>
> E.g. when the LaunchTime feature is configured:
>
> struct my_metadata_tx {
>         u64 LaunchTime_ktime;   /* desired departure time, nanoseconds */
>         u32 btf_id;             /* BTF-id of this struct; last member */
> };
>
> This applies when AF_XDP (or an xdp_frame) wants to transmit a frame at
> a specific time, e.g. via the LaunchTime feature in i210 (igb) and i225
> (igc).
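>
> On the AF_XDP TX side that could mean userspace placing the meta-data
> directly in front of the frame in the umem, roughly like below. This is
> purely hypothetical (no kernel support exists for it), and the struct is
> packed so btf_id ends exactly where the packet data starts:
>
> #include <linux/if_xdp.h>       /* struct xdp_desc */
> #include <stdint.h>
> #include <string.h>
>
> struct my_metadata_tx {
>         uint64_t LaunchTime_ktime;      /* desired departure time, ns */
>         uint32_t btf_id;                /* BTF-id of this layout */
> } __attribute__((packed));
>
> static void fill_tx_desc(void *umem_area, struct xdp_desc *desc,
>                          uint64_t frame_addr, const void *pkt,
>                          uint32_t len, uint64_t launch_time,
>                          uint32_t lt_btf_id)
> {
>         uint8_t *base = (uint8_t *)umem_area + frame_addr;
>         struct my_metadata_tx meta = {
>                 .LaunchTime_ktime = launch_time,
>                 .btf_id = lt_btf_id,
>         };
>
>         memcpy(base, &meta, sizeof(meta));
>         memcpy(base + sizeof(meta), pkt, len);
>
>         /* desc->addr points at the packet; the driver would find the
>          * btf_id at (addr - 4) and the meta-data struct in front of it.
>          */
>         desc->addr = frame_addr + sizeof(meta);
>         desc->len = len;
>         desc->options = 0;      /* plus a hypothetical "has meta" bit */
> }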
>
> I've read up on the i210 and i225 capabilities, and I think this will
> help guide our design choices. We need to support a different BTF-desc
> per TX queue, because LaunchTime is configured per TX queue, and
> furthermore, i210 only supports it on queues 0 and 1.
>
> Currently the LaunchTime config happens via TC when attaching an
> ETF qdisc and doing TC offloading. For now, I'm not suggesting
> changing that. Instead, we can simply export/expose that the driver now
> supports the LaunchTime BTF-desc once the config gets enabled.
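>
> For reference, per-packet launch time with the existing SO_TXTIME
> socket API looks roughly like this (existing kernel API, sketched from
> memory):
>
> #include <linux/net_tstamp.h>   /* struct sock_txtime, SOF_TXTIME_* */
> #include <sys/socket.h>
> #include <time.h>
>
> #ifndef SO_TXTIME
> #define SO_TXTIME 61            /* from asm-generic/socket.h */
> #endif
>
> static int enable_txtime(int fd)
> {
>         struct sock_txtime cfg = {
>                 .clockid = CLOCK_TAI,
>                 .flags   = SOF_TXTIME_REPORT_ERRORS,
>         };
>
>         /* Each packet then carries its departure time (u64 ns) in an
>          * SCM_TXTIME cmsg, and the ETF qdisc (optionally HW offloaded)
>          * holds it back until that time.
>          */
>         return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
> }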
>
>
> > So, if there is some ready work that we can evaluate, it will help us
> > greatly in extending it to the stmmac driver.
>
> Saeed has done a number of different implementation attempts on the RX
> side with BTF, and we might be able to leverage some of that work. That
> said, the kernel's BTF API has become more advanced since Saeed worked
> on this, so I expect we will be able to simplify the approach.
>
>
> > >I have a practical project with the wind-turbine producer Vestas (they
> > >have even approved that we can say this publicly on mailing lists).
> > >Thus, I can actually dedicate some time to this.
> > >
> > >Do you also have a practical project that needs this? (I/we can keep
> > >it off the mailing lists if you prefer/need to.)
> >
> > Yes, we are about to start a 3-way joint-development project that is
> > evaluating the suitability of using preempt-RT + XDP ZC + TSN for
> > integrating a high-level Industrial Ethernet stack on top of Linux
> > mainline interfaces. So, there are a couple of areas that we will be
> > looking into, and the above two capabilities are foundational for
> > making the XDP ZC interface "time-aware". But our current focus is on
> > getting the Linux mainline capability ready, so we can discuss it on
> > the ML.
>
> It sounds like our projects align very well! :-)))
> My customer also wants the combination preempt-RT + XDP ZC + TSN.
>
> > >My plans: I will try to understand the hardware and drivers better, and
> > >then I will work on a design proposal that I will share with you for
> > >review.
> > >
> > >What are your plans?
> >
> > Siang and I are looking into this area starting next week, and we
> > hope to get this capability into stmmac within the next few RC
> > cycles. Does that time-line align with yours?
>
> Yes, this aligns with my time-line. I want to start prototyping some
> things next week, so I can begin running experiments with TSN. The
> TSN-capable hardware for our PoC is being shipped to my house and
> should arrive next week.
>
> Looking forward to collaborating with all of you. You can let me know
> (off-list) if you prefer not to get Cc'ed on these mails. Some of you
> are Bcc'ed and have to opt in if you are interested in collaborating.
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
Thanks,
Al