[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20211217175705.fpbsjoqeuaul3ydp@kafai-mbp.dhcp.thefacebook.com>
Date: Fri, 17 Dec 2021 09:57:05 -0800
From: Martin KaFai Lau <kafai@...com>
To: Daniel Borkmann <daniel@...earbox.net>
CC: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
<netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, <kernel-team@...com>
Subject: Re: [RFC PATCH v2 net-next] net: Preserve skb delivery time during
forward
On Fri, Dec 17, 2021 at 12:13:14PM +0100, Daniel Borkmann wrote:
> On 12/17/21 8:33 AM, Martin KaFai Lau wrote:
> > On Fri, Dec 17, 2021 at 12:42:30AM +0100, Daniel Borkmann wrote:
> > > On 12/16/21 11:58 PM, Willem de Bruijn wrote:
> > > > > > > @@ -530,7 +538,14 @@ struct skb_shared_info {
> > > > > > > /* Warning: this field is not always filled in (UFO)! */
> > > > > > > unsigned short gso_segs;
> > > > > > > struct sk_buff *frag_list;
> > > > > > > - struct skb_shared_hwtstamps hwtstamps;
> > > > > > > + union {
> > > > > > > + /* If SKBTX_DELIVERY_TSTAMP is set in tx_flags,
> > > > > > > + * tx_delivery_tstamp is stored instead of
> > > > > > > + * hwtstamps.
> > > > > > > + */
> > > > > >
> > > > > > Should we just encode the timebase and/or type { timestamp,
> > > > > > delivery_time } in th lower bits of the timestamp field? Its
> > > > > > resolution is higher than actual clock precision.
> > > > > In skb->tstamp ?
> > > >
> > > > Yes. Arguably a hack, but those bits are in the noise now, and it
> > > > avoids the clone issue with skb_shinfo (and scarcity of flag bits
> > > > there).
> > > >
> > > > > > is non-zero skb->tstamp test not sufficient, instead of
> > > > > > SKBTX_DELIVERY_TSTAMP_ALLOW_FWD.
> > > > > >
> > > > > > It is if only called on the egress path. Is bpf on ingress the only
> > > > > > reason for this?
> > > > > Ah. ic. meaning testing non-zero skb->tstamp and then call
> > > > > skb_save_delivery_time() only during the veth-egress-path:
> > > > > somewhere in veth_xmit() => veth_forward_skb() but before
> > > > > skb->tstamp was reset to 0 in __dev_forward_skb().
> > > >
> > > > Right. If delivery_time is the only use of skb->tstamp on egress, and
> > > > timestamp is the only use on ingress, then the only time the
> > > > delivery_time needs to be cached if when looping from egress to
> > > > ingress and this field is non-zero.
> > > >
> > > > > Keep *_forward() and bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)
> > > > > because the skb->tstamp could be stamped by net_timestamp_check().
> > > > >
> > > > > Then SKBTX_DELIVERY_TSTAMP_ALLOW_FWD is not needed.
> > > > >
> > > > > Did I understand your suggestion correctly?
> > > >
> > > > I think so.
> > > >
> > > > But the reality is complicated if something may be setting a delivery
> > > > time on ingress (a BPF filter?)
> > >
> > > I'm not quite following the 'bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)'
> > > part yet; in our case we would need to preserve it as well, for example, we are
> > > redirecting via bpf from bpf@...ingress@...t-veth to bpf@...egress@...s-dev in
> > > the egress path and fq sits on phys-dev.. (I mean if needed we could easily do
> > > that as shown in my prev diff with a flag for the helper).
> > Right, we have the same use case:
> > redirecting from bpf@...ingress@...t-veth to bpf@...egress@...s-dev in
> > the egress path and fq sits on phys-dev
> >
> > My earlier comment was on having the delivery_time preserved in
> > the skb_shared_hwtstamps. The delivery_time (e.g. EDT) and
> > timestamp (timestamp as RX timestamp) are separately stored when
> > looping from veth-egress to veth-ingress:
> >
> > delivery_time in skb_shared_hwtstamps
> > rx timestamp in skb->tstamp
> >
> > Thus, when bpf_redirect_neigh(phys-dev) happens, bpf_out_*() can
> > continue to reset skb->tstamp as-is while delivery_time will
> > automatically be kept in skb_shared_hwtstamps. When the skb
> > reaches the egress@...s-dev (__dev_queue_xmit), the delivery_time
> > in skb_shared_hwtstamps will be restored into skb->tstamp (done
> > in skb_restore_delivery_time in this patch).
>
> I think that could probably work, also in particular if it's restored once
> on stacked devs e.g. when instead of phys-dev we're dealing with upper
> tunnel dev (e.g. vxlan/geneve + BPF with collect_md). Wouldn't we still
> need something like SKBTX_DELIVERY_TSTAMP_ALLOW_FWD, e.g. when the phys
> driver sets skb_hwtstamps(skb)->hwtstamp on RX, and this gets carried on
> the ingress path to the target namespaces' socket?
In the opposite direction phys-dev => host-veth => netns-veth,
the phys driver can set the hwtstamp but it won't set the
SKBTX_DELIVERY_TSTAMP. This hwstamp gets carried onto the ingress
path of the target namesapce which I think is the current behavior
also ?
>From phys-dev => host-veth, the skb is forwarded and its
skb->tstamp reset to 0. veth_xmit() won't save zero
skb->tstamp into hwstamp and SKBTX_DELIVERY_TSTAMP won't
be set also.
Powered by blists - more mailing lists