lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211217073351.k4thkro443m3fnme@kafai-mbp.dhcp.thefacebook.com>
Date:   Thu, 16 Dec 2021 23:33:51 -0800
From:   Martin KaFai Lau <kafai@...com>
To:     Daniel Borkmann <daniel@...earbox.net>
CC:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        <netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, <kernel-team@...com>
Subject: Re: [RFC PATCH v2 net-next] net: Preserve skb delivery time during
 forward

On Fri, Dec 17, 2021 at 12:42:30AM +0100, Daniel Borkmann wrote:
> On 12/16/21 11:58 PM, Willem de Bruijn wrote:
> > > > > @@ -530,7 +538,14 @@ struct skb_shared_info {
> > > > >          /* Warning: this field is not always filled in (UFO)! */
> > > > >          unsigned short  gso_segs;
> > > > >          struct sk_buff  *frag_list;
> > > > > -       struct skb_shared_hwtstamps hwtstamps;
> > > > > +       union {
> > > > > +               /* If SKBTX_DELIVERY_TSTAMP is set in tx_flags,
> > > > > +                * tx_delivery_tstamp is stored instead of
> > > > > +                * hwtstamps.
> > > > > +                */
> > > > 
> > > > Should we just encode the timebase and/or type { timestamp,
> > > > delivery_time } in th lower bits of the timestamp field? Its
> > > > resolution is higher than actual clock precision.
> > > In skb->tstamp ?
> > 
> > Yes. Arguably a hack, but those bits are in the noise now, and it
> > avoids the clone issue with skb_shinfo (and scarcity of flag bits
> > there).
> > 
> > > > is non-zero skb->tstamp test not sufficient, instead of
> > > > SKBTX_DELIVERY_TSTAMP_ALLOW_FWD.
> > > > 
> > > > It is if only called on the egress path. Is bpf on ingress the only
> > > > reason for this?
> > > Ah. ic.  meaning testing non-zero skb->tstamp and then call
> > > skb_save_delivery_time() only during the veth-egress-path:
> > > somewhere in veth_xmit() => veth_forward_skb() but before
> > > skb->tstamp was reset to 0 in __dev_forward_skb().
> > 
> > Right. If delivery_time is the only use of skb->tstamp on egress, and
> > timestamp is the only use on ingress, then the only time the
> > delivery_time needs to be cached if when looping from egress to
> > ingress and this field is non-zero.
> > 
> > > Keep *_forward() and bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)
> > > because the skb->tstamp could be stamped by net_timestamp_check().
> > > 
> > > Then SKBTX_DELIVERY_TSTAMP_ALLOW_FWD is not needed.
> > > 
> > > Did I understand your suggestion correctly?
> > 
> > I think so.
> > 
> > But the reality is complicated if something may be setting a delivery
> > time on ingress (a BPF filter?)
> 
> I'm not quite following the 'bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)'
> part yet; in our case we would need to preserve it as well, for example, we are
> redirecting via bpf from bpf@...ingress@...t-veth to bpf@...egress@...s-dev in
> the egress path and fq sits on phys-dev.. (I mean if needed we could easily do
> that as shown in my prev diff with a flag for the helper).
Right, we have the same use case:
    redirecting from bpf@...ingress@...t-veth to bpf@...egress@...s-dev in
    the egress path and fq sits on phys-dev

My earlier comment was on having the delivery_time preserved in
the skb_shared_hwtstamps.  The delivery_time (e.g. EDT) and
timestamp (timestamp as RX timestamp) are separately stored when
looping from veth-egress to veth-ingress:

	delivery_time in skb_shared_hwtstamps
	rx timestamp in skb->tstamp

Thus, when bpf_redirect_neigh(phys-dev) happens, bpf_out_*() can
continue to reset skb->tstamp as-is while delivery_time will
automatically be kept in skb_shared_hwtstamps.  When the skb
reaches the egress@...s-dev (__dev_queue_xmit), the delivery_time
in skb_shared_hwtstamps will be restored into skb->tstamp (done
in skb_restore_delivery_time in this patch).

> > > However, we still need a bit to distinguish tx_delivery_tstamp
> > > from hwtstamps.
> > > 
> > > > 
> > > > > +{
> > > > > +       if (skb_shinfo(skb)->tx_flags & SKBTX_DELIVERY_TSTAMP_ALLOW_FWD) {
> > > > > +               skb_shinfo(skb)->tx_delivery_tstamp = skb->tstamp;
> > > > > +               skb_shinfo(skb)->tx_flags |= SKBTX_DELIVERY_TSTAMP;
> > > > > +               skb_shinfo(skb)->tx_flags &= ~SKBTX_DELIVERY_TSTAMP_ALLOW_FWD;
> > > > > +       }
> > > > 
> > > > Is this only called when there are no clones/shares?
> > > No, I don't think so.  TCP clone it.  I also started thinking about
> > > this after noticing a mistake in the change in  __tcp_transmit_skb().
> > > 
> > > There are other places that change tx_flags, e.g. tcp_offload.c.
> > > It is not shared at those places or there is some specific points
> > > in the stack that is safe to change ?
> > 
> > The packet probably is not yet shared. Until the TCP stack gives a
> > packet to the IP layer, it can treat it as exclusive.
> > 
> > Though it does seem that these fields are accessed in a possibly racy
> > manner. Drivers with hardware tx timestamp offload may set
> > skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS without checking
> > whether the skb may be cloned.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ