netdev - Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before delivering to user space

Message-ID: <CA+FuTSfBTQ+G3i6j8LPi7PHZWnSx5msdMYoUURdp5Z2d3S6gDA@mail.gmail.com>
Date:   Tue, 7 Dec 2021 19:44:05 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Martin KaFai Lau <kafai@...com>
Cc:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, kernel-team@...com
Subject: Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before
 delivering to user space

> > > -static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
> > > +static inline ktime_t skb_get_ktime(struct sk_buff *skb)
> > >  {
> > > +       if (unlikely(skb->fwd_tstamp))
> > > +               net_timestamp_set(skb);
> > >         return ktime_mono_to_real_cond(skb->tstamp);
> >
> > This changes timestamp behavior for existing applications, probably
> > worth mentioning in the commit message if nothing else. A timestamp
> > taken at the time of the recv syscall is not very useful.
> >
> > If a forwarded timestamp is not a future delivery time (as those are
> > scrubbed), is it not correct to just deliver the original timestamp?
> > It probably was taken at some earlier __netif_receive_skb_core.
> Makes sense.  I will compare with the current mono clock first before
> resetting and also mention this behavior change in the commit message.
>
> Do you think it will be too heavy to always compare with
> the current time without testing the skb->fwd_tstamp bit
> first?

There are other examples of code using ktime_get and variants in the
hot path, such as FQ.

Especially if skb_get_ktime is called in recv() timestamp helpers, it
is perhaps acceptable, if not ideal. If we need an skb bit anyway,
then this is moot.

> >
> > >  }
> > >
> > > -static inline void net_timestamp_set(struct sk_buff *skb)
> > > +void net_timestamp_set(struct sk_buff *skb)
> > >  {
> > >         skb->tstamp = 0;
> > > +       skb->fwd_tstamp = 0;
> > >         if (static_branch_unlikely(&netstamp_needed_key))
> > >                 __net_timestamp(skb);
> > >  }
> > > +EXPORT_SYMBOL(net_timestamp_set);
> > >
> > >  #define net_timestamp_check(COND, SKB)                         \
> > >         if (static_branch_unlikely(&netstamp_needed_key)) {     \
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index f091c7807a9e..181ddc989ead 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -5295,8 +5295,12 @@ void skb_scrub_tstamp(struct sk_buff *skb)
> > >  {
> > >         struct sock *sk = skb->sk;
> > >
> > > -       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME))
> > > +       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME)) {
> >
> > There is a slight race here with the socket flipping the feature on/off.
> Right, I think it is an inherited race by relating skb->tstamp with
> a bit in sk, like the existing sch_etf.c.
> Directly setting a bit in skb when setting the skb->tstamp will help.
>
> >
> > >
> > >                 skb->tstamp = 0;
> > > +               skb->fwd_tstamp = 0;
> > > +       } else if (skb->tstamp) {
> > > +               skb->fwd_tstamp = 1;
> > > +       }
> >
> > SO_TXTIME future delivery times are scrubbed, but TCP future delivery
> > times are not?
> It is not too much about scrubbing future SO_TXTIME or future TCP
> delivery time for the local delivery.

The purpose of the above is to reset future delivery time whenever it
can be mistaken for a timestamp, right?

This function is called on forwarding, redirection, looping from
egress to ingress with __dev_forward_skb, etc. But then it breaks the
delivery time forwarding over veth that I thought was the purpose of
this patch series. I guess I'm a bit hazy when this is supposed to be
scrubbed exactly.

> fwd_mono_tstamp may be a better name.  It indicates that the forwarded
> tstamp is in mono.

After your change skb->tstamp is no longer in CLOCK_REALTIME, right?

Somewhat annoyingly, that does not imply that it is always
CLOCK_MONOTONIC. Because while FQ uses that, ETF is programmed with
CLOCK_TAI.

Perhaps skb->delivery_time is the most specific description. And that
is easy to test for in skb_scrub_tstamp.
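
One possible reading of the delivery_time bit, as a userspace sketch
rather than a patch. struct dt_skb and the three helpers are hypothetical
names for this illustration only. The assumption is that the bit is set
at the same point where a delivery time is written into tstamp, so the
scrub path tests the skb itself instead of sock_flag(sk, SOCK_TXTIME).
When the scrub should actually run is the open question in this thread
and is not settled by the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields discussed here. */
struct dt_skb {
	int64_t tstamp_ns;		/* 0 means unset */
	unsigned int delivery_time:1;	/* tstamp_ns is a delivery time */
};

/* Sender side: store a delivery time and mark it in the same step. */
static inline void dt_set_delivery_time(struct dt_skb *skb, int64_t t_ns)
{
	assert(t_ns != 0);	/* 0 would be indistinguishable from unset */
	skb->tstamp_ns = t_ns;
	skb->delivery_time = 1;
}

/* Receive side: store an ordinary receive timestamp. */
static inline void dt_set_rx_tstamp(struct dt_skb *skb, int64_t t_ns)
{
	skb->tstamp_ns = t_ns;
	skb->delivery_time = 0;
}

/*
 * Scrub based on the bit alone: a delivery time is cleared so it cannot
 * be mistaken for a timestamp, while a receive timestamp is left intact.
 */
static inline void dt_scrub_tstamp(struct dt_skb *skb)
{
	if (skb->delivery_time) {
		skb->tstamp_ns = 0;
		skb->delivery_time = 0;
	}
}
```

Because the bit travels with the skb, the result does not depend on
whether the socket flips SO_TXTIME on or off in the meantime.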


> e.g. the packet from a container-netns can be queued
> at the fq@...tns (the case described in patch 1 commit log).
> Also, the bpf@...ress@...h@...tns can now expect the skb->tstamp is in
> mono time.  BPF side does not have helper returning real clock, so it is
> safe to assume that bpf prog is comparing (or setting) skb->tstamp as
> mono also.
>
> > If adding a bit, might it be simpler to add a bit tstamp_is_edt, and
> > scrub based on that. That is also not open to the above race.
> It was one of my earlier attempts by adding tstamp_is_tx_mono and
> set it in tcp_output.c and then test it before scrubbing.
> Other than changing the tcp_output.c (e.g. in __tcp_transmit_skb),
> I ended up making another change on the bpf side to also set
> this bit when the bpf_prog is updating the __sk_buff->tstamp.  Thus,
> in this patch, I ended up setting a bit only in the forward path.
>
> I can go back to retry the tstamp_is_edt/tstamp_is_tx_mono idea and
> that can also avoid the race in testing sock_flag(sk, SOCK_TXTIME)
> as you suggested.

Sounds great, thanks.