netdev - Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before delivering to user space

Message-ID: <20211208024554.ol3y2dieuzcnevyf@kafai-mbp.dhcp.thefacebook.com>
Date:   Tue, 7 Dec 2021 18:45:54 -0800
From:   Martin KaFai Lau <kafai@...com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>
CC:     <netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, <kernel-team@...com>
Subject: Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before
 delivering to user space

On Tue, Dec 07, 2021 at 07:44:05PM -0500, Willem de Bruijn wrote:
> > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > index f091c7807a9e..181ddc989ead 100644
> > > > --- a/net/core/skbuff.c
> > > > +++ b/net/core/skbuff.c
> > > > @@ -5295,8 +5295,12 @@ void skb_scrub_tstamp(struct sk_buff *skb)
> > > >  {
> > > >         struct sock *sk = skb->sk;
> > > >
> > > > -       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME))
> > > > +       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME)) {
> > >
> > > There is a slight race here with the socket flipping the feature on/off.
> > Right, I think it is a race inherited from tying skb->tstamp to
> > a bit in sk, as the existing sch_etf.c does.
> > Directly setting a bit in the skb when setting skb->tstamp will help.
> >
> > >
> > > >
> > > >                 skb->tstamp = 0;
> > > > +               skb->fwd_tstamp = 0;
> > > > +       } else if (skb->tstamp) {
> > > > +               skb->fwd_tstamp = 1;
> > > > +       }
> > >
> > > SO_TXTIME future delivery times are scrubbed, but TCP future delivery
> > > times are not?
> > It is not so much about scrubbing a future SO_TXTIME or TCP
> > delivery time for local delivery.
> 
> The purpose of the above is to reset future delivery time whenever it
> can be mistaken for a timestamp, right?
> 
> This function is called on forwarding, redirection, looping from
> egress to ingress with __dev_forward_skb, etc. But then it breaks the
> delivery time forwarding over veth that I thought was the purpose of
> this patch series. I guess I'm a bit hazy on when exactly this is
> supposed to be scrubbed.
> 
> > fwd_mono_tstamp may be a better name.  It is about whether the forwarded
> > tstamp is in mono.
> 
> After your change skb->tstamp is no longer in CLOCK_REALTIME, right?
Right.  __net_timestamp() will use CLOCK_MONOTONIC.

> Somewhat annoyingly, that does not imply that it is always
> CLOCK_MONOTONIC, because while FQ uses that, ETF is programmed with
> CLOCK_TAI.
Yes, that is the annoying part, so this patch keeps the tstamp
scrubbing for SO_TXTIME.

If a sk in veth@...ns uses SO_TXTIME to set tstamp in TAI and the
skb is not scrubbed here, it may get forwarded to the fq@...tns
and then get dropped.
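
To make the clock mismatch concrete, here is a small userspace sketch, not kernel code. clock_ns() reads a clock with clock_gettime(), and fq_beyond_horizon() is a hypothetical model of an FQ-style horizon check: sch_fq has a "horizon" (10 seconds by default, as far as I recall) and drops packets whose delivery time lies further in the future than that. The helper names and the exact check are assumptions for illustration. Because CLOCK_TAI counts from the 1970 epoch while CLOCK_MONOTONIC typically counts from boot, a TAI delivery time read as mono looks decades in the future.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11	/* Linux clockid value, as in <linux/time.h> */
#endif

#define NSEC_PER_SEC 1000000000LL

/* Read a clock in nanoseconds; returns -1 if the clock is unavailable. */
static int64_t clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/*
 * Hypothetical model of an FQ-style horizon check: a delivery time more
 * than horizon_ns beyond "now" (CLOCK_MONOTONIC) is treated as bogus and
 * the packet is dropped.
 */
static bool fq_beyond_horizon(int64_t tstamp_ns, int64_t now_mono_ns,
			      int64_t horizon_ns)
{
	return tstamp_ns > now_mono_ns + horizon_ns;
}
```

On an ordinary Linux host, fq_beyond_horizon(clock_ns(CLOCK_TAI), clock_ns(CLOCK_MONOTONIC), 10 * NSEC_PER_SEC) should come out true, which is the "forwarded to fq and then dropped" case above.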

skb_ktime_get() also won't know which clock (mono or tai?) to compare
it against in order to reset it if needed.
Alternatively, it can always re-stamp (__net_timestamp()) much earlier
in the stack before recvmsg(), e.g. just after sch_handle_ingress()
when TC_ACT_OK is returned, as Daniel also mentioned in another thread.
That will be more limited to the bpf@...ress (and then bpf_redirect) use case
instead of being generally applicable to ip[6]_forward.  However,
the benefit is a more limited impact scope and less potential breakage.
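
A rough userspace model of the re-stamp alternative might look like the following. The enum, struct model_skb and ingress_then_restamp() are hypothetical stand-ins rather than kernel definitions (the real verdicts are TC_ACT_OK and friends), and now_ns stands in for the value __net_timestamp() would write. In this model only the verdict that continues up the local stack re-stamps; a redirected skb keeps its delivery time so it can still be paced on egress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for tc verdicts; not the kernel's TC_ACT_* values. */
enum tc_verdict { VERDICT_OK, VERDICT_SHOT, VERDICT_REDIRECT };

struct model_skb {
	int64_t tstamp;	/* may hold a mono/TAI delivery time or an rx stamp */
};

/*
 * After the ingress classifier runs: if the packet continues to the local
 * stack, overwrite any delivery time with a fresh receive stamp so that
 * recvmsg() never sees a future EDT/TAI value. Returns true if re-stamped.
 */
static bool ingress_then_restamp(struct model_skb *skb, enum tc_verdict v,
				 int64_t now_ns)
{
	if (v != VERDICT_OK)
		return false;	/* dropped or redirected: leave tstamp alone */
	skb->tstamp = now_ns;
	return true;
}
```

The trade-off matches the text: the fix only touches the ingress hook, so ip[6]_forward paths are unaffected either way.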

> Perhaps skb->delivery_time is the most specific description. And that
> is easy to test for in skb_scrub_tstamp.
> 
> 
> > e.g. the packet from a container-netns can be queued
> > at the fq@...tns (the case described in the patch 1 commit log).
> > Also, the bpf@...ress@...h@...tns can now expect skb->tstamp to be in
> > mono time.  The BPF side does not have a helper returning the real clock,
> > so it is safe to assume that a bpf prog is also comparing (or setting)
> > skb->tstamp in mono.
> >
> > > If adding a bit, might it be simpler to add a bit tstamp_is_edt, and
> > > scrub based on that? That is also not open to the above race.
> > That was one of my earlier attempts: adding tstamp_is_tx_mono,
> > setting it in tcp_output.c and then testing it before scrubbing.
> > Other than changing tcp_output.c (e.g. in __tcp_transmit_skb),
> > I ended up making another change on the bpf side to also set
> > this bit when the bpf_prog updates __sk_buff->tstamp.  Thus,
> > in this patch, I ended up setting a bit only in the forward path.
> >
> > I can go back and retry the tstamp_is_edt/tstamp_is_tx_mono idea;
> > that would also avoid the race in testing sock_flag(sk, SOCK_TXTIME),
> > as you suggested.
> 
> Sounds great, thanks
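
The bit-based idea the thread converges on can be modeled the same way. This sketch contrasts deciding at forward time from the socket flag with deciding from a bit stored in the skb when the tstamp is written. struct model_sock, struct model_skb and the mono_delivery_time bit are hypothetical (the thread floats tstamp_is_edt, tstamp_is_tx_mono, fwd_mono_tstamp and delivery_time as names), and the policy of clearing every tstamp not marked as a mono delivery time is an assumption of the model, not necessarily what the eventual patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct sock / struct sk_buff. */
struct model_sock {
	bool txtime;	/* plays the role of sock_flag(sk, SOCK_TXTIME) */
};

struct model_skb {
	int64_t tstamp;				/* 0 means "no timestamp" */
	unsigned int mono_delivery_time:1;	/* hypothetical bit */
	struct model_sock *sk;
};

/* Writer side: TCP/bpf-style EDT in CLOCK_MONOTONIC marks the skb. */
static void set_mono_delivery_time(struct model_skb *skb, int64_t t)
{
	skb->tstamp = t;
	skb->mono_delivery_time = 1;
}

/* Writer side: SO_TXTIME-style time (possibly CLOCK_TAI), left unmarked. */
static void set_txtime(struct model_skb *skb, int64_t t)
{
	skb->tstamp = t;
	skb->mono_delivery_time = 0;
}

/*
 * Scrub decided by the socket flag at forwarding time. If the socket
 * toggles SO_TXTIME between stamping and forwarding, the decision no
 * longer matches what is actually in skb->tstamp.
 */
static void scrub_by_sock_flag(struct model_skb *skb)
{
	if (skb->sk && skb->sk->txtime)
		skb->tstamp = 0;
}

/*
 * Scrub decided by a bit recorded together with the tstamp: keep a mono
 * delivery time (safe to forward to fq), clear anything else.
 */
static void scrub_by_skb_bit(struct model_skb *skb)
{
	if (!skb->mono_delivery_time)
		skb->tstamp = 0;
}
```

In the model, if the socket turns SO_TXTIME off after stamping an skb with a TAI time, scrub_by_sock_flag() leaves the TAI value in place (the race pointed out earlier in the thread), whereas scrub_by_skb_bit() still clears it because the decision travels with the skb.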