netdev - Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before delivering to user space

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 7 Dec 2021 16:07:57 -0800
From:   Martin KaFai Lau <kafai@...com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
CC:     <netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, <kernel-team@...com>
Subject: Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before
 delivering to user space

On Tue, Dec 07, 2021 at 09:27:55AM -0500, Willem de Bruijn wrote:
> On Mon, Dec 6, 2021 at 9:01 PM Martin KaFai Lau <kafai@...com> wrote:
> >
> > The skb->tstamp may be set by a local sk (as a sender in tcp) which then
> > forwarded and delivered to another sk (as a receiver).
> >
> > An example:
> >     sender-sk => veth@...ns =====> veth@...t => receiver-sk
> >                              ^^^
> >                         __dev_forward_skb
> >
> > The skb->tstamp is marked with a future TX time.  This future
> > skb->tstamp will confuse the receiver-sk.
> >
> > This patch marks the skb if the skb->tstamp is forwarded.
> > Before using the skb->tstamp as a rx timestamp, it needs
> > to be re-stamped to avoid getting a future time.  It is
> > done in the RX timestamp reading helper skb_get_ktime().
> >
> > Signed-off-by: Martin KaFai Lau <kafai@...com>
> > ---
> >  include/linux/skbuff.h | 14 +++++++++-----
> >  net/core/dev.c         |  4 +++-
> >  net/core/skbuff.c      |  6 +++++-
> >  3 files changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index b609bdc5398b..bc4ae34c4e22 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -867,6 +867,7 @@ struct sk_buff {
> >         __u8                    decrypted:1;
> >  #endif
> >         __u8                    slow_gro:1;
> > +       __u8                    fwd_tstamp:1;
> >
> >  #ifdef CONFIG_NET_SCHED
> >         __u16                   tc_index;       /* traffic control index */
> > @@ -3806,9 +3807,12 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
> >  }
> >
> >  void skb_init(void);
> > +void net_timestamp_set(struct sk_buff *skb);
> >
> > -static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
> > +static inline ktime_t skb_get_ktime(struct sk_buff *skb)
> >  {
> > +       if (unlikely(skb->fwd_tstamp))
> > +               net_timestamp_set(skb);
> >         return ktime_mono_to_real_cond(skb->tstamp);
> 
> This changes timestamp behavior for existing applications, probably
> worth mentioning in the commit message if nothing else. A timestamp
> taking at the time of the recv syscall is not very useful.
> 
> If a forwarded timestamp is not a future delivery time (as those are
> scrubbed), is it not correct to just deliver the original timestamp?
> It probably was taken at some earlier __netif_receive_skb_core.
Make sense.  I will compare with the current mono clock first before
resetting and also mention this behavior change in the commit message.

Do you think it will be too heavy to always compare with
the current time without testing the skb->fwd_tstamp bit
first?

> 
> >  }
> >
> > -static inline void net_timestamp_set(struct sk_buff *skb)
> > +void net_timestamp_set(struct sk_buff *skb)
> >  {
> >         skb->tstamp = 0;
> > +       skb->fwd_tstamp = 0;
> >         if (static_branch_unlikely(&netstamp_needed_key))
> >                 __net_timestamp(skb);
> >  }
> > +EXPORT_SYMBOL(net_timestamp_set);
> >
> >  #define net_timestamp_check(COND, SKB)                         \
> >         if (static_branch_unlikely(&netstamp_needed_key)) {     \
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index f091c7807a9e..181ddc989ead 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5295,8 +5295,12 @@ void skb_scrub_tstamp(struct sk_buff *skb)
> >  {
> >         struct sock *sk = skb->sk;
> >
> > -       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME))
> > +       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME)) {
> 
> There is a slight race here with the socket flipping the feature on/off.
Right, I think it is an inherited race by relating skb->tstamp with
a bit in sk, like the existing sch_etf.c.
Directly setting a bit in skb when setting the skb->tstamp will help.

> 
> >
> >                 skb->tstamp = 0;
> > +               skb->fwd_tstamp = 0;
> > +       } else if (skb->tstamp) {
> > +               skb->fwd_tstamp = 1;
> > +       }
> 
> SO_TXTIME future delivery times are scrubbed, but TCP future delivery
> times are not?
It is not too much about scrubbing future SO_TXTIME or future TCP
delivery time for the local delivery.

fwd_mono_tstamp may be a better name.  It is about the forwarded tstamp
is in mono.  e.g. the packet from a container-netns can be queued
at the fq@...tns (the case described in patch 1 commit log).
Also, the bpf@...ress@...h@...tns can now expect the skb->tstamp is in
mono time.  BPF side does not have helper returning real clock, so it is
safe to assume that bpf prog is comparing (or setting) skb->tstamp as
mono also.

> If adding a bit, might it be simpler to add a bit tstamp_is_edt, and
> scrub based on that. That is also not open to the above race.
It was one of my earlier attempts by adding tstamp_is_tx_mono and
set it in tcp_output.c and then test it before scrubbing.
Other than changing the tcp_output.c (e.g. in __tcp_transmit_skb),
I ended up making another change on the bpf side to also set
this bit when the bpf_prog is updating the __sk_buff->tstamp.  Thus,
in this patch , I ended up setting a bit only in the forward path.

I can go back to retry the tstamp_is_edt/tstamp_is_tx_mono idea and
that can also avoid the race in testing sock_flag(sk, SOCK_TXTIME)
as you suggested.

Thanks for the review !