netdev - Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before delivering to user space

Message-ID: <20211208083013.zqeipdfprcdr3ntn@kafai-mbp.dhcp.thefacebook.com>
Date:   Wed, 8 Dec 2021 00:30:13 -0800
From:   Martin KaFai Lau <kafai@...com>
To:     Daniel Borkmann <daniel@...earbox.net>
CC:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        <netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, <kernel-team@...com>
Subject: Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before
 delivering to user space

On Wed, Dec 08, 2021 at 12:18:46AM -0800, Martin KaFai Lau wrote:
> On Tue, Dec 07, 2021 at 10:48:53PM +0100, Daniel Borkmann wrote:
> > On 12/7/21 3:27 PM, Willem de Bruijn wrote:
> > > On Mon, Dec 6, 2021 at 9:01 PM Martin KaFai Lau <kafai@...com> wrote:
> > > > 
> > > > The skb->tstamp may be set by a local sk (as a sender in tcp) and the
> > > > skb is then forwarded and delivered to another sk (as a receiver).
> > > > 
> > > > An example:
> > > >      sender-sk => veth@...ns =====> veth@...t => receiver-sk
> > > >                               ^^^
> > > >                          __dev_forward_skb
> > > > 
> > > > The skb->tstamp is marked with a future TX time.  This future
> > > > skb->tstamp will confuse the receiver-sk.
> > > > 
> > > > This patch marks the skb if the skb->tstamp is forwarded.
> > > > Before using the skb->tstamp as a rx timestamp, it needs
> > > > to be re-stamped to avoid getting a future time.  It is
> > > > done in the RX timestamp reading helper skb_get_ktime().
> > > > 
> > > > Signed-off-by: Martin KaFai Lau <kafai@...com>
> > > > ---
> > > >   include/linux/skbuff.h | 14 +++++++++-----
> > > >   net/core/dev.c         |  4 +++-
> > > >   net/core/skbuff.c      |  6 +++++-
> > > >   3 files changed, 17 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > index b609bdc5398b..bc4ae34c4e22 100644
> > > > --- a/include/linux/skbuff.h
> > > > +++ b/include/linux/skbuff.h
> > > > @@ -867,6 +867,7 @@ struct sk_buff {
> > > >          __u8                    decrypted:1;
> > > >   #endif
> > > >          __u8                    slow_gro:1;
> > > > +       __u8                    fwd_tstamp:1;
> > > > 
> > > >   #ifdef CONFIG_NET_SCHED
> > > >          __u16                   tc_index;       /* traffic control index */
> > > > @@ -3806,9 +3807,12 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
> > > >   }
> > > > 
> > > >   void skb_init(void);
> > > > +void net_timestamp_set(struct sk_buff *skb);
> > > > 
> > > > -static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
> > > > +static inline ktime_t skb_get_ktime(struct sk_buff *skb)
> > > >   {
> > > > +       if (unlikely(skb->fwd_tstamp))
> > > > +               net_timestamp_set(skb);
> > > >          return ktime_mono_to_real_cond(skb->tstamp);
> > > 
> > > This changes timestamp behavior for existing applications, probably
> > > worth mentioning in the commit message if nothing else. A timestamp
> > > taken at the time of the recv syscall is not very useful.
> > > 
> > > If a forwarded timestamp is not a future delivery time (as those are
> > > scrubbed), is it not correct to just deliver the original timestamp?
> > > It probably was taken at some earlier __netif_receive_skb_core.
> > > 
> > > >   }
> > > > 
> > > > -static inline void net_timestamp_set(struct sk_buff *skb)
> > > > +void net_timestamp_set(struct sk_buff *skb)
> > > >   {
> > > >          skb->tstamp = 0;
> > > > +       skb->fwd_tstamp = 0;
> > > >          if (static_branch_unlikely(&netstamp_needed_key))
> > > >                  __net_timestamp(skb);
> > > >   }
> > > > +EXPORT_SYMBOL(net_timestamp_set);
> > > > 
> > > >   #define net_timestamp_check(COND, SKB)                         \
> > > >          if (static_branch_unlikely(&netstamp_needed_key)) {     \
> > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > index f091c7807a9e..181ddc989ead 100644
> > > > --- a/net/core/skbuff.c
> > > > +++ b/net/core/skbuff.c
> > > > @@ -5295,8 +5295,12 @@ void skb_scrub_tstamp(struct sk_buff *skb)
> > > >   {
> > > >          struct sock *sk = skb->sk;
> > > > 
> > > > -       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME))
> > > > +       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME)) {
> > > 
> > > There is a slight race here with the socket flipping the feature on/off.
> > > 
> > > >                  skb->tstamp = 0;
> > > > +               skb->fwd_tstamp = 0;
> > > > +       } else if (skb->tstamp) {
> > > > +               skb->fwd_tstamp = 1;
> > > > +       }
> > > 
> > > SO_TXTIME future delivery times are scrubbed, but TCP future delivery
> > > times are not?
> > > 
> > > If adding a bit, might it be simpler to add a bit tstamp_is_edt, and
> > > scrub based on that. That is also not open to the above race.
> > 
> > One other thing I wonder, BPF progs at host-facing veth's tc ingress which
> > are not aware of skb->tstamp will then see a tstamp from future given we
> > intentionally bypass the net_timestamp_check() and might get confused (or
> > would confuse higher-layer application logic)? Not quite sure yet if they
> > would be the only affected user.
> Considering the variety of clocks used in skb->tstamp (real/mono, and also
> tai in SO_TXTIME), in general I am not sure if the tc-bpf can assume anything
> about the skb->tstamp now.
> Also, there is only mono clock bpf_ktime_get helper, the most reasonable usage
> now for tc-bpf is to set the EDT which is in mono.  This seems to be the
> intention when the __sk_buff->tstamp was added.
> For ingress, it is real clock now.  Other than simply printing it out,
> it is hard to think of a good way to use the value.  Also, although
> it is unlikely, net_timestamp_check() does not always stamp the skb.
For non-bpf ingress, hmm... yeah, not sure whether it is indeed an issue :/
Maybe save the tx tstamp first and then temporarily restamp with __net_timestamp().
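
A rough userspace sketch of that save/restamp/restore idea, only to make the
ordering concrete. This is not kernel code: struct model_skb,
model_net_timestamp(), model_run_ingress() and the hook type are all
hypothetical stand-ins, and where such a step would actually sit in the
ingress path is left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the one field that matters here (skb->tstamp), in ns. */
struct model_skb {
	int64_t tstamp;
};

/* Stand-in for __net_timestamp(): stamp the skb with the current rx time. */
static void model_net_timestamp(struct model_skb *skb, int64_t now_ns)
{
	skb->tstamp = now_ns;
}

/* Hypothetical ingress consumer that is not aware of forwarded tx tstamps. */
typedef void (*model_ingress_hook_t)(const struct model_skb *skb, void *ctx);

/*
 * Save the forwarded (possibly future) tx tstamp, let the ingress hook
 * observe an rx timestamp instead, then put the tx tstamp back so that
 * later stages still see the original value.
 */
static void model_run_ingress(struct model_skb *skb, int64_t now_ns,
			      model_ingress_hook_t hook, void *ctx)
{
	int64_t saved_tx;

	assert(skb && hook);

	saved_tx = skb->tstamp;
	model_net_timestamp(skb, now_ns);	/* temporary restamp */
	hook(skb, ctx);
	skb->tstamp = saved_tx;			/* restore tx tstamp */
}

/* Example hook: record whatever tstamp the ingress side gets to see. */
static void model_record_tstamp(const struct model_skb *skb, void *ctx)
{
	*(int64_t *)ctx = skb->tstamp;
}
```

In this model the hook sees the "now" value passed in rather than the future
tx time, and the skb leaves model_run_ingress() with its tx tstamp unchanged.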