Message-ID: <20211208083013.zqeipdfprcdr3ntn@kafai-mbp.dhcp.thefacebook.com>
Date: Wed, 8 Dec 2021 00:30:13 -0800
From: Martin KaFai Lau <kafai@...com>
To: Daniel Borkmann <daniel@...earbox.net>
CC: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
<netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, <kernel-team@...com>
Subject: Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before
delivering to user space
On Wed, Dec 08, 2021 at 12:18:46AM -0800, Martin KaFai Lau wrote:
> On Tue, Dec 07, 2021 at 10:48:53PM +0100, Daniel Borkmann wrote:
> > On 12/7/21 3:27 PM, Willem de Bruijn wrote:
> > > On Mon, Dec 6, 2021 at 9:01 PM Martin KaFai Lau <kafai@...com> wrote:
> > > >
> > > > The skb->tstamp may be set by a local sk (as a sender in tcp), and the
> > > > skb is then forwarded and delivered to another sk (as a receiver).
> > > >
> > > > An example:
> > > > sender-sk => veth@...ns =====> veth@...t => receiver-sk
> > > >                         ^^^
> > > >                         __dev_forward_skb
> > > >
> > > > The skb->tstamp is marked with a future TX time. This future
> > > > skb->tstamp will confuse the receiver-sk.
> > > >
> > > > This patch marks the skb if the skb->tstamp is forwarded.
> > > > Before using the skb->tstamp as a rx timestamp, it needs
> > > > to be re-stamped to avoid getting a future time. It is
> > > > done in the RX timestamp reading helper skb_get_ktime().
> > > >
> > > > Signed-off-by: Martin KaFai Lau <kafai@...com>
> > > > ---
> > > > include/linux/skbuff.h | 14 +++++++++-----
> > > > net/core/dev.c | 4 +++-
> > > > net/core/skbuff.c | 6 +++++-
> > > > 3 files changed, 17 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > index b609bdc5398b..bc4ae34c4e22 100644
> > > > --- a/include/linux/skbuff.h
> > > > +++ b/include/linux/skbuff.h
> > > > @@ -867,6 +867,7 @@ struct sk_buff {
> > > >  	__u8 decrypted:1;
> > > >  #endif
> > > >  	__u8 slow_gro:1;
> > > > +	__u8 fwd_tstamp:1;
> > > >
> > > >  #ifdef CONFIG_NET_SCHED
> > > >  	__u16 tc_index; /* traffic control index */
> > > > @@ -3806,9 +3807,12 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
> > > >  }
> > > >
> > > >  void skb_init(void);
> > > > +void net_timestamp_set(struct sk_buff *skb);
> > > >
> > > > -static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
> > > > +static inline ktime_t skb_get_ktime(struct sk_buff *skb)
> > > >  {
> > > > +	if (unlikely(skb->fwd_tstamp))
> > > > +		net_timestamp_set(skb);
> > > >  	return ktime_mono_to_real_cond(skb->tstamp);
> > >
> > > This changes timestamp behavior for existing applications, which is
> > > probably worth mentioning in the commit message if nothing else. A
> > > timestamp taken at the time of the recv syscall is not very useful.
> > >
> > > If a forwarded timestamp is not a future delivery time (as those are
> > > scrubbed), is it not correct to just deliver the original timestamp?
> > > It probably was taken at some earlier __netif_receive_skb_core.
> > >
> > > > }
> > > >
> > > > -static inline void net_timestamp_set(struct sk_buff *skb)
> > > > +void net_timestamp_set(struct sk_buff *skb)
> > > >  {
> > > >  	skb->tstamp = 0;
> > > > +	skb->fwd_tstamp = 0;
> > > >  	if (static_branch_unlikely(&netstamp_needed_key))
> > > >  		__net_timestamp(skb);
> > > >  }
> > > > +EXPORT_SYMBOL(net_timestamp_set);
> > > >
> > > >  #define net_timestamp_check(COND, SKB) \
> > > >  	if (static_branch_unlikely(&netstamp_needed_key)) { \
> > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > index f091c7807a9e..181ddc989ead 100644
> > > > --- a/net/core/skbuff.c
> > > > +++ b/net/core/skbuff.c
> > > > @@ -5295,8 +5295,12 @@ void skb_scrub_tstamp(struct sk_buff *skb)
> > > >  {
> > > >  	struct sock *sk = skb->sk;
> > > >
> > > > -	if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME))
> > > > +	if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME)) {
> > >
> > > There is a slight race here with the socket flipping the feature on/off.
> > >
> > > >  		skb->tstamp = 0;
> > > > +		skb->fwd_tstamp = 0;
> > > > +	} else if (skb->tstamp) {
> > > > +		skb->fwd_tstamp = 1;
> > > > +	}
> > >
> > > SO_TXTIME future delivery times are scrubbed, but TCP future delivery
> > > times are not?
> > >
> > > If adding a bit, might it be simpler to add a bit tstamp_is_edt, and
> > > scrub based on that. That is also not open to the above race.
> >
> > One other thing I wonder: BPF progs at the host-facing veth's tc ingress which
> > are not aware of skb->tstamp will then see a tstamp from the future, given we
> > intentionally bypass the net_timestamp_check(), and might get confused (or
> > would confuse higher-layer application logic)? Not quite sure yet if they
> > would be the only affected user.
> Considering the variety of clocks used in skb->tstamp (real/mono, and also
> tai in SO_TXTIME), in general I am not sure if the tc-bpf can assume anything
> about the skb->tstamp now.
> Also, there is only a mono clock bpf_ktime_get helper, so the most reasonable
> usage now for tc-bpf is to set the EDT, which is in mono. This seems to be the
> intention when the __sk_buff->tstamp was added.
> For ingress, it is the real clock now. Other than simply printing it out,
> it is hard to think of a good way to use the value. Also, although
> it is unlikely, net_timestamp_check() does not always stamp the skb.
For non-bpf ingress, hmm... yeah, I am not sure whether it is indeed an issue :/
Maybe save the tx tstamp first and then temporarily restamp with
__net_timestamp().
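
To make the "save, then temporarily restamp" idea a bit more concrete, here
is a rough userspace sketch. It is not kernel code and not a proposed patch:
struct mock_skb, the saved_tx_tstamp slot, and both helpers are hypothetical
names made up for illustration. Where the saved value would actually live in
the real sk_buff is an open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields discussed in this thread. */
struct mock_skb {
	int64_t tstamp;          /* ns; may hold a future TX (EDT) time */
	int64_t saved_tx_tstamp; /* assumed extra slot for the saved TX time */
};

/*
 * Before the ingress path looks at the skb: remember the TX time and
 * overwrite tstamp with the current receive time. now_ns stands in for
 * what __net_timestamp() would store.
 */
static void mock_ingress_restamp(struct mock_skb *skb, int64_t now_ns)
{
	assert(skb != NULL);
	skb->saved_tx_tstamp = skb->tstamp;
	skb->tstamp = now_ns;
}

/*
 * After the ingress path is done: put the TX time back so that a later
 * egress qdisc would still see the original delivery time.
 */
static void mock_ingress_restore(struct mock_skb *skb)
{
	assert(skb != NULL);
	skb->tstamp = skb->saved_tx_tstamp;
	skb->saved_tx_tstamp = 0;
}
```

With this model, an ingress user that reads tstamp between the two calls sees
the receive time instead of the future TX time, and the TX time is back in
place once mock_ingress_restore() has run.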