Message-ID: <83ff2f64-42b8-60ed-965a-810b4ec69f8d@iogearbox.net>
Date: Tue, 7 Dec 2021 22:48:53 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Martin KaFai Lau <kafai@...com>
Cc: netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, kernel-team@...com
Subject: Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before
delivering to user space
On 12/7/21 3:27 PM, Willem de Bruijn wrote:
> On Mon, Dec 6, 2021 at 9:01 PM Martin KaFai Lau <kafai@...com> wrote:
>>
>> The skb->tstamp may be set by a local sk (as a sender in TCP), and the
>> skb is then forwarded and delivered to another sk (as a receiver).
>>
>> An example:
>> sender-sk => veth@...ns =====> veth@...t => receiver-sk
>> ^^^
>> __dev_forward_skb
>>
>> The skb->tstamp is marked with a future TX time. This future
>> skb->tstamp will confuse the receiver-sk.
>>
>> This patch marks the skb if the skb->tstamp is forwarded.
>> Before using the skb->tstamp as a rx timestamp, it needs
>> to be re-stamped to avoid getting a future time. It is
>> done in the RX timestamp reading helper skb_get_ktime().
>>
>> Signed-off-by: Martin KaFai Lau <kafai@...com>
>> ---
>> include/linux/skbuff.h | 14 +++++++++-----
>> net/core/dev.c | 4 +++-
>> net/core/skbuff.c | 6 +++++-
>> 3 files changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index b609bdc5398b..bc4ae34c4e22 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -867,6 +867,7 @@ struct sk_buff {
>> __u8 decrypted:1;
>> #endif
>> __u8 slow_gro:1;
>> + __u8 fwd_tstamp:1;
>>
>> #ifdef CONFIG_NET_SCHED
>> __u16 tc_index; /* traffic control index */
>> @@ -3806,9 +3807,12 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
>> }
>>
>> void skb_init(void);
>> +void net_timestamp_set(struct sk_buff *skb);
>>
>> -static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
>> +static inline ktime_t skb_get_ktime(struct sk_buff *skb)
>> {
>> + if (unlikely(skb->fwd_tstamp))
>> + net_timestamp_set(skb);
>> return ktime_mono_to_real_cond(skb->tstamp);
>
> This changes timestamp behavior for existing applications, which is
> probably worth mentioning in the commit message if nothing else. A
> timestamp taken at the time of the recv syscall is not very useful.
>
> If a forwarded timestamp is not a future delivery time (as those are
> scrubbed), is it not correct to just deliver the original timestamp?
> It probably was taken at some earlier __netif_receive_skb_core.
>
>> }
>>
>> -static inline void net_timestamp_set(struct sk_buff *skb)
>> +void net_timestamp_set(struct sk_buff *skb)
>> {
>> skb->tstamp = 0;
>> + skb->fwd_tstamp = 0;
>> if (static_branch_unlikely(&netstamp_needed_key))
>> __net_timestamp(skb);
>> }
>> +EXPORT_SYMBOL(net_timestamp_set);
>>
>> #define net_timestamp_check(COND, SKB) \
>> if (static_branch_unlikely(&netstamp_needed_key)) { \
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index f091c7807a9e..181ddc989ead 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -5295,8 +5295,12 @@ void skb_scrub_tstamp(struct sk_buff *skb)
>> {
>> struct sock *sk = skb->sk;
>>
>> - if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME))
>> + if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME)) {
>
> There is a slight race here with the socket flipping the feature on/off.
>
>> skb->tstamp = 0;
>> + skb->fwd_tstamp = 0;
>> + } else if (skb->tstamp) {
>> + skb->fwd_tstamp = 1;
>> + }
>
> SO_TXTIME future delivery times are scrubbed, but TCP future delivery
> times are not?
>
> If adding a bit, might it be simpler to add a bit tstamp_is_edt, and
> scrub based on that. That is also not open to the above race.
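
To make the tstamp_is_edt suggestion above concrete, here is a minimal
user-space sketch of one possible reading of it. It is not kernel code:
struct model_skb and the model_*() helpers are hypothetical stand-ins, and
only the tstamp_is_edt name comes from the suggestion itself. The writer of
the timestamp tags it, so the scrub decision no longer depends on socket
state that can change underneath it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields involved. */
struct model_skb {
	uint64_t tstamp;        /* ns: an RX time or a future delivery time */
	bool     tstamp_is_edt; /* set only when tstamp is a delivery time */
};

/* Sender side: store a future delivery time and tag it as such. */
static void model_set_edt(struct model_skb *skb, uint64_t edt_ns)
{
	skb->tstamp = edt_ns;
	skb->tstamp_is_edt = true;
}

/* Receive side: store an RX time and clear the tag. */
static void model_set_rx_tstamp(struct model_skb *skb, uint64_t now_ns)
{
	skb->tstamp = now_ns;
	skb->tstamp_is_edt = false;
}

/*
 * Forwarding path: scrub based on the bit carried in the skb itself
 * rather than on a socket flag. A delivery time is cleared; an earlier
 * RX timestamp is left intact.
 */
static void model_scrub_tstamp(struct model_skb *skb)
{
	if (skb->tstamp_is_edt) {
		skb->tstamp = 0;
		skb->tstamp_is_edt = false;
	}
}
```

Whether a tagged delivery time should be cleared or preserved on a given
forwarding path is a separate policy question; the sketch only shows the
decision keyed off the per-skb bit.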
One other thing I wonder about: BPF progs at the host-facing veth's tc
ingress that are not aware of skb->tstamp will then see a tstamp from the
future, given we intentionally bypass net_timestamp_check(), and might get
confused (or confuse higher-layer application logic). Not quite sure yet
whether they would be the only affected user.

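
As a rough illustration of that confusion, here is a small user-space
sketch. It is not a BPF program, and both helper names are hypothetical.
It assumes the consumer treats tstamp as an RX time in the past and
subtracts it from the current monotonic time, which goes negative once
tstamp carries a future delivery time.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Naive consumer: assumes tstamp_ns is an RX time taken in the past and
 * derives a delay from it. With a future tstamp the result is negative.
 */
static int64_t naive_delay_ns(uint64_t now_ns, uint64_t tstamp_ns)
{
	return (int64_t)now_ns - (int64_t)tstamp_ns;
}

/* Defensive check a tstamp-aware consumer could apply before using it. */
static bool tstamp_is_future(uint64_t now_ns, uint64_t tstamp_ns)
{
	return tstamp_ns > now_ns;
}
```

For example, with now_ns = 1000 and tstamp_ns = 1500 the naive helper
returns -500, while tstamp_is_future() flags the value so that it can be
ignored or handled separately.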
With regard to the open question on the mono clock and time namespaces
(which cover mono + boottime offsets), it looks like this is not an issue,
as they only affect syscall-facing APIs.

Thanks,
Daniel