netdev - Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before delivering to user space

Message-ID: <83ff2f64-42b8-60ed-965a-810b4ec69f8d@iogearbox.net>
Date:   Tue, 7 Dec 2021 22:48:53 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Martin KaFai Lau <kafai@...com>
Cc:     netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, kernel-team@...com
Subject: Re: [RFC PATCH net-next 2/2] net: Reset forwarded skb->tstamp before
 delivering to user space

On 12/7/21 3:27 PM, Willem de Bruijn wrote:
> On Mon, Dec 6, 2021 at 9:01 PM Martin KaFai Lau <kafai@...com> wrote:
>>
>> The skb->tstamp may be set by a local sk (as a sender in tcp) and the skb
>> is then forwarded and delivered to another sk (as a receiver).
>>
>> An example:
>>      sender-sk => veth@...ns =====> veth@...t => receiver-sk
>>                               ^^^
>>                          __dev_forward_skb
>>
>> The skb->tstamp is marked with a future TX time.  This future
>> skb->tstamp will confuse the receiver-sk.
>>
>> This patch marks the skb if the skb->tstamp is forwarded.
>> Before using the skb->tstamp as a rx timestamp, it needs
>> to be re-stamped to avoid getting a future time.  It is
>> done in the RX timestamp reading helper skb_get_ktime().
>>
>> Signed-off-by: Martin KaFai Lau <kafai@...com>
>> ---
>>   include/linux/skbuff.h | 14 +++++++++-----
>>   net/core/dev.c         |  4 +++-
>>   net/core/skbuff.c      |  6 +++++-
>>   3 files changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index b609bdc5398b..bc4ae34c4e22 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -867,6 +867,7 @@ struct sk_buff {
>>          __u8                    decrypted:1;
>>   #endif
>>          __u8                    slow_gro:1;
>> +       __u8                    fwd_tstamp:1;
>>
>>   #ifdef CONFIG_NET_SCHED
>>          __u16                   tc_index;       /* traffic control index */
>> @@ -3806,9 +3807,12 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
>>   }
>>
>>   void skb_init(void);
>> +void net_timestamp_set(struct sk_buff *skb);
>>
>> -static inline ktime_t skb_get_ktime(const struct sk_buff *skb)
>> +static inline ktime_t skb_get_ktime(struct sk_buff *skb)
>>   {
>> +       if (unlikely(skb->fwd_tstamp))
>> +               net_timestamp_set(skb);
>>          return ktime_mono_to_real_cond(skb->tstamp);
> 
> This changes timestamp behavior for existing applications, probably
> worth mentioning in the commit message if nothing else. A timestamp
> taken at the time of the recv syscall is not very useful.
> 
> If a forwarded timestamp is not a future delivery time (as those are
> scrubbed), is it not correct to just deliver the original timestamp?
> It probably was taken at some earlier __netif_receive_skb_core.
> 
>>   }
>>
>> -static inline void net_timestamp_set(struct sk_buff *skb)
>> +void net_timestamp_set(struct sk_buff *skb)
>>   {
>>          skb->tstamp = 0;
>> +       skb->fwd_tstamp = 0;
>>          if (static_branch_unlikely(&netstamp_needed_key))
>>                  __net_timestamp(skb);
>>   }
>> +EXPORT_SYMBOL(net_timestamp_set);
>>
>>   #define net_timestamp_check(COND, SKB)                         \
>>          if (static_branch_unlikely(&netstamp_needed_key)) {     \
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index f091c7807a9e..181ddc989ead 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -5295,8 +5295,12 @@ void skb_scrub_tstamp(struct sk_buff *skb)
>>   {
>>          struct sock *sk = skb->sk;
>>
>> -       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME))
>> +       if (sk && sk_fullsock(sk) && sock_flag(sk, SOCK_TXTIME)) {
> 
> There is a slight race here with the socket flipping the feature on/off.
> 
>>                  skb->tstamp = 0;
>> +               skb->fwd_tstamp = 0;
>> +       } else if (skb->tstamp) {
>> +               skb->fwd_tstamp = 1;
>> +       }
> 
> SO_TXTIME future delivery times are scrubbed, but TCP future delivery
> times are not?
> 
> If adding a bit, might it be simpler to add a bit tstamp_is_edt, and
> scrub based on that. That is also not open to the above race.
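
For reference, the tstamp_is_edt suggestion quoted above could look roughly
like the following. This is a plain userspace model, not kernel code:
struct model_skb and both helpers are hypothetical names made up for
illustration, and only the idea (mark the tstamp where it is set, scrub
based on the mark rather than on the socket flag) comes from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the two sk_buff fields involved. */
struct model_skb {
	int64_t tstamp;               /* ns; either an RX time or a delivery time */
	unsigned int tstamp_is_edt:1; /* hypothetical bit: tstamp is a future delivery time */
};

/* Sender side: store a delivery time and mark it as such in the same step. */
static void model_set_edt(struct model_skb *skb, int64_t edt_ns)
{
	assert(skb != NULL);
	skb->tstamp = edt_ns;
	skb->tstamp_is_edt = 1;
}

/*
 * Forward path: scrub only delivery times. A tstamp that is not marked
 * (e.g. one taken at an earlier receive) is left untouched. The decision
 * depends only on state carried in the skb model, not on a socket flag
 * that could change in between.
 */
static void model_scrub_tstamp(struct model_skb *skb)
{
	assert(skb != NULL);
	if (skb->tstamp_is_edt) {
		skb->tstamp = 0;
		skb->tstamp_is_edt = 0;
	}
}
```

In this model, an skb stamped via model_set_edt() comes out of
model_scrub_tstamp() with tstamp == 0, while an unmarked non-zero tstamp
is delivered as-is.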

One other thing I wonder: BPF progs at a host-facing veth's tc ingress that
are not aware of skb->tstamp carrying a delivery time will then see a tstamp
from the future, given we intentionally bypass net_timestamp_check(), and
might get confused (or confuse higher-layer application logic). Not quite
sure yet whether they would be the only affected users.
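
To illustrate the kind of confusion meant here: logic that computes a
packet's age as "now - tstamp" with unsigned arithmetic wraps around to a
huge value once tstamp lies ahead of now. A guarded variant in plain
userspace C is sketched below. The helper name is hypothetical, and the
sketch assumes both values are in nanoseconds on the same clock base,
which is an assumption and not something this thread has established.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the age of a timestamp relative to now_ns.
 * Returns false when no usable age exists, i.e. when tstamp_ns is unset (0)
 * or lies in the future, instead of letting the unsigned subtraction wrap.
 * Assumes now_ns and tstamp_ns share one clock base.
 */
static bool model_rx_age_ns(uint64_t now_ns, uint64_t tstamp_ns,
			    uint64_t *age_ns)
{
	assert(age_ns != NULL);
	if (tstamp_ns == 0 || tstamp_ns > now_ns)
		return false;
	*age_ns = now_ns - tstamp_ns;
	return true;
}
```

A consumer that skips such a check and subtracts directly would report an
age close to 2^64 ns for a future tstamp.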

With regard to the open question on the mono clock and time namespaces
(which cover mono + boottime offsets): this does not look like an issue, as
the offsets only affect syscall-facing APIs.

Thanks,
Daniel