[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg>
Date: Fri, 21 Jan 2022 14:02:23 +0200 (EET)
From: Julian Anastasov <ja@....bg>
To: Martin KaFai Lau <kafai@...com>
cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, kernel-team@...com,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [RFC PATCH v3 net-next 3/4] net: Set skb->mono_delivery_time
and clear it when delivering locally
Hello,
On Thu, 20 Jan 2022, Martin KaFai Lau wrote:
> This patch sets the skb->mono_delivery_time to flag the skb->tstamp
> is used as the mono delivery_time (EDT) instead of the (rcv) timestamp.
>
> skb_clear_delivery_time() is added to clear the delivery_time and set
> back to the (rcv) timestamp if needed when the skb is being delivered
> locally (to a sk). skb_clear_delivery_time() is called in
> ip_local_deliver() and ip6_input(). In most of the regular ingress
> cases, the skb->tstamp should already have the (rcv) timestamp.
> For the egress loop back to ingress cases, the marking of the (rcv)
> timestamp is postponed from dev.c to ip_local_deliver() and
> ip6_input().
>
> Another case needs to clear the delivery_time is the network
> tapping (e.g. af_packet by tcpdump). Regardless of tapping at the ingress
> or egress, the tapped skb is received by the af_packet socket, so
> it is ingress to the af_packet socket and it expects
> the (rcv) timestamp.
>
> When tapping at egress, dev_queue_xmit_nit() is used. It has already
> expected skb->tstamp may have delivery_time, so it does
> skb_clone()+net_timestamp_set() to ensure the cloned skb has
> the (rcv) timestamp before passing to the af_packet sk.
> This patch only adds to clear the skb->mono_delivery_time
> bit in net_timestamp_set().
>
> When tapping at ingress, it currently expects the skb->tstamp is either 0
> or has the (rcv) timestamp. Meaning, the tapping at ingress path
> has already expected the skb->tstamp could be 0 and it will get
> the (rcv) timestamp by ktime_get_real() when needed.
>
> There are two cases for tapping at ingress:
>
> One case is af_packet queues the skb to its sk_receive_queue. The skb
> is either not shared or new clone created. The skb_clear_delivery_time()
> is called to clear the delivery_time (if any) before it is queued to the
> sk_receive_queue.
>
> Another case, the ingress skb is directly copied to the rx_ring
> and tpacket_get_timestamp() is used to get the (rcv) timestamp.
> skb_tstamp() is used in tpacket_get_timestamp() to check
> the skb->mono_delivery_time bit before returning skb->tstamp.
> As mentioned earlier, the tapping@...ress has already expected
> the skb may not have the (rcv) timestamp (because no sk has asked
> for it) and has handled this case by directly calling ktime_get_real().
>
> In __skb_tstamp_tx, it clones the egress skb and queues the clone to the
> sk_error_queue. The outgoing skb may have the mono delivery_time while
> the (rcv) timestamp is expected for the clone, so the
> skb->mono_delivery_time bit is also cleared from the clone.
>
> Signed-off-by: Martin KaFai Lau <kafai@...com>
> ---
> include/linux/skbuff.h | 27 +++++++++++++++++++++++++--
> net/core/dev.c | 4 +++-
> net/core/skbuff.c | 6 ++++--
> net/ipv4/ip_input.c | 1 +
> net/ipv6/ip6_input.c | 1 +
> net/packet/af_packet.c | 4 +++-
> 6 files changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index 3a025c011971..35311ca75496 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
> @@ -244,6 +244,7 @@ int ip_local_deliver(struct sk_buff *skb)
> */
> struct net *net = dev_net(skb->dev);
>
> + skb_clear_delivery_time(skb);
Is it safe to move this line into ip_local_deliver_finish ?
> if (ip_is_fragment(ip_hdr(skb))) {
> if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
> return 0;
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 80256717868e..84f93864b774 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -469,6 +469,7 @@ static int ip6_input_finish(struct net *net, struct sock *sk, struct sk_buff *sk
>
> int ip6_input(struct sk_buff *skb)
> {
> + skb_clear_delivery_time(skb);
Is it safe to move this line into ip6_input_finish?
The problem for both cases is that IPVS hooks at LOCAL_IN and
can decide to forward the packet by returning NF_STOLEN and
avoiding the _finish code. In short, before reaching the
_finish code it is still not decided that packet reaches the
sockets.
> return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN,
> dev_net(skb->dev), NULL, skb, skb->dev, NULL,
> ip6_input_finish);
Regards
--
Julian Anastasov <ja@....bg>
Powered by blists - more mailing lists