[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ca11f6f6-86f9-52c9-4251-90bf0b6f588a@iogearbox.net>
Date: Fri, 17 Dec 2021 00:42:30 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Martin KaFai Lau <kafai@...com>
Cc: netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, kernel-team@...com
Subject: Re: [RFC PATCH v2 net-next] net: Preserve skb delivery time during
forward
On 12/16/21 11:58 PM, Willem de Bruijn wrote:
>>>> @@ -530,7 +538,14 @@ struct skb_shared_info {
>>>> /* Warning: this field is not always filled in (UFO)! */
>>>> unsigned short gso_segs;
>>>> struct sk_buff *frag_list;
>>>> - struct skb_shared_hwtstamps hwtstamps;
>>>> + union {
>>>> + /* If SKBTX_DELIVERY_TSTAMP is set in tx_flags,
>>>> + * tx_delivery_tstamp is stored instead of
>>>> + * hwtstamps.
>>>> + */
>>>
>>> Should we just encode the timebase and/or type { timestamp,
>>> delivery_time } in th lower bits of the timestamp field? Its
>>> resolution is higher than actual clock precision.
>> In skb->tstamp ?
>
> Yes. Arguably a hack, but those bits are in the noise now, and it
> avoids the clone issue with skb_shinfo (and scarcity of flag bits
> there).
>
>>> is non-zero skb->tstamp test not sufficient, instead of
>>> SKBTX_DELIVERY_TSTAMP_ALLOW_FWD.
>>>
>>> It is if only called on the egress path. Is bpf on ingress the only
>>> reason for this?
>> Ah. ic. meaning testing non-zero skb->tstamp and then call
>> skb_save_delivery_time() only during the veth-egress-path:
>> somewhere in veth_xmit() => veth_forward_skb() but before
>> skb->tstamp was reset to 0 in __dev_forward_skb().
>
> Right. If delivery_time is the only use of skb->tstamp on egress, and
> timestamp is the only use on ingress, then the only time the
> delivery_time needs to be cached if when looping from egress to
> ingress and this field is non-zero.
>
>> Keep *_forward() and bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)
>> because the skb->tstamp could be stamped by net_timestamp_check().
>>
>> Then SKBTX_DELIVERY_TSTAMP_ALLOW_FWD is not needed.
>>
>> Did I understand your suggestion correctly?
>
> I think so.
>
> But the reality is complicated if something may be setting a delivery
> time on ingress (a BPF filter?)
I'm not quite following the 'bpf_out_*() unchanged (i.e. keep skb->tstamp = 0)'
part yet; in our case we would need to preserve it as well, for example, we are
redirecting via bpf from bpf@...ingress@...t-veth to bpf@...egress@...s-dev in
the egress path and fq sits on phys-dev.. (I mean if needed we could easily do
that as shown in my prev diff with a flag for the helper).
>> However, we still need a bit to distinguish tx_delivery_tstamp
>> from hwtstamps.
>>
>>>
>>>> +{
>>>> + if (skb_shinfo(skb)->tx_flags & SKBTX_DELIVERY_TSTAMP_ALLOW_FWD) {
>>>> + skb_shinfo(skb)->tx_delivery_tstamp = skb->tstamp;
>>>> + skb_shinfo(skb)->tx_flags |= SKBTX_DELIVERY_TSTAMP;
>>>> + skb_shinfo(skb)->tx_flags &= ~SKBTX_DELIVERY_TSTAMP_ALLOW_FWD;
>>>> + }
>>>
>>> Is this only called when there are no clones/shares?
>> No, I don't think so. TCP clone it. I also started thinking about
>> this after noticing a mistake in the change in __tcp_transmit_skb().
>>
>> There are other places that change tx_flags, e.g. tcp_offload.c.
>> It is not shared at those places or there is some specific points
>> in the stack that is safe to change ?
>
> The packet probably is not yet shared. Until the TCP stack gives a
> packet to the IP layer, it can treat it as exclusive.
>
> Though it does seem that these fields are accessed in a possibly racy
> manner. Drivers with hardware tx timestamp offload may set
> skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS without checking
> whether the skb may be cloned.
Powered by blists - more mailing lists