[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46920c3f-564d-e9ef-f714-22f96239736d@gmail.com>
Date: Thu, 13 Dec 2018 02:18:29 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Florian Westphal <fw@...len.de>
Cc: netdev@...r.kernel.org, cpaasch@...le.com, peter.krystad@...el.com,
mathew.j.martineau@...ux.intel.com
Subject: Re: [PATCH net-next 02/13] sk_buff: add skb extension infrastructure
On 12/13/2018 01:27 AM, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@...il.com> wrote:
>
> [ CC Christoph, Mathew, Peter ]
>
>>> If not, I will send another iteration that just allocates the entire
>>> extension space if first extension is added, it simplifies allocation
>>> handling a little.
>>>
>>
>> I am still unsure of the added extra costs, but for a start, TCP xmit clones
>> the skbs.
>
> Yes.
>
>> Does it mean an additional pair of atomic operations if the skb comes from MPTCP enabled flow ?
>
> Depends.
>
> If its going to be used as I expect, then the extension could be
> discarded after the DSS mapping has been written to the tcp option
> space, i.e. before cloning occurs.
I do not see how this would work, without also discarding on the master skb
the needed info.
>
> So skb_clone would not cause a refcount increase, as the skb
> does not have extension (anymore) by that time.
>
> If its presevered because it has to be used again
> after the cloning, then you are right, and we get refcont_inc() cost.
>
> As for the freeing, this currently has an atomic op:
>
> static inline void __skb_ext_put(struct skb_ext *ext)
> {
> if (ext && refcount_dec_and_test(&ext->refcnt))
> __skb_ext_free(ext);
> }
>
> static inline void skb_ext_put(struct sk_buff *skb)
> {
> if (skb->active_extensions)
> __skb_ext_put(skb->extensions);
> }
>
> I think this could be avoided by using same trick as
> fclones->fclone_ref one in kfree_skbmem():
>
> static inline void skb_ext_put(struct sk_buff *skb)
> {
> if (skb->active_extensions)
> __skb_ext_put(skb->extensions);
> }
>
> void __skb_ext_put(struct skb_ext *ext)
> {
> if (refcount_read(&ext->refcnt) == 1)
> goto free_now;
>
> if (!refcount_dec_and_test(&ext->refcnt))
> return;
> free_now:
> #ifdef CONFIG_XFRM
> if (__skb_ext_exist(ext, SKB_EXT_SEC_PATH))
> skb_ext_put_sp(skb_ext_get_ptr(ext, SKB_EXT_SEC_PATH));
> #endif
> kfree(ext);
> }
>
> I think this could work, if refcount is 1, then there is no clone, i.e.
> nothing else can inc/dec reference after the "== 1" check is true.
>
> I will change it to above version.
>
>> The clone is used in the lower layers (IP , qdisc, device xmit), so I do not see why a clone
>> should share the extdata from the master.
>
> For TCP, thats true. But there are other places that could clone, e.g.
> when bridge has to flood-forward.
>
So you propose a mechanism that forces a preserve on clone, base on existing needs
for bridging.
> At least in bridge case the 'preseve on clone' is needed, else required
> information is missing from the cloned skb.
>
We need something where MPTCP info does not need to be propagated all the way to the NIC...
This skb extension is an incentive for adding more sticky things in the skbs
to violate layering of networking stacks :/
Powered by blists - more mailing lists