netdev - Re: [patch net-next v3 05/10] net: sched: keep track of offloaded filters and check tc offload feature

Message-ID: <20171214131045.GD1926@nanopsycho>
Date:   Thu, 14 Dec 2017 14:10:45 +0100
From:   Jiri Pirko <jiri@...nulli.us>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com,
        xiyou.wangcong@...il.com, mlxsw@...lanox.com, andrew@...n.ch,
        vivien.didelot@...oirfairelinux.com, f.fainelli@...il.com,
        michael.chan@...adcom.com, ganeshgr@...lsio.com,
        saeedm@...lanox.com, matanb@...lanox.com, leonro@...lanox.com,
        idosch@...lanox.com, simon.horman@...ronome.com,
        pieter.jansenvanvuuren@...ronome.com, john.hurley@...ronome.com,
        alexander.h.duyck@...el.com, ogerlitz@...lanox.com,
        john.fastabend@...il.com, daniel@...earbox.net
Subject: Re: [patch net-next v3 05/10] net: sched: keep track of offloaded
 filters and check tc offload feature

Thu, Dec 14, 2017 at 10:47:16AM CET, jiri@...nulli.us wrote:
>Thu, Dec 14, 2017 at 02:05:55AM CET, jakub.kicinski@...ronome.com wrote:
>>On Wed, 13 Dec 2017 16:10:33 +0100, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@...lanox.com>
>>> 
>>> During block bind, we need to check tc offload feature. If it is
>>> disabled yet still the block contains offloaded filters, forbid the
>>> bind. Also forbid to register callback for a block that already
>>> contains offloaded filters, as the play back is not supported now.
>>> For keeping track of offloaded filters there is a new counter
>>> introduced, alongside with couple of helpers called from cls_* code.
>>> These helpers set and clear TCA_CLS_FLAGS_IN_HW flag.
>>> 
>>> Signed-off-by: Jiri Pirko <jiri@...lanox.com>
>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index de9dbcb..ac25142 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -266,31 +266,50 @@ void tcf_chain_put(struct tcf_chain *chain)
>>>  }
>>>  EXPORT_SYMBOL(tcf_chain_put);
>>>  
>>> -static void tcf_block_offload_cmd(struct tcf_block *block, struct Qdisc *q,
>>> +static bool tcf_block_offload_in_use(struct tcf_block *block)
>>> +{
>>> +	return block->offloadcnt;
>>> +}
>>> +
>>> +static void tcf_block_offload_cmd(struct tcf_block *block,
>>> +				  struct net_device *dev,
>>>  				  struct tcf_block_ext_info *ei,
>>>  				  enum tc_block_command command)
>>>  {
>>> -	struct net_device *dev = q->dev_queue->dev;
>>>  	struct tc_block_offload bo = {};
>>>  
>>> -	if (!dev->netdev_ops->ndo_setup_tc)
>>> -		return;
>>>  	bo.command = command;
>>>  	bo.binder_type = ei->binder_type;
>>>  	bo.block = block;
>>>  	dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>>>  }
>>>  
>>> -static void tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
>>> -				   struct tcf_block_ext_info *ei)
>>> +static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
>>> +				  struct tcf_block_ext_info *ei)
>>>  {
>>> -	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_BIND);
>>> +	struct net_device *dev = q->dev_queue->dev;
>>> +
>>> +	if (!dev->netdev_ops->ndo_setup_tc)
>>> +		return 0;
>>> +
>>> +	/* If tc offload feature is disabled and the block we try to bind
>>> +	 * to already has some offloaded filters, forbid to bind.
>>> +	 */
>>> +	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block))
>>> +		return -EOPNOTSUPP;
>>
>>I don't understand the tc_can_offload(dev) check here.  The flow is -
>>on bind if TC offloads are enabled the new port will get a TC_BLOCK_BIND
>>and request a register, but the register will fail since the block is
>>offloaded?
>
>The point of this check is to disallow dev with tc offload disabled to
>share a block which already holds offloaded filters.
>
>That is similar to disallow disabling tc offload on device that shares a
>block which contains offloaded filters.
>
>
>
>>
>>But the whole bind operation does not fail, so user will not see an
>>error.  The block will get bound but port's driver has no way to
>>register the callback.  I'm sorry if I'm just being slow here..
>>
>>> +	tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND);
>>> +	return 0;
>
>The thing is that driver which does not support TC_BLOCK_BIND would
>return -EOPNOTSUPP here. For those drivers we continue, they just won't
>have block cb registered so they won't receive cb calls to offload
>filters.
>
>
>>>  }
>>>  
>>>  static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
>>>  				     struct tcf_block_ext_info *ei)
>>>  {
>>> -	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_UNBIND);
>>> +	struct net_device *dev = q->dev_queue->dev;
>>> +
>>> +	if (!dev->netdev_ops->ndo_setup_tc)
>>> +		return;
>>> +	tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_UNBIND);
>>>  }
>>>  
>>>  static int
>>> @@ -499,10 +518,15 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
>>>  	if (err)
>>>  		goto err_chain_head_change_cb_add;
>>>  
>>> -	tcf_block_offload_bind(block, q, ei);
>>> +	err = tcf_block_offload_bind(block, q, ei);
>>> +	if (err)
>>> +		goto err_block_offload_bind;
>>
>>Would it perhaps make more sense to add a TC_BLOCK_JOIN or some such?
>
>Why? Just a namechange?
>
>
>>IIUC the problem is we don't know whether the driver/callee of the new
>>port is aware of previous callbacks/filters and we can't replay them.

Well, the problem is a bit different.
There are 2 scenarios in which we need to fail here:
1) The tc offload feature is turned off and there are some filters
   offloaded in the block. That is what I commented on above.
2) The tc offload feature is turned on and there are some filters
   offloaded in the block, but the block is not accounted for by the
   driver. This is because of the lack of replay. This is taken care of
   at the beginning of the __tcf_block_cb_register function - see below,
   there is a comment there.
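
To make the two bail-out cases concrete, here is a minimal userspace
sketch. It is not kernel code: struct mock_block, struct mock_dev and the
mock_* functions are hypothetical stand-ins for struct tcf_block,
struct net_device, tcf_block_offload_bind() and __tcf_block_cb_register(),
and the tc_offload_enabled field only approximates what tc_can_offload(dev)
reports. The TC_BLOCK_BIND call into the driver is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tcf_block: only the new counter. */
struct mock_block {
	unsigned int offloadcnt;	/* number of filters currently in hw */
};

/* Hypothetical stand-in for struct net_device. */
struct mock_dev {
	bool tc_offload_enabled;	/* models the result of tc_can_offload(dev) */
};

static bool mock_block_offload_in_use(const struct mock_block *block)
{
	return block->offloadcnt != 0;
}

/* Scenario 1: checked at bind time, before the driver is asked to bind. */
static int mock_block_offload_bind(const struct mock_block *block,
				   const struct mock_dev *dev)
{
	if (!dev->tc_offload_enabled && mock_block_offload_in_use(block))
		return -EOPNOTSUPP;
	return 0;
}

/* Scenario 2: checked when a driver tries to register a new block cb. */
static int mock_block_cb_register(const struct mock_block *block)
{
	/* No replay of earlier cb calls, so a late registration is refused. */
	if (mock_block_offload_in_use(block))
		return -EOPNOTSUPP;
	return 0;
}
```

In this model a device with offload disabled can still bind to an empty
block, and a device with offload enabled passes the bind check but has its
callback registration refused once the block already holds offloaded
filters.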


>>
>>Obviously registering new callbacks on offloaded blocks is a no-go.
>>For simple bind to a new port of an ASIC which already knows the rule
>>set, we just need to ask all callbacks if they know the port and as
>>long as any of them responds "yes" we are safe to assume the bind is OK.
>>
>>(Existing drivers will all respond with EOPNOTSUPP to a new unknown command.)
>>
>>Does that make sense?
>
>Hmm, I understand what you say. I have to think about that a bit more.

As you see, both cases where we need to bail out are covered. Do you see
any other problem?


>
>Thanks!
>
>>
>>>  	*p_block = block;
>>>  	return 0;
>>>  
>>> +err_block_offload_bind:
>>> +	tcf_chain_head_change_cb_del(tcf_block_chain_zero(block), ei);
>>>  err_chain_head_change_cb_add:
>>>  	tcf_block_owner_del(block, q, ei->binder_type);
>>>  err_block_owner_add:
>>> @@ -630,9 +654,16 @@ struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
>>>  {
>>>  	struct tcf_block_cb *block_cb;
>>>  
>>> +	/* At this point, playback of previous block cb calls is not supported,
>>> +	 * so forbid to register to block which already has some offloaded
>>> +	 * filters present.
>>> +	 */
>>> +	if (tcf_block_offload_in_use(block))
>>> +		return ERR_PTR(-EOPNOTSUPP);
>>> 
>>>  	block_cb = kzalloc(sizeof(*block_cb), GFP_KERNEL);
>>>  	if (!block_cb)
>>> -		return NULL;
>>> +		return ERR_PTR(-ENOMEM);
>>>  	block_cb->cb = cb;
>>>  	block_cb->cb_ident = cb_ident;
>>>  	block_cb->cb_priv = cb_priv;
>>> @@ -648,7 +679,7 @@ int tcf_block_cb_register(struct tcf_block *block,
>>>  	struct tcf_block_cb *block_cb;
>>>  
>>>  	block_cb = __tcf_block_cb_register(block, cb, cb_ident, cb_priv);
>>> -	return block_cb ? 0 : -ENOMEM;
>>> +	return IS_ERR(block_cb) ? PTR_ERR(block_cb) : 0;
>>>  }
>>>  EXPORT_SYMBOL(tcf_block_cb_register);
>>>
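
The last hunk above switches __tcf_block_cb_register() from returning NULL
on failure to returning an error encoded in the pointer, so that the caller
can tell -EOPNOTSUPP from -ENOMEM. The following userspace sketch
illustrates that pattern only. The mock_err_ptr(), mock_is_err() and
mock_ptr_err() helpers are hypothetical imitations of the kernel's
ERR_PTR(), IS_ERR() and PTR_ERR(), and struct mock_block_cb and the
offloadcnt parameter are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *mock_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long mock_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int mock_is_err(const void *ptr)
{
	/* Error codes occupy the topmost MOCK_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)(-MOCK_MAX_ERRNO);
}

/* Hypothetical stand-in for struct tcf_block_cb. */
struct mock_block_cb {
	int ident;
};

/* Returns a valid pointer on success or an encoded errno on failure. */
static struct mock_block_cb *mock_cb_register(unsigned int offloadcnt,
					      int ident)
{
	struct mock_block_cb *cb;

	if (offloadcnt)
		return mock_err_ptr(-EOPNOTSUPP);

	cb = calloc(1, sizeof(*cb));
	if (!cb)
		return mock_err_ptr(-ENOMEM);
	cb->ident = ident;
	return cb;
}

/* Wrapper returning only a status, in the spirit of tcf_block_cb_register(). */
static int mock_cb_register_status(unsigned int offloadcnt, int ident)
{
	struct mock_block_cb *cb = mock_cb_register(offloadcnt, ident);

	if (mock_is_err(cb))
		return (int)mock_ptr_err(cb);
	free(cb);	/* the sketch keeps no list of callbacks, so release it */
	return 0;
}
```

With a NULL return the wrapper could only guess at -ENOMEM. With the
encoded pointer it passes through whichever error the inner function chose.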