netdev - Re: [PATCH net-next v5 5/6] flow_offload: support get flow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <bac5c6a5-8a1b-ee74-988b-6c2a71885761@ucloud.cn>
Date:   Fri, 2 Aug 2019 18:45:04 +0800
From:   wenxu <wenxu@...oud.cn>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc:     jiri@...nulli.us, pablo@...filter.org, fw@...len.de,
        netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
        John Hurley <john.hurley@...ronome.com>
Subject: Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block
 immediately


On 8/2/2019 7:11 AM, Jakub Kicinski wrote:
> On Thu,  1 Aug 2019 11:03:46 +0800, wenxu@...oud.cn wrote:
>> From: wenxu <wenxu@...oud.cn>
>>
>> The new flow-indr-block can't get the tcf_block
>> directly. It provide a callback list to find the flow_block immediately
>> when the device register and contain a ingress block.
>>
>> Signed-off-by: wenxu <wenxu@...oud.cn>
> First of all thanks for splitting the series up into more patches, 
> it is easier to follow the logic now!
>
>> @@ -328,6 +348,7 @@ struct flow_indr_block_dev {
>>  
>>  	INIT_LIST_HEAD(&indr_dev->cb_list);
>>  	indr_dev->dev = dev;
>> +	flow_get_default_block(indr_dev);
>>  	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>>  				   flow_indr_setup_block_ht_params)) {
>>  		kfree(indr_dev);
> I wonder if it's still practical to keep the block information in the
> indr_dev structure at all. The way this used to work was:
>
>
> [hash table of devices]     --------------
>                  |         |    netdev    |
>                  |         |    refcnt    |
>   indir_dev[tun0]|  ------ | cached block | ---- [ TC block ]
>                  |         |   callbacks  | .
>                  |          --------------   \__ [cb, cb_priv, cb_ident]
>                  |                               [cb, cb_priv, cb_ident]
>                  |          --------------
>                  |         |    netdev    |
>                  |         |    refcnt    |
>   indir_dev[tun1]|  ------ | cached block | ---- [ TC block ]
>                  |         |   callbacks  |.
> -----------------           --------------   \__ [cb, cb_priv, cb_ident]
>                                                  [cb, cb_priv, cb_ident]
>
>
> In the example above we have two tunnels tun0 and tun1, each one has a
> indr_dev structure allocated, and for each one of them two drivers
> registered for callbacks (hence the callbacks list has two entries).
>
> We used to cache the TC block in the indr_dev structure, but now that
> there are multiple subsytems using the indr_dev we either have to have
> a list of cached blocks (with entries for each subsystem) or just always
> iterate over the subsystems :(
>
> After all the same device may have both a TC block and a NFT block.
>
> I think always iterating would be easier:
>
> The indr_dev struct would no longer have the block pointer, instead
> when new driver registers for the callback instead of:
>
> 	if (indr_dev->ing_cmd_cb)
> 		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
> 				     indr_block_cb->cb, indr_block_cb->cb_priv,
> 				     FLOW_BLOCK_BIND);
>
> We'd have something like the loop in flow_get_default_block():
>
> 	for each (subsystem)
> 		subsystem->handle_new_indir_cb(indr_dev, cb);
>
> And then per-subsystem logic would actually call the cb. Or:
>
> 	for each (subsystem)
> 		block = get_default_block(indir_dev)
> 		indr_dev->ing_cmd_cb(...)

            nft dev chian is also based on register_netdevice_notifier, So for unregister case,

the basechian(block) of nft maybe delete before the __tc_indr_block_cb_unregister. is right?

So maybe we can cache the block as a list of all the subsystem in  indr_dev ?

> I hope this makes sense.
>
>
> Also please double check nft unload logic has an RCU synchronization
> point, I'm not 100% confident rcu_barrier() implies synchronize_rcu().
> Perhaps someone more knowledgeable can chime in :)
>