[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180106203739.GD2099@nanopsycho>
Date: Sat, 6 Jan 2018 21:37:39 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: David Ahern <dsahern@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com,
xiyou.wangcong@...il.com, mlxsw@...lanox.com, andrew@...n.ch,
vivien.didelot@...oirfairelinux.com, f.fainelli@...il.com,
michael.chan@...adcom.com, ganeshgr@...lsio.com,
saeedm@...lanox.com, matanb@...lanox.com, leonro@...lanox.com,
idosch@...lanox.com, jakub.kicinski@...ronome.com,
simon.horman@...ronome.com, pieter.jansenvanvuuren@...ronome.com,
john.hurley@...ronome.com, alexander.h.duyck@...el.com,
ogerlitz@...lanox.com, john.fastabend@...il.com,
daniel@...earbox.net
Subject: Re: [patch net-next v6 00/11] net: sched: allow qdiscs to share
filter block instances
Sat, Jan 06, 2018 at 06:41:18PM CET, dsahern@...il.com wrote:
>On 1/6/18 1:07 AM, Jiri Pirko wrote:
>> Sat, Jan 06, 2018 at 04:57:21AM CET, dsahern@...il.com wrote:
>>> On 1/5/18 4:09 PM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@...lanox.com>
>>>>
>>>> Currently the filters added to qdiscs are independent. So for example if you
>>>> have 2 netdevices and you create ingress qdisc on both and you want to add
>>>> identical filter rules both, you need to add them twice. This patchset
>>>> makes this easier and mainly saves resources allowing to share all filters
>>>> within a qdisc - I call it a "filter block". Also this helps to save
>>>> resources when we do offload to hw for example to expensive TCAM.
>>>>
>>>> So back to the example. First, we create 2 qdiscs. Both will share
>>>> block number 22. "22" is just an identification. If we don't pass any
>>>> block number, a new one will be generated by kernel:
>>>>
>>>> $ tc qdisc add dev ens7 ingress block 22
>>>> ^^^^^^^^
>>>> $ tc qdisc add dev ens8 ingress block 22
>>>> ^^^^^^^^
>>>>
>>>> Now if we list the qdiscs, we will see the block index in the output:
>>>>
>>>> $ tc qdisc
>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>>
>>>>
>>>> To make is more visual, the situation looks like this:
>>>>
>>>> ens7 ingress qdisc ens7 ingress qdisc
>>>> | |
>>>> | |
>>>> +----------> block 22 <----------+
>>>>
>>>> Unlimited number of qdiscs may share the same block.
>>>>
>>>> Now we can add filter using the block index:
>>>>
>>>> $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>>
>>>>
>>>> Note we cannot use the qdisc for filter manipulations for shared blocks:
>>>>
>>>> $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
>>>> Error: Cannot work with shared block, please use block index.
>>>>
>>>>
>>>> We will see the same output if we list filters for ingress qdisc of
>>>> ens7 and ens8, also for the block 22:
>>>>
>>>> $ tc filter show block 22
>>>> filter block 22 protocol ip pref 25 flower chain 0
>>>> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
>>>> ...
>>>>
>>>> $ tc filter show dev ens7 ingress
>>>> filter block 22 protocol ip pref 25 flower chain 0
>>>> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
>>>> ...
>>>>
>>>> $ tc filter show dev ens8 ingress
>>>> filter block 22 protocol ip pref 25 flower chain 0
>>>> filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
>>>> ...
>>>
>>> I like the API and output shown here, but I am not getting that with the
>>> patches.
>>>
>>> In this example, I am using 42 for the block id:
>>>
>>> $ tc qdisc show dev eth2
>>> qdisc mq 0: root
>>> qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1
>>> 1 1 1
>>> qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1
>>> 1 1 1
>>> qdisc ingress ffff: parent ffff:fff1 block 42
>>>
>>> It allows me to add a filter using the device:
>>> $ tc filter add dev eth2 ingress protocol ip pref 1 flower dst_ip
>>> 192.168.101.2 action drop
>>> $ echo $?
>>> 0
>>
>> Yes, because the block is not shared yet. You have it only for one
>> qdisc. As long as you have that, the "filter add dev" api still works.
>> It stops working when you add another qdisc to that block.
>
>Interesting.
>
>Once I add the block to another qdisc I do get an error:
>
>$ tc filter add dev eth2 ingress protocol ip pref 1 flower dst_ip
>192.168.100.2 action drop
>Error: Cannot work with shared block, please use block index.
>
>Can you change that to something like: "This filter block is shared.
>Please use the block index to make changes."
Ok. Sounds reasonable.
>
>
>>
>>
>>>
>>> And it modifies the shared block:
>>> $ tc filter show block 42
>>> filter pref 1 flower chain 0
>>> filter pref 1 flower chain 0 handle 0x1
>>> eth_type ipv4
>>> dst_ip 192.168.100.2
>>> not_in_hw
>>> action order 1: gact action drop
>>> random type none pass val 0
>>> index 2 ref 1 bind 1
>>>
>>> filter pref 1 flower chain 0 handle 0x2
>>> eth_type ipv4
>>> dst_ip 192.168.101.2
>>> not_in_hw
>>> action order 1: gact action drop
>>> random type none pass val 0
>>> index 3 ref 1 bind 1
>>>
>>> filter pref 25 flower chain 0
>>> filter pref 25 flower chain 0 handle 0x1
>>> eth_type ipv4
>>> dst_ip 192.168.0.0/16
>>> not_in_hw
>>> action order 1: gact action drop
>>> random type none pass val 0
>>> index 1 ref 1 bind 1
>>>
>>> Notice the header does not give the 'filter block N protocol' part. I
>>> don't get that using the device either (tc filter show dev eth2 ingress).
>>
>> That is correct. Check the print_filter function in tc/tc_filter.c. It
>> works with "filter_ifindex" and with my patch with "filter_block_index".
>> That means that if the value for the filter dumped actually differs from
>> what you passed on the command line, it prints it.
>>
>> Once you actually share the block with another qdisc, you will see
>> "block N"
>>
>>
>>>
>>> Something else I noticed is that I do not get an error message if I pass
>>> an invalid block id:
>>>
>>> $ tc filter show block 22
>>> $ echo $?
>>> 0
>>> $ tc qdisc show | grep block
>>> qdisc ingress ffff: dev eth2 parent ffff:fff1 block 42
>>
>> Yeah, I will try to fix this. The thing is, this is not error by kernel
>> but by the userspace. Kernel is perfectly ok with invalid device or
>> block index, it just does not dump anything and I would leave it like
>> that. I have to somehow check the validity of block_index in userspace.
>> Not sure how now.
>
>Ok. I saw a response about idr_alloc_ext.
I have idea having new type of message:
rtnl_register(PF_UNSPEC, RTM_GETBLOCK, tc_getblock_doit,
tc_getblock_dumpit, 0);
Also, block creation would send RTM_NEWBLOCK and block deletion would
send RTM_DELBLOCK. Not needed now, but I guess it would be nice to have
notifications.
>
>Here's another one: adding a filter to an unknown block id:
>
>$ tc filter add block 66 ingress protocol ip pref 1 flower dst_ip
>192.168.100.2 action drop
>RTNETLINK answers: Invalid argument
>We have an error talking to the kernel
>
>
>Can you add a proper extack message for that case?
Yeah, I will. There is couple of other places in the neibour code that
need extact (like qdisq lookup fail). But those I will address in a
separate patchset.
>
>Thanks,
Powered by blists - more mailing lists