Message-ID: <4f99e2b6-0f09-9d2c-6300-dfc884d501a8@mellanox.com>
Date: Tue, 24 Sep 2019 11:48:53 +0000
From: Paul Blakey <paulb@...lanox.com>
To: Edward Cree <ecree@...arflare.com>,
Jakub Kicinski <jakub.kicinski@...ronome.com>
CC: Pravin Shelar <pshelar@....org>,
Daniel Borkmann <daniel@...earbox.net>,
Vlad Buslov <vladbu@...lanox.com>,
David Miller <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Jiri Pirko <jiri@...nulli.us>,
Cong Wang <xiyou.wangcong@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Simon Horman <simon.horman@...ronome.com>,
Or Gerlitz <gerlitz.or@...il.com>
Subject: Re: CONFIG_NET_TC_SKB_EXT
On 9/23/2019 8:17 PM, Edward Cree wrote:
> On 23/09/2019 17:56, Paul Blakey wrote:
>> Even following this approach in tc only is challenging for some
>> scenarios, consider the following tc rules:
>>
>> tc filter add dev1 ... chain 0 flower <match1> action goto chain 1
>> tc filter add dev1 ... chain 0 flower <match2> action goto chain 1
>> tc filter add dev1 ... chain 0 flower <match3> action goto chain 1
>> ..
>> ..
>> tc filter add dev1 ... chain 0 flower ip_dst <match1000> action goto chain 1
>>
>> tc filter add dev1 ... chain 1 flower ip_proto tcp action tunnel_key set dst_ip 8.8.8.8 action mirred egress redirect dev2
> This one is easy, if a packet gets to the end of a chain without matching any rules then it just gets delivered to the uplink (PF), and software TC starts over from the beginning of chain 0. AFAICT this is the natural hardware behaviour for any design of offload, and it's inherently all-or-nothing.
>> You'd also have to keep track of what fields were originally on the packet, and not a match resulting from a modifications,
>> See the following set of rules:
>> tc filter add dev1 ... chain 0 prio 1 flower src_mac <mac1> action pedit munge ex set dst mac <mac2> pipe action goto chain 1
>> tc filter add dev1 ... chain 0 prio 2 flower dst_mac <mac2> action goto chain 1
>> tc filter add dev1 ... chain 1 prio 1 flower dst_mac <mac3> action <action3>
>> tc filter add dev1 ... chain 1 prio 2 flower src_mac <mac1> action <action4>
> This one is slightly harder in that you have to either essentially 'gather up' actions as you go and only 'commit' them at the end of your processing pipeline (in which case you have to deal with the data hazard you cleverly included in your example), or keep a copy of the original packet around. But I don't see the data-hazard case as having any realistic use-cases (does anyone actually put actions other than ct before a goto chain?) and it's easy enough to detect it in SW at rule-insertion time.
The 'miss' for all-or-nothing is easy; the hard part is combining
all the paths a packet can take in software into a single 'all or nothing'
rule in hardware.
And once you do that, any change to any part of a path requires
updating all the resulting rules. In the above case, an update to the
route for 8.8.8.8 affects all 1000 rules before it.
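To make that concrete, a merged all-or-nothing offload of the example above
would look roughly like this cross-product (just a sketch, not real syntax):

<match1>    + ip_proto tcp -> tunnel_key set dst_ip 8.8.8.8, redirect dev2
<match2>    + ip_proto tcp -> tunnel_key set dst_ip 8.8.8.8, redirect dev2
...
<match1000> + ip_proto tcp -> tunnel_key set dst_ip 8.8.8.8, redirect dev2

If the route (and so the encap headers) for 8.8.8.8 changes, every one of
these merged rules has to be rewritten.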
I can see two ways to do the all-or-nothing approach. One is to
statically analyze the software policy (tc rules in this case) and
carefully offload all possible paths. For that, you (who? the driver? tc?
OvS?) would need to construct a graph, and even if you could do that,
there isn't any infrastructure for it, and it would only be valid while
the configuration stays the same, since any change to an edge of the graph
requires re-analyzing it and possibly changing tens of thousands of rules.
With connection tracking, maybe hundreds of thousands...
The second approach, which avoids the expensive analysis and the unused
hardware rules above, is to trace the path the packet actually takes
through processing (gather and commit), save it somewhere on the SKB or
per CPU, and create new rules from it - basically a hardware cache.
But again, any change to the policy invalidates large chunks of this cache.
Regarding the use-case: for OvS, as far as I know, it uses
recirculation for the ct and dp_hash actions, since those need kernel
help.
This extension was first introduced for ovs->tc offloads of connection
tracking, but as we said, we are going to re-use it for tc->hardware
offloads in the same manner, and that covers many use-cases.
The above is a valid use-case if the controller isn't OvS:
the controller learns flows and inserts the first 1000 rules matching on
some 5-tuple in chain 0, and then, when you want to encap the packets,
you jump to the encap chain.
This lets you update the encap action for all rules in one place, and
segment the logic of your policy.
tc chains are commonly used exactly this way, to segment policy logic.
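For illustration, a rough sketch of such a layout (the addresses and
tunnel id here are made up), with one chain 0 rule per learned 5-tuple
and a single encap rule in chain 1:

tc filter add dev1 ... chain 0 flower ip_proto tcp dst_ip 10.0.0.1 dst_port 80 action goto chain 1
tc filter add dev1 ... chain 0 flower ip_proto tcp dst_ip 10.0.0.2 dst_port 80 action goto chain 1
...
tc filter add dev1 ... chain 1 flower action tunnel_key set dst_ip 8.8.8.8 id 42 action mirred egress redirect dev2

Replacing just the chain 1 rule updates the encap for every flow above it.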
> FWIW I think the only way this infra will ever be used is: chain 0 rules match <blah> action ct zone <n> goto chain <m>; chain <m> rules match ct_state +xyz <blah> action <actually do stuff to packet>. This can be supported with a hardware pipeline with three tables in series in a fairly natural way — and since all the actions that modify the packet (rather than just directing subsequent matching) are at the end, the natural miss behaviour is deliver-unmodified.
>
> I tried to explain this back on your [v3] net: openvswitch: Set OvS recirc_id from tc chain index, but you seemed set on your approach so I didn't persist. Now it looks like maybe I should have...
What you describe with the three tables (chain 0, chain X, and a ct table
per ct zone) follows the software model, requires continuing from some
miss point, and is what we plan on doing.
We offload something like:

chain 0: dst_mac aa:bb:cc:dd:ee:ff ct_state -trk action ct (goto the special CT table), action goto chain X
chain X: ct_state +trk+est action forward to port

And a table that replicates the ct zone table, which has something like:

<match on tuple 1....> set metadata ct_state = +est, action continue
<match on tuple 2....> set metadata ct_state = +est, action continue
...
Lots of tuples...
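(For reference, the software tc rules this layout mirrors would be roughly
the following - just a sketch, the mac, zone and device names are
placeholders:

tc filter add dev1 ... chain 0 flower dst_mac aa:bb:cc:dd:ee:ff ct_state -trk action ct zone 1 pipe action goto chain 1
tc filter add dev1 ... chain 1 flower ct_state +trk+est action mirred egress redirect dev2)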
What if you 'miss' on the match for the tuple? You have already done some
processing in hardware, so either you revert it, or you continue in
software where you left off (at the ct action).
We want to preserve the current software model: just as tc can do part
of the processing and then continue to the rest of the pipeline, be
it OvS, a bridge, or loopback, hardware would do the same.
The all-or-nothing approach would require changing the software model to
allow merging the ct zone table matches into the hardware rules,
offloading something like:

dst_mac aa:bb:cc:dd:ee:ff <match on tuple 1> fwd port
dst_mac aa:bb:cc:dd:ee:ff <match on tuple 2> fwd port
...
Lots of 'merged' rules.

Delete the rule with the ct action, and you have to delete all these
merged rules instead of just one.
Either tracing the packet or merging rules would require new
infrastructure to support it.