[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ecfb7918-7660-91f0-035e-56f58a41dc17@mellanox.com>
Date: Thu, 26 Sep 2019 13:56:24 +0000
From: Paul Blakey <paulb@...lanox.com>
To: Edward Cree <ecree@...arflare.com>,
Jakub Kicinski <jakub.kicinski@...ronome.com>
CC: Pravin Shelar <pshelar@....org>,
Daniel Borkmann <daniel@...earbox.net>,
Vlad Buslov <vladbu@...lanox.com>,
David Miller <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Jiri Pirko <jiri@...nulli.us>,
Cong Wang <xiyou.wangcong@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Simon Horman <simon.horman@...ronome.com>,
Or Gerlitz <gerlitz.or@...il.com>
Subject: Re: CONFIG_NET_TC_SKB_EXT
On 9/26/2019 4:09 PM, Edward Cree wrote:
> On 26/09/2019 08:30, Paul Blakey wrote:
>> Ok, I thought you meant merging the rules because we do want to support
>> those modifications use-cases.
> I think the point is that your use-case is sufficiently weird and
> obscure that code in the core to support it needs to be unintrusive;
> and this clearly wasn't (you managed to piss off Linus...) so it
> should be reverted, and held off until a more palatable solution can
> be produced. I agree with Alexei on this.
> Neither currently-supported-by-drivers cases nor the first step that's
> likely to be added (the simple conntrack with modifications only at
> the end) needs this, it's for capabilities that are farther in the
> future, so there's really no need for it to be in the tree when it's
> not ready, which appears to be the case at present.
The use-case isn't weird, as I've just shown, even nat needs that. See
below that even OvS has selftests that test that.
That's normal OvS nat, which you can see in many use-cases. You can take
a look at different OvS controllers
and you would see the same behavior.
This is the first step that was required to support offloading
connection tracking from OvS.
We are in progress of submitting the userspace patches, and would have
done it today if it weren't'
for this CONFIG discussion. It was already submitted as RFC already.
Our driver implementation is ready and we will submit it once userspace
is accepted.
>> In nat scenarios the packet will be modified, and then there can be a miss:
>>
>> -trk .... CT(zone X, Restore NAT),goto chain 1
>>
>> +trk+est, match on ipv4, CT(zone Y), goto chain 2
>>
>> +trk+est, output..
> I'm confused, I thought the usual nat scenario looked more like
> 0: -trk ... action ct(zone x), goto chain 1
> 1: +trk+new ... action ct(commit, nat=foo) # sw only
> 1: +trk+est ... action ct(nat), mirred eth1
> i.e. the NAT only happens after conntrack has matched (and thus provided
> the saved NAT metadata), at the end of the pipe. I don't see how you
> can NAT a -trk packet.
Both are valid, Nat in the first hop, executes the nat stored on the
connection if available (configured by commit).
You can see it in multiple uses in ovs tests: git grep -pni
"table=0.*ct.*nat.*" tests/system-traffic.at
Even after you restore nat, you can jump to different zones, or even
again to another table....
I've seen much deeper re-circulation chains (which usually means CT())
action.
See ovs self-tests as reference.
>
>> Also, there are stats issues if we already accounted for some actions in
>> hardware.
> AFAICT only 'deliverish' actions (i.e. mirred and drop) in TC have stats.
> So stats are unlikely to be a problem unless you've got (say) a mirred
> mirror before you send to ct and goto chain, in which case the extra
> copy of the packet is a rather bigger problem for idempotency than mere
> stats ;-)
All tc actions have software stats, and at least one (goto, mirred,
drop) per OvS generated rule will have hardware stats.
All OvS datapath rules have stats, and in turn the translated TC rules
all have stats. OvS ages each rule independently.
Powered by blists - more mailing lists