[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5ae6ca73-3683-b907-85fd-3d09bed9b68e@mellanox.com>
Date: Thu, 26 Sep 2019 15:14:32 +0000
From: Paul Blakey <paulb@...lanox.com>
To: Edward Cree <ecree@...arflare.com>,
Jakub Kicinski <jakub.kicinski@...ronome.com>
CC: Pravin Shelar <pshelar@....org>,
Daniel Borkmann <daniel@...earbox.net>,
Vlad Buslov <vladbu@...lanox.com>,
David Miller <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Jiri Pirko <jiri@...nulli.us>,
Cong Wang <xiyou.wangcong@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Simon Horman <simon.horman@...ronome.com>,
Or Gerlitz <gerlitz.or@...il.com>
Subject: Re: CONFIG_NET_TC_SKB_EXT
On 9/26/2019 5:26 PM, Edward Cree wrote:
> On 26/09/2019 14:56, Paul Blakey wrote:
>>>> In nat scenarios the packet will be modified, and then there can be a miss:
>>>>
>>>> -trk .... CT(zone X, Restore NAT),goto chain 1
>>>>
>>>> +trk+est, match on ipv4, CT(zone Y), goto chain 2
>>>>
>>>> +trk+est, output..
>>> I'm confused, I thought the usual nat scenario looked more like
>>> 0: -trk ... action ct(zone x), goto chain 1
>>> 1: +trk+new ... action ct(commit, nat=foo) # sw only
>>> 1: +trk+est ... action ct(nat), mirred eth1
>>> i.e. the NAT only happens after conntrack has matched (and thus provided
>>> the saved NAT metadata), at the end of the pipe. I don't see how you
>>> can NAT a -trk packet.
>> Both are valid, Nat in the first hop, executes the nat stored on the
>> connection if available (configured by commit).
> This still isn't making sense to me.
> Until you've done a conntrack lookup and found the connection, you can't
> use NAT information that's stored in the connection.
> So the NAT can only happen after a conntrack match is found.
That's how it works, CT action restores the metadata (nf_conn on the
SKB), you can then, if needed, execute the nat,
That is stored implicitly on this metadata (by using the reverse of the
reply tuple). It's why nf_conn status has two status bits
(IPS_SRC_NAT_DONE_BIT and IPS_SRC_NAT).
After you execute it, IPS_SRC_NAT_DONE_BIT is set, instead of just
IPS_SRC_NAT (nat needed bit).
The usage is straight from ovs testsuite, please see it in ovs git.
>
> And all the rest of your stuff (like doing conntrack twice, in different
> zones X and Y) is 'weird' inasmuch as it's beyond the basic minimum
> functionality for a useful offload, and inherently doesn't map to a
> fixed-layout (non-loopy) HW pipeline. You may want to support it in
> your driver, you may be able to support it in your hardware, but it's
> not true that "even nat needs that" (the nat scenario I described above
> is entirely reasonable and is perfectly workable in an all-or-nothing
> offload world), so if your changes are causing problems, they should be
> reverted for this cycle.
The change didn't cause any problem, and this doesn't contradict any
other vendor, doing what you suggest - offloading just a subset of the
rules.
As we are discussing the config default, I've sent a patch to set the
config to N.
Powered by blists - more mailing lists