netdev - Re: CONFIG_NET_TC_SKB

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5ae6ca73-3683-b907-85fd-3d09bed9b68e@mellanox.com>
Date:   Thu, 26 Sep 2019 15:14:32 +0000
From:   Paul Blakey <paulb@...lanox.com>
To:     Edward Cree <ecree@...arflare.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>
CC:     Pravin Shelar <pshelar@....org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Vlad Buslov <vladbu@...lanox.com>,
        David Miller <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jiri Pirko <jiri@...nulli.us>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Simon Horman <simon.horman@...ronome.com>,
        Or Gerlitz <gerlitz.or@...il.com>
Subject: Re: CONFIG_NET_TC_SKB_EXT


On 9/26/2019 5:26 PM, Edward Cree wrote:
> On 26/09/2019 14:56, Paul Blakey wrote:
>>>> In nat scenarios the packet will be modified, and then there can be a miss:
>>>>
>>>>               -trk .... CT(zone X, Restore NAT),goto chain 1
>>>>
>>>>               +trk+est, match on ipv4, CT(zone Y), goto chain 2
>>>>
>>>>               +trk+est, output..
>>> I'm confused, I thought the usual nat scenario looked more like
>>>       0: -trk ... action ct(zone x), goto chain 1
>>>       1: +trk+new ... action ct(commit, nat=foo) # sw only
>>>       1: +trk+est ... action ct(nat), mirred eth1
>>> i.e. the NAT only happens after conntrack has matched (and thus provided
>>>    the saved NAT metadata), at the end of the pipe.  I don't see how you
>>>    can NAT a -trk packet.
>> Both are valid, Nat in the first hop, executes the nat stored on the
>> connection if available (configured by commit).
> This still isn't making sense to me.
> Until you've done a conntrack lookup and found the connection, you can't
>   use NAT information that's stored in the connection.
> So the NAT can only happen after a conntrack match is found.

That's how it works, CT action restores the metadata (nf_conn on the 
SKB), you can then, if needed,  execute the nat,

That is stored implicitly on this metadata (by using the reverse of the 
reply tuple). It's why nf_conn status has two status bits 
(IPS_SRC_NAT_DONE_BIT  and IPS_SRC_NAT).

After you execute it, IPS_SRC_NAT_DONE_BIT  is set, instead of just 
IPS_SRC_NAT (nat needed bit).

The usage is straight from ovs testsuite, please see it in ovs git.

>
> And all the rest of your stuff (like doing conntrack twice, in different
>   zones X and Y) is 'weird' inasmuch as it's beyond the basic minimum
>   functionality for a useful offload, and inherently doesn't map to a
>   fixed-layout (non-loopy) HW pipeline.  You may want to support it in
>   your driver, you may be able to support it in your hardware, but it's
>   not true that "even nat needs that" (the nat scenario I described above
>   is entirely reasonable and is perfectly workable in an all-or-nothing
>   offload world), so if your changes are causing problems, they should be
>   reverted for this cycle.

The change didn't cause any problem, and this doesn't contradict any 
other vendor, doing what you suggest - offloading just a subset of the 
rules.

As we are discussing the config default, I've sent a patch to set the 
config to N.