netdev - Re: [PATCH net v3 0/3] net/sched: fix actions reading the network header in case of QinQ packets

Message-ID: <de51a01e-f0bd-cc18-bba4-93c6e07bb910@mellanox.com>
Date:   Tue, 11 Jun 2019 04:43:02 +0000
From:   Eli Britstein <elibr@...lanox.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
CC:     Davide Caratti <dcaratti@...hat.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        "David S . Miller" <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Shuang Li <shuali@...hat.com>,
        Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net v3 0/3] net/sched: fix actions reading the network
 header in case of QinQ packets


On 6/11/2019 3:52 AM, Cong Wang wrote:
> On Wed, Jun 5, 2019 at 10:37 PM Eli Britstein <elibr@...lanox.com> wrote:
>>
>> On 6/6/2019 4:42 AM, Cong Wang wrote:
>>> On Tue, Jun 4, 2019 at 11:19 AM Eli Britstein <elibr@...lanox.com> wrote:
>>>> On 6/4/2019 8:55 PM, Cong Wang wrote:
>>>>> On Sat, Jun 1, 2019 at 9:22 PM Eli Britstein <elibr@...lanox.com> wrote:
>>>>>> I think that's because QinQ, or VLAN is not an encapsulation. There is
>>>>>> no outer/inner packets, and if you want to mangle fields in the packet
>>>>>> you can do it and the result is well-defined.
>>>>> Sort of. Perhaps VLAN tags are too short to be called an
>>>>> encapsulation; my point is that it still needs some endpoints to push
>>>>> or pop the tags, in a similar way to how we do encap/decap.
>>>>>
>>>>>
>>>>>> BTW, the motivation for my fix was a use case where 2 VGT VMs
>>>>>> communicating by OVS failed. Since OVS sees the same VLAN tag, it
>>>>>> doesn't add explicit VLAN pop/push actions (i.e. pop, mangle, push). If
>>>>>> you force explicit pop/mangle/push you will break such applications.
>>>>> From what you said, it seems act_csum is in the middle of the packet
>>>>> receive/transmit path. So, which one pops the VLAN tags in
>>>>> this scenario? If the VMs are the endpoints, why not use act_csum
>>>>> there?
>>>> In a switchdev mode, we can passthru the VFs to VMs, and have their
>>>> representors in the host, enabling us to manipulate the HW eswitch
>>>> without knowledge of the VMs.
>>>>
>>>> To simplify it, consider the following setup:
>>>>
>>>> v1a <-> v1b and v2a <-> v2b are veth pairs.
>>>>
>>>> Now, we configure v1a.20 and v2a.20 as VLAN devices over v1a/v2a
>>>> respectively (and put the "a" devs in separate namespaces).
>>>>
>>>> The TC rules are on the "b" devs, for example:
>>>>
>>>> tc filter add dev v1b ... action pedit ... action csum ... action
>>>> redirect dev v2b
>>>>
>>>> Now, ping from v1a.20 to v2a.20. The namespaces transmit/receive tagged
>>>> packets, and are not aware of the packet manipulation (and the required
>>>> act_csum).
>>> This is what I said, v1b is not the endpoint which pops the vlan tag,
>>> v1b.20 is. So, why not simply move at least the csum action to
>>> v1b.20? With that, you can still filter and redirect packets on v1b,
>>> you still even modify it too, just defer the checksum fixup to the
>>> endpoint.
>> There are no vxb.20 ports:
>>
>> ns0:     v1a.20 ----(VLAN)---- v1a ns1:    v2a ---- (VLAN) ---- v2a.20
>>
>> |----(veth)---- v1b     <---- (TC) ---->    v2b ----(veth)----|
>
> This diagram confuses me even more...
>
> Can you explicitly explain why there is no vxb.20? Is it a router or
> something?
Yes.
>
> By the way, even if it is a router and you really want to checksum the
> packet at that point, you still don't have to move the skb->data
> pointer, you just need to parse the header and calculate the offset
> without touching skb->data. This could at least avoid restoring
> skb->data after it.
Sure, this is another implementation method. It doesn't change the 
essence. I just wanted to reuse the existing tcf_csum_ipv4/6.
>
> Thanks.
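
The suggestion quoted above, computing the network header offset by parsing
past the VLAN tags instead of moving skb->data, can be sketched in plain
userspace C. This is only an illustration under stated assumptions: it works
on a raw byte buffer rather than an skb, l3_offset() and read_be16() are
hypothetical helpers that do not exist in the kernel, the constants are
defined locally with the values the kernel uses, and the two-tag limit is an
assumed bound for QinQ. It is not the code from this patch series.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN      14      /* dst MAC + src MAC + EtherType */
#define VLAN_HLEN     4       /* TCI + encapsulated EtherType */
#define ETH_P_8021Q   0x8100  /* C-VLAN tag */
#define ETH_P_8021AD  0x88A8  /* S-VLAN tag (outer tag in QinQ) */
#define MAX_VLAN_TAGS 2       /* assumption: QinQ depth at most */

/* Read a big-endian (network order) 16-bit value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: return the offset of the network header from the
 * start of the Ethernet frame, or -1 if the frame is truncated or carries
 * more than MAX_VLAN_TAGS tags. The frame pointer is never advanced, so
 * the caller has nothing to restore afterwards.
 */
static int l3_offset(const uint8_t *frame, size_t len, uint16_t *proto)
{
    size_t off = ETH_HLEN;
    uint16_t type;
    int tags = 0;

    if (len < ETH_HLEN)
        return -1;

    type = read_be16(frame + 12);            /* outermost EtherType */
    while (type == ETH_P_8021Q || type == ETH_P_8021AD) {
        if (tags++ == MAX_VLAN_TAGS || len < off + VLAN_HLEN)
            return -1;
        type = read_be16(frame + off + 2);   /* EtherType after the TCI */
        off += VLAN_HLEN;
    }

    *proto = type;                           /* e.g. 0x0800 for IPv4 */
    return (int)off;
}
```

For a QinQ IPv4 frame (0x88A8 outer tag, 0x8100 inner tag) the helper reports
EtherType 0x0800 at offset 22; for an untagged frame the offset is 14. A
checksum routine could then be handed that offset instead of relying on a
shifted data pointer.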