[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <da3d2c8d-7e20-c50f-b8ea-4ec7183721ee@mellanox.com>
Date: Sat, 8 Jun 2019 11:19:23 +0000
From: Eli Britstein <elibr@...lanox.com>
To: Davide Caratti <dcaratti@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Jamal Hadi Salim <jhs@...atatu.com>,
"David S . Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Shuang Li <shuali@...hat.com>,
Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net v3 0/3] net/sched: fix actions reading the network
header in case of QinQ packets
On 6/7/2019 9:20 PM, Davide Caratti wrote:
> On Wed, 2019-06-05 at 18:42 -0700, Cong Wang wrote:
>> On Tue, Jun 4, 2019 at 11:19 AM Eli Britstein <elibr@...lanox.com> wrote:
> hello Cong and Eli,
>
> and thanks for all the thoughts.
>
>>> On 6/4/2019 8:55 PM, Cong Wang wrote:
>>>> On Sat, Jun 1, 2019 at 9:22 PM Eli Britstein <elibr@...lanox.com> wrote:
>>>>> I think that's because QinQ, or VLAN is not an encapsulation. There is
>>>>> no outer/inner packets, and if you want to mangle fields in the packet
>>>>> you can do it and the result is well-defined.
>>>> Sort of, perhaps VLAN tags are too short to be called as an
>>>> encapsulation, my point is that it still needs some endpoints to push
>>>> or pop the tags, in a similar way we do encap/decap.
>>>>
>>>>
>>>>> BTW, the motivation for my fix was a use case were 2 VGT VMs
>>>>> communicating by OVS failed. Since OVS sees the same VLAN tag, it
>>>>> doesn't add explicit VLAN pop/push actions (i.e pop, mangle, push). If
>>>>> you force explicit pop/mangle/push you will break such applications.
>>>> From what you said, it seems act_csum is in the middle of packet
>>>> receive/transmit path. So, which is the one pops the VLAN tags in
>>>> this scenario? If the VM's are the endpoints, why not use act_csum
>>>> there?
>>> In a switchdev mode, we can passthru the VFs to VMs, and have their
>>> representors in the host, enabling us to manipulate the HW eswitch
>>> without knowledge of the VMs.
>>>
>>> To simplify it, consider the following setup:
>>>
>>> v1a <-> v1b and v2a <-> v2b are veth pairs.
>>>
>>> Now, we configure v1a.20 and v2a.20 as VLAN devices over v1a/v2a
>>> respectively (and put the "a" devs in separate namespaces).
>>>
>>> The TC rules are on the "b" devs, for example:
>>>
>>> tc filter add dev v1b ... action pedit ... action csum ... action
>>> redirect dev v2b
>>>
>>> Now, ping from v1a.20 to v1b.20. The namespaces transmit/receive tagged
>>> packets, and are not aware of the packet manipulation (and the required
>>> act_csum).
>> This is what I said, v1b is not the endpoint which pops the vlan tag,
>> v1b.20 is. So, why not simply move at least the csum action to
>> v1b.20? With that, you can still filter and redirect packets on v1b,
>> you still even modify it too, just defer the checksum fixup to the
>> endpoint.
>>
>> And to be fair, if this case is a valid concern, so is VXLAN case,
>> just replace v1a.20 and v2a.20 with VXLAN tunnels. If you modify
>> the inner header, you have to fixup the checksum in the outer
>> UDP header.
> at least with single "accelerated" vlan tags, I think that users of
> tc_skb_protocol() that explicitly access the IP header should be fixed the
> same way that Eli did to 'csum'.
>
> (note that 'pedit' is *not* among them, and that's why Eli only needed to
> fix 'csum' in his setup :) )
>
> Now I'm no more sure about what to do with the QinQ case, whether we
> should:
>
> a) fix missing skb_may_pull() in csum, and fix (a couple of) other actions
> in the same way,
I'm for that.
>
> or
>
> b) rework act_csum when it wants to read the IP header, in a way that only
> ip header is mangled only when there are only zero or a single
> "accelerated" tag, and do the same for (a couple of) other actions.
What is the conceptually difference between single VLAN and QinQ? Or,
even between "accelerated" to non-accelerated?
Furthermore, doing that you would reduce functionality already exists.
Don't do it.
>
> The pop/push approach built in the action ( option a) ) fixes my
> environment - but like Cong says, it only would work with VLANs, and not
> with other encapsulations. Probably it really makes sense to see if it's
> possible to extend act_vlan in a way that it reconstructs the packet after
> an iteration of 'vlan pop'. This might not be easy, because even the above
> command assumes that inner and outer tag have the same ethertype (which is
> not generally true for QinQ).
Again, I think VLANs and encapsulations are not the same. Actions for
tagged packets are well defined, the same as they are on native packets.
In encapsulations you would *HAVE* to do make decap/encap in order to
act on inner packet.
>
> But until then, we should assume that read/writes of the IP header are not
> feasible for QinQ traffic, just like it's not possible for tunneled
> packets, and adjust configuration accordingly.
>
> WDYT?
>
Powered by blists - more mailing lists