Message-ID: <82ec3877-8026-67f7-90d8-6e9988513fef@mellanox.com>
Date: Sun, 2 Jun 2019 04:22:40 +0000
From: Eli Britstein <elibr@...lanox.com>
To: Cong Wang <xiyou.wangcong@...il.com>,
Davide Caratti <dcaratti@...hat.com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Jamal Hadi Salim <jhs@...atatu.com>,
"David S . Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Shuang Li <shuali@...hat.com>,
Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net v3 0/3] net/sched: fix actions reading the network
header in case of QinQ packets
On 6/1/2019 1:29 AM, Cong Wang wrote:
> On Fri, May 31, 2019 at 3:01 PM Davide Caratti <dcaratti@...hat.com> wrote:
>> On Fri, 2019-05-31 at 11:42 -0700, Cong Wang wrote:
>>> On Fri, May 31, 2019 at 10:26 AM Davide Caratti <dcaratti@...hat.com> wrote:
>>>> 'act_csum' was recently fixed to mangle the IPv4/IPv6 header if a packet
>>>> having one or more VLAN headers was processed: patch #1 ensures that all
>>>> VLAN headers are in the linear area of the skb.
>>>> Other actions might read or mangle the IPv4/IPv6 header: patch #2 and #3
>>>> fix 'act_pedit' and 'act_skbedit' respectively.
>>> Maybe, just maybe, vlan tags are supposed to be handled by act_vlan?
>>> Which means maybe users have to pipe act_vlan to these actions.
>> but it's not possible with the current act_vlan code.
>> Each 'vlan' action pushes or pops a single tag, so:
>>
>> 1) we don't know how many vlan tags there are in each packet, so I should
>> put a high enough number of "pop" operations to ensure that a 'pedit'
>> rule correctly mangles the TTL in an IPv4 packet having 1 or more 802.1Q
>> tags in the L2 header.
> Not true, we do know whether the last vlan tag is pop'ed by checking
> the protocol. There was already a use case in netdev before:
>
>   tc filter add dev veth1 egress prio 100 protocol 802.1Q matchall \
>       action vlan pop continue #reclassify
>   tc filter add dev veth1 egress prio 200 protocol ip u32 match ip \
>       src 192.168.1.0/24 action drop
>   tc filter add dev veth1 egress prio 201 protocol ip u32 match ip \
>       dst 192.168.100.0/24 action drop
>
> which is from a bug report.
>
>> 2) after a vlan is popped with act_vlan, the kernel forgets about the VLAN
>> ID and the VLAN type. So, if I want to just mangle the TTL in a QinQ
>> packet, I need to reinject it in a place where both tags (including VLAN
>> type *and* VLAN id) are restored in the packet.
> It is forgotten by act_vlan only, the vlan info is still inside the
> packet header.
> Perhaps we just need some action to push it back.
There is a memmove in those functions, so the VLAN tag is overwritten, and
you would also need additional memory to store the VLANs.
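
To illustrate the memmove point, here is a minimal userspace sketch (not the
kernel code; pop_outer_vlan() and the flat byte buffer are assumptions made
for illustration). Popping the outer tag slides both MAC addresses forward
by 4 bytes, which lands on top of the bytes that held the TPID and TCI:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN   6   /* length of one MAC address */
#define VLAN_HLEN  4   /* TPID (2 bytes) + TCI (2 bytes) */

/*
 * Hypothetical helper, for illustration only: drop the outermost VLAN tag
 * of an Ethernet frame held in a plain byte buffer.
 *
 * Layout before: dst(6) src(6) TPID(2) TCI(2) ethertype(2) payload...
 * Layout after:  4 stale bytes, then dst(6) src(6) ethertype(2) payload...
 *
 * The destination and source MACs are moved VLAN_HLEN bytes forward, so
 * they overwrite the old TPID/TCI. Returns the new start of the frame.
 */
static uint8_t *pop_outer_vlan(uint8_t *frame)
{
    assert(frame != NULL);
    memmove(frame + VLAN_HLEN, frame, 2 * ETH_ALEN);
    return frame + VLAN_HLEN;
}
```

After the call, the old TCI bytes hold the tail of the source MAC, so a later
"push it back" has nothing left in the buffer to recover the VLAN ID from
unless it was saved somewhere else first.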
>
>> Clearly, act_vlan can't be used as is, because 'push' has hardcoded VLAN
>> ID and ethertype. Unless we change act_vlan code to enable rollback of
>> previous 'pop' operations, it's quite hard to pipe the correct sequence of
>> vlan 'pop' and 'push'.
> What about other encapsulations like VXLAN? What if I just want to
> mangle the inner TTL of a VXLAN packet? You know the answer is setting
> up TC filters and actions on VXLAN device instead of ethernet device.
>
> IOW, why is QinQ so special that we have to take care of it inside a TC
> action rather than at the encapsulation endpoint?
I think that's because QinQ, or VLAN, is not an encapsulation. There are
no outer/inner packets, and if you want to mangle fields in the packet
you can do it and the result is well-defined.
BTW, the motivation for my fix was a use case where 2 VGT VMs
communicating through OVS failed. Since OVS sees the same VLAN tag, it
doesn't add explicit VLAN pop/push actions (i.e. pop, mangle, push). If
you force explicit pop/mangle/push you will break such applications.
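
As a rough sketch of what "no outer/inner packets" means in practice: the
network header can be located by stepping over the stacked tags in place,
without popping anything. This is userspace illustration code, not the code
from the patches; find_network_header() and read_be16() are hypothetical
names, and the frame is assumed to sit in one contiguous buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN      6
#define ETH_HLEN      14       /* dst + src + ethertype */
#define VLAN_HLEN     4        /* TCI (2 bytes) + next ethertype (2 bytes) */
#define ETH_P_8021Q   0x8100   /* 802.1Q (C-tag) TPID */
#define ETH_P_8021AD  0x88A8   /* 802.1ad (S-tag, QinQ) TPID */

/* Read a 16-bit big-endian (network order) value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: walk any number of stacked VLAN tags and report
 * the offset and ethertype of the first non-VLAN header.
 * Returns 0 on success, -1 if the frame is too short.
 */
static int find_network_header(const uint8_t *frame, size_t len,
                               size_t *l3_off, uint16_t *l3_proto)
{
    size_t off = 2 * ETH_ALEN;
    uint16_t proto;

    assert(frame != NULL && l3_off != NULL && l3_proto != NULL);
    if (len < ETH_HLEN)
        return -1;

    proto = read_be16(frame + off);
    off += 2;

    while (proto == ETH_P_8021Q || proto == ETH_P_8021AD) {
        if (len < off + VLAN_HLEN)
            return -1;
        /* Skip the 2-byte TCI, then read the encapsulated ethertype. */
        proto = read_be16(frame + off + 2);
        off += VLAN_HLEN;
    }

    *l3_off = off;
    *l3_proto = proto;
    return 0;
}
```

For an untagged frame this yields offset 14, for a single tag 18, and for a
QinQ frame 22. The tags themselves are never modified, so nothing has to be
pushed back afterwards.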
>
>
>>> From the code reuse perspective, you are adding TCA_VLAN_ACT_POP
>>> to each of them.
>> No, these patches don't pop VLAN tags. All tags are restored after the
>> action has completed its work, before returning a->tcfa_action.
>>
>> May I ask you to read it as a followup of commit 2ecba2d1e45b ("net:
>> sched: act_csum: Fix csum calc for tagged packets"), where the 'csum'
>> action was modified to mangle the checksum of IPv4 headers even when
>> multiple 802.1Q tags were present?
> Yes, I already read it and I think that commit should be reverted for the
> same reason as I already stated above.
>
>
>> With this series it becomes possible to mangle also the TTL field (with
>> pedit), and assign the diffserv bits to skb->priority (with skbedit).
> Sorry, I am not yet convinced why we should do it in TC.
>
> Thanks.