Message-ID: <CAM_iQpUmuHH8S35ERuJ-sFS=17aa-C8uHSWF-WF7toANX2edCQ@mail.gmail.com>
Date: Fri, 31 May 2019 15:29:46 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Davide Caratti <dcaratti@...hat.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
    Jiri Pirko <jiri@...nulli.us>,
    Jamal Hadi Salim <jhs@...atatu.com>,
    "David S . Miller" <davem@...emloft.net>,
    Linux Kernel Network Developers <netdev@...r.kernel.org>,
    Shuang Li <shuali@...hat.com>,
    Eli Britstein <elibr@...lanox.com>,
    Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net v3 0/3] net/sched: fix actions reading the network
    header in case of QinQ packets

On Fri, May 31, 2019 at 3:01 PM Davide Caratti <dcaratti@...hat.com> wrote:
>
> On Fri, 2019-05-31 at 11:42 -0700, Cong Wang wrote:
> > On Fri, May 31, 2019 at 10:26 AM Davide Caratti <dcaratti@...hat.com> wrote:
> > > 'act_csum' was recently fixed to mangle the IPv4/IPv6 header if a packet
> > > having one or more VLAN headers was processed: patch #1 ensures that all
> > > VLAN headers are in the linear area of the skb.
> > > Other actions might read or mangle the IPv4/IPv6 header: patch #2 and #3
> > > fix 'act_pedit' and 'act_skbedit' respectively.
> >
> > Maybe, just maybe, vlan tags are supposed to be handled by act_vlan?
> > Which means maybe users have to pipe act_vlan to these actions.
>
> but it's not possible with the current act_vlan code.
> Each 'vlan' action pushes or pops a single tag, so:
>
> 1) we don't know how many vlan tags there are in each packet, so I should
> put a high enough number of "pop" operations to ensure that a 'pedit'
> rule correctly mangles the TTL in an IPv4 packet having 1 or more 802.1Q
> tags in the L2 header.

Not true: we do know whether the last VLAN tag has been popped, by checking
the protocol. There was already a use case on netdev before:

    tc filter add dev veth1 egress prio 100 protocol 802.1Q matchall \
        action vlan pop continue #reclassify
    tc filter add dev veth1 egress prio 200 protocol ip u32 match ip \
        src 192.168.1.0/24 action drop
    tc filter add dev veth1 egress prio 201 protocol ip u32 match ip \
        dst 192.168.100.0/24 action drop

which is from a bug report.

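As an illustration of the protocol check mentioned above, here is a minimal
Python sketch (not kernel code) that walks the stacked VLAN headers of an
on-wire Ethernet frame until it reaches a non-VLAN ethertype. The function
name `network_header_offset` and the constants are made up for this example,
and it assumes every tag is present in the frame bytes rather than stored
out of band, as the kernel may do for the outermost tag.

```python
import struct

# Ethertypes used by 802.1Q and 802.1ad (QinQ) tags.
ETH_P_8021Q = 0x8100
ETH_P_8021AD = 0x88A8
ETH_HLEN = 14   # dst MAC (6) + src MAC (6) + ethertype (2)
VLAN_HLEN = 4   # TCI (2) + encapsulated ethertype (2)


def network_header_offset(frame: bytes):
    """Hypothetical helper: return (ethertype, offset, vlan_depth).

    'ethertype' is the first non-VLAN protocol, 'offset' is where its
    header starts, and 'vlan_depth' is the number of tags skipped.
    """
    if len(frame) < ETH_HLEN:
        raise ValueError("truncated Ethernet header")
    (proto,) = struct.unpack_from("!H", frame, 12)
    offset = ETH_HLEN
    depth = 0
    # As long as the protocol is still a VLAN TPID, another tag follows.
    while proto in (ETH_P_8021Q, ETH_P_8021AD):
        if len(frame) < offset + VLAN_HLEN:
            raise ValueError("truncated VLAN header")
        # Skip the 2-byte TCI and read the encapsulated ethertype.
        (proto,) = struct.unpack_from("!H", frame, offset + 2)
        offset += VLAN_HLEN
        depth += 1
    return proto, offset, depth
```

A filter chain that pops one tag per pass and reclassifies performs the
same loop implicitly: it stops popping as soon as the protocol no longer
matches 802.1Q.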
>
> 2) after a vlan is popped with act_vlan, the kernel forgets about the VLAN
> ID and the VLAN type. So, if I want to just mangle the TTL in a QinQ
> packet, I need to reinject it in a place where both tags (including VLAN
> type *and* VLAN id) are restored in the packet.

It is forgotten by act_vlan only; the VLAN info is still inside the
packet header. Perhaps we just need some action to push it back.

>
> Clearly, act_vlan can't be used as is, because 'push' has hardcoded VLAN
> ID and ethertype. Unless we change act_vlan code to enable rollback of
> previous 'pop' operations, it's quite hard to pipe the correct sequence of
> vlan 'pop' and 'push'.

What about other encapsulations like VXLAN? What if I just want to
mangle the inner TTL of a VXLAN packet? You know the answer is to set
up TC filters and actions on the VXLAN device instead of the Ethernet
device. IOW, why is QinQ so special that we have to take care of it inside
a TC action rather than at the encapsulation endpoint?

>
> > From the code reuse perspective, you are adding TCA_VLAN_ACT_POP
> > to each of them.
>
> No, these patches don't pop VLAN tags. All tags are restored after the
> action has completed its work, before returning a->tcfa_action.
>
> May I ask you to read it as a followup of commit 2ecba2d1e45b ("net:
> sched: act_csum: Fix csum calc for tagged packets"), where the 'csum'
> action was modified to mangle the checksum of IPv4 headers even when
> multiple 802.1Q tags were present?

Yes, I have already read it, and I think that commit should be reverted for
the same reason I stated above.

> With this series it becomes possible to mangle also the TTL field (with
> pedit), and assign the diffserv bits to skb->priority (with skbedit).
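
For readers following the thread, the behaviour described in the quoted
text (edit the IPv4 TTL behind any number of VLAN tags while leaving the
tags in place) can be sketched in user space as follows. This is only an
illustration in Python, not the code from the series; the names
`set_ttl_behind_vlans` and `ipv4_checksum` are hypothetical, and the
checksum is simply recomputed over the whole IPv4 header.

```python
import struct

VLAN_TPIDS = (0x8100, 0x88A8)  # 802.1Q and 802.1ad
ETH_P_IP = 0x0800


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, inverted."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_ttl_behind_vlans(frame: bytes, ttl: int) -> bytes:
    """Hypothetical helper: set the IPv4 TTL without touching VLAN tags."""
    buf = bytearray(frame)
    (proto,) = struct.unpack_from("!H", buf, 12)
    offset = 14
    # Skip every VLAN tag; the tags themselves are never modified.
    while proto in VLAN_TPIDS:
        (proto,) = struct.unpack_from("!H", buf, offset + 2)
        offset += 4
    if proto != ETH_P_IP:
        return bytes(buf)  # not IPv4: leave the frame untouched
    if len(buf) < offset + 20:
        raise ValueError("truncated IPv4 header")
    ihl = (buf[offset] & 0x0F) * 4
    if ihl < 20 or len(buf) < offset + ihl:
        raise ValueError("invalid IPv4 header length")
    buf[offset + 8] = ttl                      # TTL field
    buf[offset + 10:offset + 12] = b"\x00\x00"  # zero checksum before summing
    csum = ipv4_checksum(bytes(buf[offset:offset + ihl]))
    struct.pack_into("!H", buf, offset + 10, csum)
    return bytes(buf)
```
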

Sorry, I am not yet convinced that we should do it in TC.

Thanks.