Message-ID: <CAM_iQpX=KqnYP6O139WxH-ouF=vM2=42HS4WLK9PK0E76J-GGw@mail.gmail.com>
Date: Mon, 10 Jun 2019 17:52:08 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Eli Britstein <elibr@...lanox.com>
Cc: Davide Caratti <dcaratti@...hat.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Jamal Hadi Salim <jhs@...atatu.com>,
"David S . Miller" <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Shuang Li <shuali@...hat.com>,
Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: [PATCH net v3 0/3] net/sched: fix actions reading the network header in case of QinQ packets

On Wed, Jun 5, 2019 at 10:37 PM Eli Britstein <elibr@...lanox.com> wrote:
>
>
> On 6/6/2019 4:42 AM, Cong Wang wrote:
> > On Tue, Jun 4, 2019 at 11:19 AM Eli Britstein <elibr@...lanox.com> wrote:
> >>
> >> On 6/4/2019 8:55 PM, Cong Wang wrote:
> >>> On Sat, Jun 1, 2019 at 9:22 PM Eli Britstein <elibr@...lanox.com> wrote:
> >>>> I think that's because QinQ, or VLAN is not an encapsulation. There is
> >>>> no outer/inner packets, and if you want to mangle fields in the packet
> >>>> you can do it and the result is well-defined.
> >>> Sort of; perhaps VLAN tags are too short to be called an
> >>> encapsulation. My point is that it still needs some endpoints to push
> >>> or pop the tags, in a similar way to how we do encap/decap.
> >>>
> >>>
> >>>> BTW, the motivation for my fix was a use case where 2 VGT VMs
> >>>> communicating by OVS failed. Since OVS sees the same VLAN tag, it
> >>>> doesn't add explicit VLAN pop/push actions (i.e. pop, mangle, push). If
> >>>> you force explicit pop/mangle/push you will break such applications.
> >>> From what you said, it seems act_csum is in the middle of the packet
> >>> receive/transmit path. So, which one pops the VLAN tags in
> >>> this scenario? If the VMs are the endpoints, why not use act_csum
> >>> there?
> >> In a switchdev mode, we can passthru the VFs to VMs, and have their
> >> representors in the host, enabling us to manipulate the HW eswitch
> >> without knowledge of the VMs.
> >>
> >> To simplify it, consider the following setup:
> >>
> >> v1a <-> v1b and v2a <-> v2b are veth pairs.
> >>
> >> Now, we configure v1a.20 and v2a.20 as VLAN devices over v1a/v2a
> >> respectively (and put the "a" devs in separate namespaces).
> >>
> >> The TC rules are on the "b" devs, for example:
> >>
> >> tc filter add dev v1b ... action pedit ... action csum ... action
> >> redirect dev v2b
> >>
> >> Now, ping from v1a.20 to v1b.20. The namespaces transmit/receive tagged
> >> packets, and are not aware of the packet manipulation (and the required
> >> act_csum).
> > This is what I said, v1b is not the endpoint which pops the vlan tag,
> > v1b.20 is. So, why not simply move at least the csum action to
> > v1b.20? With that, you can still filter and redirect packets on v1b,
> > you can even still modify it too, just defer the checksum fixup to the
> > endpoint.
>
> There are no vxb.20 ports:
>
> ns0: v1a.20 ----(VLAN)---- v1a ns1: v2a ---- (VLAN) ---- v2a.20
>
> |----(veth)---- v1b <---- (TC) ----> v2b ----(veth)----|

This diagram leaves me even more confused...

Can you explicitly explain why there is no vxb.20? Is it a router or
something?

By the way, even if it is a router and you really want to checksum the
packet at that point, you still don't have to move the skb->data
pointer; you just need to parse the header and calculate the offset
without touching skb->data. This would at least avoid having to restore
skb->data afterwards.

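
To illustrate the idea of computing the offset instead of moving the data
pointer, here is a minimal userspace sketch, not kernel code. It assumes
the frame is a plain byte buffer, and network_header_offset() is a
hypothetical helper name, not an existing kernel function. The constants
mirror the usual Ethernet/VLAN values but are defined locally. The buffer
pointer is only read, never advanced, so there is nothing to restore.

```c
#include <stddef.h>
#include <stdint.h>

/* Local definitions for this sketch (values match the usual ones). */
#define ETH_HLEN      14      /* dst MAC + src MAC + EtherType */
#define VLAN_HLEN     4       /* TCI + encapsulated EtherType */
#define ETH_P_8021Q   0x8100  /* C-VLAN tag */
#define ETH_P_8021AD  0x88A8  /* S-VLAN tag (QinQ outer tag) */

/*
 * Hypothetical helper: walk over any stacked VLAN tags and return the
 * offset of the network header from the start of the frame, or -1 if
 * the frame is too short. The frame pointer itself is never modified.
 * If proto_out is non-NULL, it receives the final EtherType.
 */
static int network_header_offset(const uint8_t *frame, size_t len,
                                 uint16_t *proto_out)
{
    size_t off = ETH_HLEN;
    uint16_t proto;

    if (len < ETH_HLEN)
        return -1;

    /* EtherType sits right after the two MAC addresses. */
    proto = (uint16_t)(frame[12] << 8 | frame[13]);

    /* Each VLAN tag is 4 bytes: 2 bytes TCI, 2 bytes next EtherType. */
    while (proto == ETH_P_8021Q || proto == ETH_P_8021AD) {
        if (len < off + VLAN_HLEN)
            return -1;
        proto = (uint16_t)(frame[off + 2] << 8 | frame[off + 3]);
        off += VLAN_HLEN;
    }

    if (proto_out)
        *proto_out = proto;
    return (int)off;
}
```

For a QinQ frame with two tags, the helper skips 14 + 4 + 4 bytes, and
the caller can then read the IP header at frame + offset.
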
Thanks.