[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOuuhY9+wmNtEafK1TzcbFF3eqJda_EcGNj_xu+B4Wp7rqo5xQ@mail.gmail.com>
Date: Sat, 4 Sep 2021 07:08:56 -0700
From: Tom Herbert <tom@...anda.io>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Boris Sukholitko <boris.sukholitko@...adcom.com>,
netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>,
Cong Wang <xiyou.wangcong@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Vladimir Oltean <olteanv@...il.com>,
Vadym Kochan <vadym.kochan@...ision.eu>,
Ilya Lifshits <ilya.lifshits@...adcom.com>,
Felipe Magno de Almeida <felipe@...ertise.dev>,
Pedro Tammela <pctammela@...atatu.com>
Subject: Re: [PATCH net-next] net/sched: cls_flower: Add orig_ethtype
On Tue, Aug 31, 2021 at 6:18 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On 2021-08-31 8:04 a.m., Boris Sukholitko wrote:
> > Hi Jamal,
> >
> > On Mon, Aug 30, 2021 at 09:48:38PM -0400, Jamal Hadi Salim wrote:
> >> On 2021-08-30 4:08 a.m., Boris Sukholitko wrote:
> >>> The following flower filter fails to match packets:
> >>>
> >>> tc filter add dev eth0 ingress protocol 0x8864 flower \
> >>> action simple sdata hi64
> >>>
> >>> The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
> >>> being dissected by __skb_flow_dissect and it's internal protocol is
> >>> being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
> >>> tunnel is transparent to the callers of __skb_flow_dissect.
> >>>
> >>> OTOH, in the filters above, cls_flower configures its key->basic.n_proto
> >>> to the ETH_P_PPP_SES value configured by the user. Matching on this key
> >>> fails because of __skb_flow_dissect "transparency" mentioned above.
> >>>
> >>> Therefore there is no way currently to match on such packets using
> >>> flower.
> >>>
> >>> To fix the issue add new orig_ethtype key to the flower along with the
> >>> necessary changes to the flow dissector etc.
> >>>
> >>> To filter the ETH_P_PPP_SES packets the command becomes:
> >>>
> >>> tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
> >>> action simple sdata hi64
> >>
> >> Where's "protocol" on the above command line is. Probably a typo?
> >
> > There is no need for protocol there. We intend to match on the tunnel
> > protocol existence only, disregarding its contents. Therefore
> > orig_ethtype key is sufficient.
> >
>
> Hold on: The command line requires you specify the root (parse)
> node with keyword "protocol". i.e something like:
> tc filter add dev eth0 ingress protocol 0x8864 flower ...
>
> You are suggesting a new syntax - which only serves this one
> use case that solves your problem and is only needed for flower
> but no other tc classifier.
>
> >>
> >> The main culprit is clearly the flow dissector parsing. I am not sure
> >> if in general flowdisc to deal with deeper hierarchies/tunnels
> >> without constantly adding a lot more hacks. Imagine if you had an
> >> ethernet packet with double vlan tags and encapsulating a pppoe packet
> >> (i.e 3 or more layers of ethernet) - that would be a nightmare.
> >
> > Of course there is no limit to our imagination :) However I would argue
> > that in the RealWorld(tm) the number of such nesting cases is
> > neglectable.
> >
>
> Youd be very suprised on how some telcos use things like vlan tags
> as "path metadata" and/or mpls labels and the layering involved in that.
> i.e There is another world out there;-> More important: there is
> nothing illegitimate in any standard about multi-layer tunnels as
> such.
>
> > The evidence is that TC and flower are being actively used. Double VLAN
> > tags configurations notwithstading. IMHO, the fact that I've been
> > unlucky to hit this tunnel corner case does not mean that there is a
> > design problem with the flower.
> >
>
> You have _not_ been unlucky - it is a design issue with flow dissector
> and the wrapping around flower. Just waiting to happen for more
> other use cases..
>
> > AFAICS, the current meaning for the protocol field in TC is:
> >
> > match the innermost layer 2 type through the tunnels that the kernel
> > knows about.
> >
>
> "protocol" describes where you start parsing and matching in
> the header tree - for tc that starts with the outer ethertype.
> The frame has to make sense.
>
> > And it seems to me that this semantic is perfectly fine and does not
> > require fixing. Maybe be we need to mention it in the docs...
> >
>
> Sorry, I disagree.
>
> >> IMO, your approach is adding yet another bandaid.
> >
> > Could you please elaborate why do you see this approach as a bandaid?
> >
> > The patch in question adds another key to the other ~50 that exists in the
> > flower currently. Two more similar keys will be done for single and
> > double VLAN case. As a result, my users will gain the ability to match
> > packets that are impossible to match right now.
> >
>
> You are changing the semantics of the tooling for your single
> use case at the expense of the general syntax. No other tc classifier
> needs that keyword and no other tunneling feature within flower
> needs it. Note:
> It was a mistake allowing the speacial casing of vlans to work
> around the dissector's shortcomings. Lets not add more now
> "fixed" by user space.
> Hacks are already happening with flower in user space
> (at some point someone removed differentiation of "protocol"
> and flower's "ethertype" to accomodate some h/w offload concern).
>
> > In difference with the TC protocol field, orig_ethtype answers the
> > question:
> >
> > what is the original eth type of the packet, independent of the further
> > kernel processing.
> >
>
> "protocol" points to the root of the parse/match tree i.e where
> to start matching.
> Tom (Cced) has a nice diagram somewhere which i cant find right now.
> There are certainly issues with namespace overlap. If you have multiple
> "ethertypes" encapsulated in a parse tree, giving them appropriate
> names would help. Some of the flower syntax allows that nicely but
> only to two levels..
I attached the diagram that Jamal is referring to (see attached). This
is a protocol graph of the various networking protocols parsed by flow
dissector. As complex as this is, there are still several other
protocols that users might be interested to parse and create filter
rules for like UDP encapsulations, TCP options, and HBH and
destination options. Red back links in the diagram indicate protocol
encapsulations which drive the number of possible protocol
combinations in a packet is combinatorial. Protocols are only getting
more and more complex and users are using more protocols and more
combinations of encapsulations, so I don't believe that continuing to
incrementally add to flow dissector or add support in tc-flower one
protocol at a time can scale. We need to rethink this. Also note, this
is no longer just a software problem either, more and more hardware is
built with advanced functionality like parsers and to leverage that
from Linux we essentially want the ability to offload flow dissector
with some assurance that hardware can parse all the same protocols and
produce the exact same metadata output. The end goal is pretty obvious
to me, a user should be able to make filter rules about the arbitrary
protocol and protocol combinations they are using, including custom
protocols, and in that they expect the highest performance possible
for their given environment.
IMO, the only way we can accomplish this is to have a common
programming model where the _exact_ same_ parser program used in
software is offloaded as is the hardware and guarantees the same
effects. This is a one out the outcomes of PANDA, see
https://www.youtube.com/watch?v=zVnmVDSEoXc&list=PLrninrcyMo3L-hsJv23hFyDGRaeBY1EJO&index=23.
Tom
>
>
> > IMHO, the above definition is also quite exact and has the right to
> > exist because we do not have such ability in the current kernel.
> >
> >>
> >> Would it make sense for the setting of the
> >> skb_key.basic.n_proto to be from tp->protocol for
> >> your specific case in classify().
> >>
> >> Which means your original setup:
> >> tc filter add dev eth0 ingress protocol 0x8864 flower \
> >> action simple sdata hi64
> >>
> >> should continue to work if i am not mistaken. Vlans would
> >> continue to be a speacial case.
> >>
> >> I dont know what that would break though...
> >>
> >
> > I think right off the bat, it will break the following user
> > configuration (untested!):
> >
> > tc filter add dev eth0 ingress protocol ipv4 flower \
> > action simple sdata hi64
> >
>
> I dont see how from the code inspection. tp->protocol compare
> already happened by the time flower classify() is invoked.
> i.e it is correct value specified in configuration.
> Dont use your update to iproute2. Just the kernel change
> is needed.
> I am not worried about your use case - just other tunneling
> scenarios; we are going to have to run all tdc testcases and
> a few more to validate.
>
> > Currently, the above rule matches ipv4 packets encapsulated in
> > ETH_P_PPP_SES protocol. The special casing will break it.
>
> Can you describe the rules?
> I think you need something like:
> tc filter add dev eth0 ingress protocol 0x8864 flower \
> ...
> ppp_proto ip \
> dst_ip ...\
> src_ip ....
>
> This will require you introduce new attributes for ppp encaped
> inside pppoe; i dont think there will be namespace collision in
> the case of ip src/dst because it will be tied to ppp_proto
>
> cheers,
> jamal
Download attachment "Flow dissector (2).png" of type "image/png" (92609 bytes)
Powered by blists - more mailing lists