Message-ID: <20190615151913.cgrfyflwwnhym4u2@ast-mbp.dhcp.thefacebook.com>
Date: Sat, 15 Jun 2019 08:19:14 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Johannes Berg <johannes@...solutions.net>
Cc: netdev@...r.kernel.org, bridge@...ts.linux-foundation.org,
nikolay@...ulusnetworks.com, roopa@...ulusnetworks.com,
jhs@...atatu.com, David Ahern <dsahern@...il.com>,
Zahari Doychev <zahari.doychev@...ux.com>,
Simon Horman <simon.horman@...ronome.com>,
Toshiaki Makita <makita.toshiaki@....ntt.co.jp>,
Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...lanox.com>,
Alexei Starovoitov <ast@...mgrid.com>
Subject: Re: VLAN tags in mac_len

On Fri, Jun 14, 2019 at 12:18:41PM +0200, Johannes Berg wrote:
>
> Possible solutions?
>
> So far, Zahari tried three different ways of fixing this:
>
> 1) Make the bridge code use skb->mac_len instead of ETH_HLEN. This
> works for this particular case, but breaks some other cases;
> evidently some places exist where skb->mac_len isn't even set to
> ETH_HLEN when a packet gets to the bridge. I don't know right now
> what that was, I think probably somebody who's CC'ed reported that.
>
> 2) Let tc_act_vlan() just pull ETH_HLEN instead of skb->mac_len, but
> this is rather asymmetric and strange, and while it works for this
> case it may cause confusion elsewhere.
>
> 2b) Toshiaki said it might be better to make that code *remember*
> mac_len and then use it to push and pull (so not caring about the
> change made by skb_vlan_push()), but that ultimately leads to
> confusion and if you have TC push/pop combinations things just get
> completely out of sync and die
>
> 3) Make skb_vlan_push()/_pop() just not change mac_len at all. So far
> this also addresses the issue, but it's likely that this will break
> OVS, and I don't know how it'd affect BPF. Quite possibly like TC
> does and is broken, but perhaps not.
>
>
> But now we're stuck. Depending on how you look at it, all of these seem
> sort of reasonable, or not.
>
> Ultimately, the issue seems to be that we couldn't really decide whether
> VLAN tags (and probably MPLS tags, for that matter) are covered by
> mac_len or not. At least not consistently on ingress and egress.
> eth_type_trans() doesn't take them into account, so of course on simple
> ingress mac_len will only cover the ETH_HLEN.
>
> If you have an accelerated tag and then push it into the SKB, it will
> *not* be taken into account in mac_len. OTOH, if you have a new tag and
> use skb_vlan_push() then it *will* be taken into account.
>
>
> I'm trending towards solution (3), because if we consider other
> combinations of VLAN push/pop in TC, I think we can end up in a very
> messy situation today. For example, POP/PUSH seems like it should be a
> no-op, but it isn't due to the mac_len, *unless* it can use the HW accel
> only (i.e. only a single tag).
>
> I think then to propose such a patch though we'd have to figure out
> where the BPF case is, and to keep OVS working probably either add an
> argument ("bool adjust_mac_len") to the function signatures, or just do
> the adjustments in OVS code after calling them?
>
>
> Any other thoughts?

imo skb_vlan_push() should still change mac_len.
tc, ovs, bpf use it and expect vlan to be part of L2.
There is nothing between L2 and L3 :)
Hence we cannot say that vlan is not part of L2.
Hence push/pop vlan must change mac_len, since skb->mac_len
is the kernel's definition of the length of the L2 header.
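
To make that accounting concrete, here is a toy model of it. This is only
a sketch and not kernel code: struct toy_skb and the toy_* helpers are
made-up names for illustration, and only ETH_HLEN (14) and VLAN_HLEN (4)
match the real kernel constants. It assumes every tag lives in the packet
data and ignores hw-accelerated tags entirely.

```c
#include <stdbool.h>

#define ETH_HLEN  14	/* dst MAC + src MAC + EtherType */
#define VLAN_HLEN  4	/* one 802.1Q tag: TPID + TCI */

/* Hypothetical stand-in for the two pieces of state discussed here. */
struct toy_skb {
	unsigned int mac_len;	/* length of the L2 header, tags included */
	unsigned int nr_tags;	/* VLAN tags currently in the L2 header */
};

/* Plain ingress: an untagged Ethernet header only. */
static void toy_init(struct toy_skb *skb)
{
	skb->mac_len = ETH_HLEN;
	skb->nr_tags = 0;
}

/* Pushing a tag grows the L2 header, so mac_len grows with it. */
static void toy_vlan_push(struct toy_skb *skb)
{
	skb->nr_tags++;
	skb->mac_len += VLAN_HLEN;
}

/* Popping a tag shrinks the L2 header; fails if there is no tag. */
static bool toy_vlan_pop(struct toy_skb *skb)
{
	if (skb->nr_tags == 0)
		return false;
	skb->nr_tags--;
	skb->mac_len -= VLAN_HLEN;
	return true;
}

/* The invariant argued for above: mac_len covers the whole L2 header. */
static bool toy_mac_len_consistent(const struct toy_skb *skb)
{
	return skb->mac_len == ETH_HLEN + skb->nr_tags * VLAN_HLEN;
}
```

Under this model a push followed by a pop brings mac_len back to ETH_HLEN,
and the invariant holds after every step, which is the symmetry the
push/pop users rely on.
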

Now as far as the bridge goes... I think it's unfortunate that Linux
adopted 'vlan' as a netdevice model, and that's where I think
the problem is.

A typical bridge in the networking industry is a device that
does forwarding based on L2, which includes vlans.
And imo that's the most appropriate way of configuring and thinking
about bridge functionality.

Whereas in the kernel there is a 'vlan' netdevice that 'eats' the
vlan tag and pretends that the rest is the same.
So the Linux bridge kinda doesn't need to be vlan aware.
CONFIG_BRIDGE_VLAN_FILTERING was the right step, but I haven't
seen it being used and I'm not sure about the state of things there.

So your option 1 above is imo the best. The bridge needs to deal
with skb->mac_len and the full L2 header.
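
As a rough illustration of why a fixed ETH_HLEN is not enough once tags
are in the packet data, the sketch below walks a raw frame in userspace.
It is not bridge code: toy_l2_len() and toy_l3_first_byte() are
hypothetical helpers, and only the 0x8100 TPID is handled (802.1ad and
MPLS are left out on purpose).

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN     14		/* dst MAC + src MAC + EtherType */
#define VLAN_HLEN     4		/* one 802.1Q tag: TPID + TCI */
#define ETH_P_8021Q  0x8100	/* 802.1Q TPID */

/* Read a big-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the length of the full L2 header, in-band 802.1Q tags
 * included, or 0 if the frame is too short to hold it.
 */
static size_t toy_l2_len(const uint8_t *frame, size_t len)
{
	size_t off = ETH_HLEN;

	if (len < ETH_HLEN)
		return 0;
	/* The EtherType/TPID always sits in the last 2 bytes seen so far. */
	while (rd16(frame + off - 2) == ETH_P_8021Q) {
		if (len < off + VLAN_HLEN)
			return 0;
		off += VLAN_HLEN;
	}
	return off;
}

/*
 * Return the first byte after an L2 header of hdr_len bytes,
 * or -1 if that byte lies outside the frame.
 */
static int toy_l3_first_byte(const uint8_t *frame, size_t len, size_t hdr_len)
{
	if (hdr_len >= len)
		return -1;
	return frame[hdr_len];
}
```

For a single-tagged IPv4 frame, toy_l2_len() gives 18 and the byte at
that offset is the IPv4 version/IHL byte, whereas the byte at offset
ETH_HLEN is the first byte of the TCI, i.e. still inside the L2 header.
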