netdev - Re: VLAN tags in mac


Message-ID: <20190615151913.cgrfyflwwnhym4u2@ast-mbp.dhcp.thefacebook.com>
Date:   Sat, 15 Jun 2019 08:19:14 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Johannes Berg <johannes@...solutions.net>
Cc:     netdev@...r.kernel.org, bridge@...ts.linux-foundation.org,
        nikolay@...ulusnetworks.com, roopa@...ulusnetworks.com,
        jhs@...atatu.com, David Ahern <dsahern@...il.com>,
        Zahari Doychev <zahari.doychev@...ux.com>,
        Simon Horman <simon.horman@...ronome.com>,
        Toshiaki Makita <makita.toshiaki@....ntt.co.jp>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...lanox.com>,
        Alexei Starovoitov <ast@...mgrid.com>
Subject: Re: VLAN tags in mac_len

On Fri, Jun 14, 2019 at 12:18:41PM +0200, Johannes Berg wrote:
> 
> Possible solutions?
> 
> So far, Zahari tried three different ways of fixing this:
> 
>  1) Make the bridge code use skb->mac_len instead of ETH_HLEN. This
>     works for this particular case, but breaks some other cases;
>     evidently some places exist where skb->mac_len isn't even set to
>     ETH_HLEN when a packet gets to the bridge. I don't know right now
>     what that was, I think probably somebody who's CC'ed reported that.
> 
>  2) Let tc_act_vlan() just pull ETH_HLEN instead of skb->mac_len, but
>     this is rather asymmetric and strange, and while it works for this
>     case it may cause confusion elsewhere.
> 
>  2b) Toshiaki said it might be better to make that code *remember*
>      mac_len and then use it to push and pull (so not caring about the
>      change made by skb_vlan_push()), but that ultimately leads to
>      confusion and if you have TC push/pop combinations things just get
>      completely out of sync and die
> 
>  3) Make skb_vlan_push()/_pop() just not change mac_len at all. So far
>     this also addresses the issue, but it's likely that this will break
>     OVS, and I don't know how it'd affect BPF. Quite possibly like TC
>     does and is broken, but perhaps not.
> 
> 
> But now we're stuck. Depending on how you look at it, all of these seem
> sort of reasonable, or not.
> 
> Ultimately, the issue seems to be that we couldn't really decide whether
> VLAN tags (and probably MPLS tags, for that matter) are covered by
> mac_len or not. At least not consistently on ingress and egress.
> eth_type_trans() doesn't take them into account, so of course on simple
> ingress mac_len will only cover the ETH_HLEN.
> 
> If you have an accelerated tag and then push it into the SKB, it will
> *not* be taken into account in mac_len. OTOH, if you have a new tag and
> use skb_vlan_push() then it *will* be taken into account.
> 
> 
> I'm trending towards solution (3), because if we consider other
> combinations of VLAN push/pop in TC, I think we can end up in a very
> messy situation today. For example, POP/PUSH seems like it should be a
> no-op, but it isn't due to the mac_len, *unless* it can use the HW accel
> only (i.e. only a single tag).
> 
> I think then to propose such a patch though we'd have to figure out
> where the BPF case is, and to keep OVS working probably either add an
> argument ("bool adjust_mac_len") to the function signatures, or just do
> the adjustments in OVS code after calling them?
> 
> 
> Any other thoughts?

imo skb_vlan_push() should still change mac_len.
tc, ovs, bpf use it and expect vlan to be part of L2.
There is nothing between L2 and L3 :)
Hence we cannot say that vlan is not part of L2.
Hence push/pop vlan must change mac_len, since skb->mac_len
is the kernel's definition of the length of the L2 header.
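
As a rough illustration of that convention (a userspace sketch, not kernel
code): struct toy_skb and the toy_* helpers below are made up for this
example and only model the mac_len bookkeeping, assuming every in-packet
tag is counted as part of L2. ETH_HLEN and VLAN_HLEN carry the same values
as the kernel constants of the same name.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN  14  /* Ethernet header: dst MAC + src MAC + ethertype */
#define VLAN_HLEN 4   /* one 802.1Q tag: TPID + TCI */

/* Hypothetical stand-in for the sk_buff state relevant to this thread. */
struct toy_skb {
	size_t mac_len;    /* length of the L2 header, in-packet tags included */
	unsigned int tags; /* number of VLAN tags currently in the packet */
};

/* Untagged frame as seen right after simple ingress. */
static inline struct toy_skb toy_ingress(void)
{
	struct toy_skb skb = { .mac_len = ETH_HLEN, .tags = 0 };
	return skb;
}

/* Push a tag into the packet: the L2 header grows by VLAN_HLEN. */
static inline void toy_vlan_push(struct toy_skb *skb)
{
	skb->tags++;
	skb->mac_len += VLAN_HLEN;
}

/* Pop a tag from the packet: the L2 header shrinks by VLAN_HLEN. */
static inline void toy_vlan_pop(struct toy_skb *skb)
{
	assert(skb->tags > 0);
	skb->tags--;
	skb->mac_len -= VLAN_HLEN;
}

/* Offset of the L3 header from the start of the frame. */
static inline size_t toy_network_offset(const struct toy_skb *skb)
{
	return skb->mac_len;
}
```

In this model a push followed by a pop returns mac_len to ETH_HLEN, and
anything that locates L3 via mac_len stays correct however many tags are
in the packet.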

Now as far as bridge... I think it's unfortunate that linux
adopted 'vlan' as a netdevice model and that's where I think
the problem is.
A typical bridge in the networking industry is a device that
does forwarding based on L2, which includes vlans.
And imo that's the most appropriate way of configuring and thinking
about bridge functionality.
Whereas in the kernel there is a 'vlan' netdevice that 'eats'
vlan tag and pretends that the rest is the same.
So the linux bridge kinda doesn't need to be vlan aware.
CONFIG_BRIDGE_VLAN_FILTERING was the right step, but I haven't
seen it being used and I'm not sure about the state of things there.

So your option 1 above is imo the best. The bridge needs to deal
with skb->mac_len and the full L2 header.
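
To make "full L2 header" concrete, here is a minimal userspace sketch (not
bridge code) that walks a raw frame and returns the length of the Ethernet
header plus any in-packet VLAN tags. toy_l2_len() is a hypothetical helper
written for this example; the ethertype values are the standard 802.1Q and
802.1ad TPIDs. Code that assumes a fixed ETH_HLEN would look for L3 four
bytes too early for every tag present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN     14
#define VLAN_HLEN    4
#define ETH_P_8021Q  0x8100 /* 802.1Q VLAN tag */
#define ETH_P_8021AD 0x88A8 /* 802.1ad service (outer) tag */

/*
 * Hypothetical helper: length of the full L2 header of a raw frame,
 * i.e. Ethernet header plus all in-packet VLAN tags.
 * Returns 0 if the frame is too short to hold the headers it announces.
 */
static inline size_t toy_l2_len(const uint8_t *frame, size_t len)
{
	size_t off = ETH_HLEN;

	if (len < ETH_HLEN)
		return 0;

	for (;;) {
		/* The current ethertype sits in the last two bytes before 'off'. */
		uint16_t proto = (uint16_t)(frame[off - 2] << 8 | frame[off - 1]);

		if (proto != ETH_P_8021Q && proto != ETH_P_8021AD)
			return off; /* L3 starts here */
		if (len < off + VLAN_HLEN)
			return 0;   /* truncated tag */
		off += VLAN_HLEN;
	}
}
```

For an untagged frame this yields ETH_HLEN; for a single-tagged frame it
yields ETH_HLEN + VLAN_HLEN, which is the value mac_len would need to hold
for the bridge to find L3 without hardcoding ETH_HLEN.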