Message-ID: <CACRpkdYUp2m8LXfngi05O=ro5-8vicpkNJa=PUGzc4KDBsuMyA@mail.gmail.com>
Date:   Tue, 14 Sep 2021 01:20:14 +0200
From:   Linus Walleij <linus.walleij@...aro.org>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Mauri Sandberg <sandberg@...lfence.com>,
        Alvin Šipraga <alsi@...g-olufsen.dk>,
        DENG Qingfang <dqfext@...il.com>
Subject: Re: [PATCH net-next 5/8] net: dsa: rtl8366: Disable "4K" VLANs

Hi Vladimir,

first, thanks for your help and patience. I have learned a lot in
recent weeks, in large part thanks to your questions and explanations!

On Mon, Sep 13, 2021 at 5:34 PM Vladimir Oltean <olteanv@...il.com> wrote:

> > This was discovered when testing with OpenWrt that join
> > the LAN ports lan0 ... lan3 into a bridge and then assign
> > each of them into VLAN 1 with PVID set on each port: without
> > this patch this will not work and the bridge goes numb.
>
> It is important to explain _why_ the switch will go "numb" and not pass
> packets if the Linux bridge assigns all ports to VLAN ID 1 as pvid. It
> is certainly not expected for that to happen.

Yeah, it is pretty weird. This shows up as a regression with OpenWrt
userspace, because it sets up the VLANs like this. But if I boot a
clean system and just manually do e.g.
ifconfig lan0 169.254.1.2 netmask 255.255.255.0 up
it works fine, because the default VLANs that were set up by the
driver (removed by patch 2/8) will tag all packets using PVID and
send packets on 5 ingress and 1 egress VLANs.

> The purpose of the PVID feature is specifically to classify untagged
> packets to a port-based VLAN ID. So "everything is a VLAN" even for
> Linux user space, not sure what you're talking about.

I think what happens is that OpenWrt's userspace sets VLAN 1
for all ingress ports with PVID, so all packets from ingress ports
get tagged nicely with VID 1.

But as the CPU port is hidden inside the bridge, userspace
can't join the CPU port into that VLAN (it does not know the
port exists, I think?) and thus no packets can go into or out
of the CPU port. But you can still pass packets between the
lan ports.

> When the Linux bridge has the vlan_filtering attribute set to 1, the
> hardware should follow suit by making untagged packets get classified to
> the VLAN ID that the software bridge wants to see, on the ports that are
> members of that bridge.

This is what it does, I think.

But the "4K" VLAN feature is so strict that it also restricts the CPU
port this way (in hardware), with no way to turn it off.

It seems the "4K" mode is a "VLAN with filtering only" mode, so no
matter whether we turn on filtering or not, the CPU port
will not see any packets from any other ports unless we also add
that port (port 5) into the VLAN.
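
To make the suspected failure concrete, here is a toy model of
membership-based forwarding. It is only a sketch under assumptions, not
the rtl8366 driver logic: the port numbers follow this thread
(lan0..lan3 as ports 0..3, CPU as port 5), and flood() is a made-up
helper that treats VLAN membership as the only forwarding rule.

```python
# Toy model only; names and port numbering are assumptions for illustration.
LAN_PORTS = {0, 1, 2, 3}   # lan0..lan3
CPU_PORT = 5

def flood(vlan_members, vid, ingress_port):
    """Ports a frame classified to `vid` may leave on, in strict "4K" style:
    both the ingress and the egress port must be members of the VLAN."""
    members = vlan_members.get(vid, set())
    if ingress_port not in members:
        return set()
    return members - {ingress_port}

# What OpenWrt userspace is believed to configure: VLAN 1 on the LAN ports only.
vlan_members = {1: set(LAN_PORTS)}

from_lan0 = flood(vlan_members, 1, 0)
from_cpu = flood(vlan_members, 1, CPU_PORT)
print(sorted(from_lan0))  # [1, 2, 3]
print(sorted(from_cpu))   # []
```

In this model the lan ports still reach each other, but nothing reaches
or leaves port 5, which matches the symptom described above.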

One solution I could try would be to just add the CPU port to all
VLANs by default, but .. is that right?

I suppose this would work, as software will add the right
VID to the packets so they will only propagate to the right
ports anyway. I could test it.
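
A rough sketch of that idea, as a toy model rather than driver code.
add_cpu_to_all_vlans() and flood() are hypothetical helpers, and the
two-VLAN layout is invented purely for the example.

```python
CPU_PORT = 5  # port number taken from this thread

def flood(vlan_members, vid, ingress_port):
    """Ports a frame classified to `vid` may leave on (members only)."""
    members = vlan_members.get(vid, set())
    if ingress_port not in members:
        return set()
    return members - {ingress_port}

def add_cpu_to_all_vlans(vlan_members, cpu_port=CPU_PORT):
    """Return a copy of the VLAN table with the CPU port in every VLAN."""
    return {vid: set(members) | {cpu_port}
            for vid, members in vlan_members.items()}

# Invented example: lan0/lan1 in VLAN 1, lan2/lan3 in VLAN 2.
table = add_cpu_to_all_vlans({1: {0, 1}, 2: {2, 3}})

to_cpu = flood(table, 1, 0)            # frame from lan0, classified to VID 1
from_cpu = flood(table, 2, CPU_PORT)   # frame the CPU sends with VID 2
print(sorted(to_cpu))    # [1, 5]
print(sorted(from_cpu))  # [2, 3]
```

The CPU port now sees traffic from every VLAN, while a frame it sends
with a given VID still only reaches the members of that VLAN.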

> When the Linux bridge has the vlan_filtering attribute set to 0, the
> software bridge very much ignores any VLAN tags from packets, and does
> not perform any VLAN-based ingress admission checks. If the hardware
> classifies all packets to a VLAN even when VLAN "filtering" (i.e.
> ingress dropping on mismatch) is disabled, that is perfectly fine too,

I think this is what happens in this hardware.

> although the software bridge doesn't care. You need to set up a private
> VLAN ID for your VLAN-unaware ports, and make it the pvid on those ports,

Would the CPU port be a VLAN-unaware port?

My problem is that in the "4K" mode, the CPU port will not see packets
from any VLAN it is not a member of.

> and somehow force the hardware to classify any packet towards that pvid
> on those VLAN-unaware ports, regardless of whether the packets are
> untagged or 802.1Q-tagged or 802.1ad-tagged or whatever. That is simply
> the way things are supposed to work.
>
> VLAN ID 0 and 4095 are good candidates to use privately within your
> driver as the pvid on VLAN-unaware ports, and you can/must manually
> bring up these VLANs, since the bridge will refuse to install these
> VLANs in its database.
>
> Other VLAN IDs like the range 4000-4094 are also potentially ok as long
> as you document the fact that your driver crops that range out of the
> usable range of the bridge, and you make sure that no packet leaks
> inside or outside of those private VLANs are possible ("attackers" could
> still try to send a packet tagged with VLAN ID 4094 towards a port that
> is under a VLAN-aware bridge. Since that port is VLAN-aware, it will
> recognize the VLAN ID as 4094, so unless you configure that port to drop
> VLAN ID 4094, it might well leak into the VLAN domain 4094 which is
> privately used by your driver to ensure VLAN-unaware forwarding between
> the ports of a nearby VLAN-unaware bridge.
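
The private-VLAN scheme described in the quoted text can be sketched
roughly as follows. This is a toy model under assumptions: the reserved
range 4000-4094 is the one suggested above, and bridge_may_use() and
admit() are hypothetical names, not kernel or DSA API.

```python
RESERVED_VIDS = range(4000, 4095)  # 4000..4094, private to the driver (assumed)

def bridge_may_use(vid):
    """VIDs left for the bridge once the private range is cropped out."""
    return 1 <= vid <= 4094 and vid not in RESERVED_VIDS

def admit(vid_tag, port_vlan_aware, port_pvid):
    """Classify an ingress frame; return its VID, or None if it is dropped.

    vid_tag is None for untagged frames.
    """
    if not port_vlan_aware:
        # VLAN-unaware port: everything goes to the private pvid,
        # whatever tag the frame carries.
        return port_pvid
    if vid_tag is None:
        return port_pvid
    if vid_tag in RESERVED_VIDS:
        # Drop, so the frame cannot leak into a private VLAN domain.
        return None
    return vid_tag

print(admit(4094, True, 1))      # None
print(admit(100, False, 4094))   # 4094
print(bridge_may_use(4094))      # False
```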

I don't have any VLAN-unaware ports other than the CPU
port.

What happened before patch 2/8 was that the CPU port was given its own
VLAN with PVID and all other ports were made members of that VLAN as
well, and thus egress traffic from the CPU port could go out.

For ingress traffic, the CPU port was a member of all ingress
VLANs. (Also removed by patch 2/8.)
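
As a rough illustration of that old default layout, here is a toy model.
It assumes six ports with port 5 as the CPU, and the VID numbering
(port + 1) is an assumption made for the sketch; default_vlans() is a
hypothetical helper, not the driver function.

```python
NUM_PORTS = 6   # assumption: ports 0..4 are user ports, port 5 is the CPU
CPU_PORT = 5

def default_vlans(num_ports=NUM_PORTS, cpu_port=CPU_PORT):
    """Build (members, pvid) for a per-port default VLAN layout."""
    members, pvid = {}, {}
    for port in range(num_ports):
        vid = port + 1                            # assumed VID numbering
        if port == cpu_port:
            members[vid] = set(range(num_ports))  # egress VLAN: every port
        else:
            members[vid] = {port, cpu_port}       # ingress VLAN: port + CPU
        pvid[port] = vid
    return members, pvid

members, pvid = default_vlans()
ingress_vlans = [vid for vid, m in members.items()
                 if CPU_PORT in m and vid != pvid[CPU_PORT]]
print(len(ingress_vlans))               # 5
print(sorted(members[pvid[CPU_PORT]]))  # [0, 1, 2, 3, 4, 5]
```

This gives the "5 ingress and 1 egress VLANs" shape mentioned earlier:
each user port can reach the CPU, and the CPU can reach every port.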

> I know there are lots of things to think about, but this patch is way
> too simplistic and does not really offer solid explanations.

I'm trying my best and learning along the way :)
It's fine, no hurry and let's get this right so that the other
RTL switches can look at this as a good example.

Yours,
Linus Walleij
