netdev - Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering

Message-ID: <CAOiHx=m0nkxczOHQycCjsXcRvs-eP+wGgrUDDuB5UpSnMBSLkw@mail.gmail.com>
Date: Fri, 25 Apr 2025 09:52:13 +0200
From: Jonas Gorski <jonas.gorski@...il.com>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Florian Fainelli <f.fainelli@...il.com>, Andrew Lunn <andrew@...n.ch>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling filtering

On Fri, Apr 25, 2025 at 12:57 AM Vladimir Oltean <olteanv@...il.com> wrote:
>
> On Thu, Apr 24, 2025 at 03:58:50PM +0200, Jonas Gorski wrote:
> > On Thu, Apr 24, 2025 at 2:34 PM Florian Fainelli <f.fainelli@...il.com> wrote:
> > > On 4/24/2025 12:25 PM, Vladimir Oltean wrote:
> > > > On Tue, Apr 22, 2025 at 08:49:13PM +0200, Jonas Gorski wrote:
> > > >> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> > > >> will add VLAN 0 when enabling the device, and remove it on disabling it
> > > >> again.
> > > >>
> > > >> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> > > >> dsa_user_manage_vlan_filtering(), user ports that are already enabled
> > > >> may end up with no VLAN 0 configured, or VLAN 0 left configured.
> > > >>
> > > > >> E.g. the following sequence would leave sw1p1 without VLAN 0 configured:
> > > >>
> > > >> $ ip link add br0 type bridge vlan_filtering 1
> > > >> $ ip link set br0 up
> > > >> $ ip link set sw1p1 up (filtering is 0, so no HW filter added)
> > > >> $ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)
> > > >>
> > > >> while the following sequence would work:
> > > >>
> > > >> $ ip link add br0 type bridge vlan_filtering 1
> > > >> $ ip link set br0 up
> > > >> $ ip link set sw1p1 master br0 (filtering gets set to 1)
> > > >> $ ip link set sw1p1 up (filtering is 1, HW filter is added)
> > > >>
> > > >> Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
> > > >> enabled on a vlan_filtering_is_global dsa switch:
> > > >>
> > > >> $ ip link add br0 type bridge vlan_filtering 1
> > > >> $ ip link set br0 up
> > > >> $ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
> > > >> $ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
> > > >> $ ip link set sw1p1 nomaster (filtering is reset to 0 again)
> > > >> $ ip link set sw1p2 down (VLAN 0 filter is left configured)
> > > >>
> > > >> This even causes untagged traffic to break on b53 after undoing the
> > > >> bridge (though this is partially caused by b53's own doing).
> > > >>
> > > >> Fix this by emulating 8021q's vlan_device_event() behavior when changing
> > > >> the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
> > > >> absence of it doesn't become a red herring.
> > > >>
> > > >> While vlan_vid_add() has a return value, vlan_device_event() does not
> > > >> check its return value, so let us do the same.
> > > >>
> > > >> Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
> > > >> Signed-off-by: Jonas Gorski <jonas.gorski@...il.com>
> > > >> ---
> > > >
> > > > Why does the b53 driver depend on VID 0? CONFIG_VLAN_8021Q can be
> > > > disabled or be an unloaded module, how does it work in that case?
> > >
> > > This is explained in this commit:
> > >
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64a81b24487f0d2fba0f033029eec2abc7d82cee
> > >
> > > however the case of starting up with CONFIG_VLAN_8021Q and then loading
> > > the 8021q module was not thought about, arguably I am not sure what sort
> > > of notification or event we can hook onto in order to react properly to
> > > that module being loaded. Do you know?
> >
> > config BRIDGE_VLAN_FILTERING
> >         bool "VLAN filtering"
> >         depends on BRIDGE
> >         depends on VLAN_8021Q
> >
> > without 8021Q there is no vlan filtering bridge, so filtering can
> > never be 1, so NETIF_F_HW_VLAN_CTAG_FILTER is never set, so HW filters
> > for VLAN 0 are never installed or removed, therefore the issue can
> > never happen.
>
> nitpick: except for ds->needs_standalone_vlan_filtering (which b53 does not set though).
>
> >
> > The issue is only if a vlan filtering bridge was there, and now isn't
> > anymore, and a previously VLAN 0 HW filter is left intact. This causes
> > an incomplete vlan entry left programmed in the vlan table of the chip
> > with just this port as a member, which breaks forwarding for that
> > VLAN, which is incidentally also the VLAN used for untagged traffic in
> > the non filtering case.
>
> Ok, so let's say b53_default_pvid() is 0, and b53_setup() ->
> b53_apply_config() -> b53_configure_vlan() calls b53_set_vlan_entry() on
> it to set it up, independently of the 8021q layer. So far so good.
>
> But then, the 8021q layer can modify VID 0 on the device from the way in
> which it was set up by the driver for VLAN-unaware operation, namely it
> can remove it, and this breaks VLAN-unaware reception.
>
> One needs to wonder why would the b53 driver even permit changes coming
> from .port_vlan_add() and .port_vlan_del() to a VID it has reserved for
> special use. There's nothing to gain from reacting to these operations,
> only to lose.
>
> I'm trying to think whether switchdev drivers in general have anything
> to benefit from commit ad1afb003939 ("vlan_dev: VLAN 0 should be treated
> as "no vlan tag" (802.1p packet)"). I'm tempted to say "thanks for the
> well-intended hint about VID 0, but a switchdev's data path is so
> complicated that we'd rather manage VID 0 ourselves". Not only do many
> drivers reserve VID 0 and thus need to be independent of the 8021q layer
> even for VLAN-unaware operation, but also think of this: the bridge may
> have vlan_filtering 1 and vlan_default_pvid 0. What will happen if the
> 8021q layer decides to add VID 0 to the RX filtering table? My logic and
> testing with the software data path says that VID 0 traffic should not
> be forwarded. My intuition is that it will make b53 accept this kind of
> traffic.
>
> Here's a self test I posted exactly for this scenario, if you don't
> mind, please let me know what happens, and we'll see where to go from
> there. On ocelot, which has commit 9323ac367005 ("net: mscc: ocelot:
> ignore VID 0 added by 8021q module"), it passes (but fails elsewhere,
> sadly - I've sent a patch also for that).
> https://lore.kernel.org/netdev/20250424223734.3096202-2-vladimir.oltean@nxp.com/T/#u
>
> That being said, I don't think we are quite prepared to adopt my
> solution (of ignoring VID 0) DSA-wide (especially not as a bug fix),
> because it is driver-dependent whether VID 0 is in a conflict with a
> VLAN ID reserved for private use or not. Even though adding VID 0 to the
> RX filtering table possibly allows its forwarding even when it shouldn't
> (and that isn't desirable), there might be some positive benefits as
> well - like permitting VID 0 reception when it _should_ be received.
>
> It's a pretty tricky situation, I guess we should first establish what
> are the tests that need to pass, then assess on a per-driver basis where
> we are. We don't have nearly as much coverage as we would need.

I tested a pure software bridge with vlan_filtering enabled and no
PVID / egress untagged VLAN defined, and STP continued to work fine.
So in a sense VLAN 0 is needed: untagged traffic must still be
received regardless of whether a PVID egress untagged VLAN is
defined.

But we shouldn't forward it (except to the cpu port) unless it is part
of a PVID egress untagged VLAN. This is the tricky part. If (dsa)
switch drivers ensure that untagged traffic always reaches the cpu
port, then we can ignore VLAN 0.
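
As a rough sketch of that rule (a toy model in C, not driver code;
struct port_model, CPU_PORT and untagged_dest_mask() are made-up names
for illustration only), the destination mask for an untagged frame
could be computed like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define CPU_PORT 8u /* hypothetical CPU port index */

/* Hypothetical per-port state, not a real DSA structure. */
struct port_model {
	bool has_pvid_untagged; /* a PVID egress untagged VLAN is configured */
	uint32_t vlan_members;  /* bitmask of ports in that VLAN, incl. this one */
};

/* Destination port mask for an untagged frame received on port "in". */
static uint32_t untagged_dest_mask(const struct port_model *p, unsigned int in)
{
	/* Untagged traffic (e.g. STP BPDUs) always reaches the CPU port. */
	uint32_t mask = 1u << CPU_PORT;

	/* Forward to other members only if a PVID egress untagged VLAN exists. */
	if (p->has_pvid_untagged)
		mask |= p->vlan_members & ~(1u << in); /* never reflect to ingress */

	return mask;
}
```

With no PVID egress untagged VLAN the mask contains only the CPU port,
which matches what the software bridge did in my test.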

So I think this boils down to dsa needing a way to tell drivers
whether a VLAN should be forwarded to other members or not when it is
added to a port.

Currently, from a dsa driver perspective, the following two scenarios
would be indistinguishable:

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set sw1p1 master br0
$ ip link set sw1p2 master br0
$ bridge vlan add dev sw1p1 vid 10
$ bridge vlan add dev sw1p2 vid 10

and

$ ip link add br0 type bridge vlan_filtering 1
$ ip link set sw1p1 master br0
$ ip link set sw1p2 master br0
$ ip link add sw1p1.10 link sw1p1 type vlan id 10
$ ip link add sw1p2.10 link sw1p2 type vlan id 10

But in the second case, sw1p1 and sw1p2 should be isolated.

This is because vlan filters and bridge vlans result in the same
port_vlan_add() call, with no way for the driver to tell where the
call comes from.
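
To illustrate (a simplified model, not the real DSA API: the actual
port_vlan_add() takes more arguments, and enum vlan_origin,
make_request() and the "forward" hint are hypothetical names), both
paths collapse into the same request today, and some extra hint would
be needed to tell them apart:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical origin of a VLAN request; nothing like this is passed today. */
enum vlan_origin { VLAN_FROM_BRIDGE, VLAN_FROM_8021Q_FILTER };

/* Simplified view of what reaches the driver: port, VID and flags. */
struct vlan_request {
	int port;
	uint16_t vid;
	uint16_t flags; /* assumed 0 in both scenarios (no pvid/untagged given) */
};

static struct vlan_request make_request(int port, uint16_t vid,
					enum vlan_origin origin)
{
	(void)origin; /* the origin is lost before the driver is called */
	return (struct vlan_request){ .port = port, .vid = vid, .flags = 0 };
}

static bool same_request(struct vlan_request a, struct vlan_request b)
{
	return a.port == b.port && a.vid == b.vid && a.flags == b.flags;
}

/* Hypothetical extension: carry a forwarding hint with the request. */
struct vlan_request_ext {
	struct vlan_request req;
	bool forward; /* true: forward between members; false: CPU port only */
};

static struct vlan_request_ext make_request_ext(int port, uint16_t vid,
						enum vlan_origin origin)
{
	return (struct vlan_request_ext){
		.req = { .port = port, .vid = vid, .flags = 0 },
		.forward = (origin == VLAN_FROM_BRIDGE),
	};
}
```

Whether such a hint would be a flag, a separate callback or something
else entirely is open; the point is only that the driver needs it from
somewhere.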

And yes, this is probably hard to configure on many smaller embedded
switch chips. E.g. b53-supported switches do not have
forward/flood/etc masks per VLAN, so some cheating/workaround is
needed here. switchdev.rst says to fall back to software forwarding if
there is no other way. I have some ideas, but I first need to verify
that they work.

Regards,
Jonas