Message-ID: <20250424225738.7xr36vll3vg4irzf@skbuf>
Date: Fri, 25 Apr 2025 01:57:38 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Jonas Gorski <jonas.gorski@...il.com>
Cc: Florian Fainelli <f.fainelli@...il.com>, Andrew Lunn <andrew@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: dsa: fix VLAN 0 filter imbalance when toggling
 filtering

On Thu, Apr 24, 2025 at 03:58:50PM +0200, Jonas Gorski wrote:
> On Thu, Apr 24, 2025 at 2:34 PM Florian Fainelli <f.fainelli@...il.com> wrote:
> > On 4/24/2025 12:25 PM, Vladimir Oltean wrote:
> > > On Tue, Apr 22, 2025 at 08:49:13PM +0200, Jonas Gorski wrote:
> > >> When a net device has NETIF_F_HW_VLAN_CTAG_FILTER set, the 8021q code
> > >> will add VLAN 0 when enabling the device, and remove it on disabling it
> > >> again.
> > >>
> > >> But since we are changing NETIF_F_HW_VLAN_CTAG_FILTER during runtime in
> > >> dsa_user_manage_vlan_filtering(), user ports that are already enabled
> > >> may end up with no VLAN 0 configured, or VLAN 0 left configured.
> > >>
> > >> E.g. the following sequence would leave sw1p1 without VLAN 0 configured:
> > >>
> > >> $ ip link add br0 type bridge vlan_filtering 1
> > >> $ ip link set br0 up
> > >> $ ip link set sw1p1 up (filtering is 0, so no HW filter added)
> > >> $ ip link set sw1p1 master br0 (filtering gets set to 1, but already up)
> > >>
> > >> while the following sequence would work:
> > >>
> > >> $ ip link add br0 type bridge vlan_filtering 1
> > >> $ ip link set br0 up
> > >> $ ip link set sw1p1 master br0 (filtering gets set to 1)
> > >> $ ip link set sw1p1 up (filtering is 1, HW filter is added)
> > >>
> > >> Likewise, the following sequence would leave sw1p2 with a VLAN 0 filter
> > >> enabled on a vlan_filtering_is_global DSA switch:
> > >>
> > >> $ ip link add br0 type bridge vlan_filtering 1
> > >> $ ip link set br0 up
> > >> $ ip link set sw1p1 master br0 (filtering set to 1 for all devices)
> > >> $ ip link set sw1p2 up (filtering is 1, so VLAN 0 filter is added)
> > >> $ ip link set sw1p1 nomaster (filtering is reset to 0 again)
> > >> $ ip link set sw1p2 down (VLAN 0 filter is left configured)
> > >>
> > >> This even causes untagged traffic to break on b53 after undoing the
> > >> bridge (though this is partly b53's own doing).
> > >>
> > >> Fix this by emulating 8021q's vlan_device_event() behavior when changing
> > >> the NETIF_F_HW_VLAN_CTAG_FILTER flag, including the printk, so that the
> > >> absence of it doesn't become a red herring.
> > >>
> > >> While vlan_vid_add() has a return value, vlan_device_event() does not
> > >> check its return value, so let us do the same.
> > >>
> > >> Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
> > >> Signed-off-by: Jonas Gorski <jonas.gorski@...il.com>
> > >> ---
> > >
> > > Why does the b53 driver depend on VID 0? CONFIG_VLAN_8021Q can be
> > > disabled or be an unloaded module, how does it work in that case?
> >
> > This is explained in this commit:
> >
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64a81b24487f0d2fba0f033029eec2abc7d82cee
> >
> > however, the case of starting up with CONFIG_VLAN_8021Q and then loading
> > the 8021q module was not considered, and I am not sure what sort of
> > notification or event we could hook into in order to react properly to
> > that module being loaded. Do you know?
> 
> config BRIDGE_VLAN_FILTERING
>         bool "VLAN filtering"
>         depends on BRIDGE
>         depends on VLAN_8021Q
> 
> Without 8021Q there is no VLAN-filtering bridge, so filtering can
> never be 1, so NETIF_F_HW_VLAN_CTAG_FILTER is never set, so HW filters
> for VLAN 0 are never installed or removed; therefore the issue can
> never happen.

Nitpick: except when ds->needs_standalone_vlan_filtering is set (which b53 does not do, though).

> 
> The issue only occurs if a VLAN-filtering bridge was there and now isn't
> anymore, and a previously installed VLAN 0 HW filter is left intact. This
> leaves an incomplete VLAN entry programmed in the chip's VLAN table, with
> just this port as a member, which breaks forwarding for that VLAN, which
> incidentally is also the VLAN used for untagged traffic in the
> non-filtering case.

Ok, so let's say b53_default_pvid() is 0, and b53_setup() ->
b53_apply_config() -> b53_configure_vlan() calls b53_set_vlan_entry() on
it to set it up, independently of the 8021q layer. So far so good.

But then, the 8021q layer can change VID 0 on the device away from the
way the driver set it up for VLAN-unaware operation; namely, it can
remove it, and this breaks VLAN-unaware reception.
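To make this failure mode concrete, here is a deliberately simplified userspace model. All names are hypothetical; this is not b53 code and makes no claim about its real VLAN table layout. Assuming, as above, that the default PVID is 0, the driver reserves VID 0 for all ports at setup, and naive .port_vlan_add()/.port_vlan_del() handlers let the 8021q layer's balanced add/remove of VID 0 strip a port out of that reserved entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 4

/* Hypothetical, simplified VLAN table entry: not the real b53 layout. */
struct vlan_entry {
	bool valid;
	uint32_t members;	/* bit N set: port N is a member */
};

static struct vlan_entry vid0;

/* Stand-in for the driver reserving VID 0 for VLAN-unaware operation at
 * setup time, independently of the 8021q layer. */
static void driver_setup(void)
{
	vid0.valid = true;
	vid0.members = (1u << NUM_PORTS) - 1;
}

/* Naive handlers: VID 0 requests from 8021q modify the reserved entry. */
static void naive_vlan_add(int port, uint16_t vid)
{
	assert(port >= 0 && port < NUM_PORTS);
	if (vid == 0) {
		vid0.valid = true;
		vid0.members |= 1u << port;
	}
}

static void naive_vlan_del(int port, uint16_t vid)
{
	assert(port >= 0 && port < NUM_PORTS);
	if (vid == 0) {
		vid0.members &= ~(1u << port);
		if (!vid0.members)
			vid0.valid = false;
	}
}

/* Assumption of this model: in VLAN-unaware mode, untagged frames are
 * classified to VID 0, so the port must still be a member of it. */
static bool untagged_rx_ok(int port)
{
	return vid0.valid && (vid0.members & (1u << port));
}
```

Calling driver_setup(), then naive_vlan_add(1, 0) (port up) and naive_vlan_del(1, 0) (port down), leaves port 1 outside the entry the driver relies on, even though the 8021q add and remove were balanced.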

One needs to wonder why the b53 driver would even permit changes coming
from .port_vlan_add() and .port_vlan_del() to a VID it has reserved for
special use. There is nothing to gain from reacting to these operations,
and only something to lose.
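A minimal sketch of that alternative, in the same hypothetical userspace style (none of these names are real driver or DSA API): the add/delete handlers report success for the reserved VID but never touch the hardware entry. The email uses b53_default_pvid() as the reserved VID; here it is just a field. The ocelot commit 9323ac367005 cited later is described as ignoring VID 0 added by the 8021q module; this sketch does not claim to reproduce its code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-switch state. */
struct sw_state {
	uint16_t reserved_vid;	/* VID kept for VLAN-unaware operation */
	unsigned int hw_writes;	/* VLAN table writes actually performed */
};

/* Stand-in for programming the hardware VLAN table. */
static int hw_vlan_write(struct sw_state *sw, int port, uint16_t vid, bool add)
{
	(void)port;
	(void)add;
	/* The guards below must never let the reserved VID through. */
	assert(vid != sw->reserved_vid);
	sw->hw_writes++;
	return 0;
}

/* Hypothetical .port_vlan_add()/.port_vlan_del() bodies that accept, but
 * do not act on, requests for the reserved VID. */
static int guarded_vlan_add(struct sw_state *sw, int port, uint16_t vid)
{
	if (vid == sw->reserved_vid)
		return 0;
	return hw_vlan_write(sw, port, vid, true);
}

static int guarded_vlan_del(struct sw_state *sw, int port, uint16_t vid)
{
	if (vid == sw->reserved_vid)
		return 0;
	return hw_vlan_write(sw, port, vid, false);
}
```

Returning 0 rather than an error is a deliberate choice in this sketch: rejecting the request could make callers that do check the return value fail, while the only goal is to keep the reserved entry untouched.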

I'm trying to think whether switchdev drivers in general stand to benefit
from commit ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan
tag" (802.1p packet)"). I'm tempted to say "thanks for the well-intended
hint about VID 0, but a switchdev's data path is so complicated that we'd
rather manage VID 0 ourselves". Not only do many drivers reserve VID 0 and
thus need to be independent of the 8021q layer even for VLAN-unaware
operation, but consider this too: the bridge may have vlan_filtering 1 and
vlan_default_pvid 0. What will happen if the 8021q layer decides to add
VID 0 to the RX filtering table? My logic, and my testing with the software
data path, say that VID 0 traffic should not be forwarded. My intuition is
that adding it will make b53 accept this kind of traffic.
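The vlan_default_pvid 0 scenario can be sketched as two acceptance rules that disagree, again as a hypothetical model rather than kernel code. The software-bridge rule is a simplification of the bridge's ingress check: with vlan_filtering 1, a priority-tagged frame (VID 0) is classified to the port's PVID and is dropped when there is no PVID. The hardware rule is an assumption standing in for the intuition above: a switch that looks up the tag's VID in its RX filter as-is, so a VID 0 entry installed by 8021q lets such frames in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VID_COUNT 4096

/* Hypothetical per-port model. */
struct port_model {
	bool bridge_vlans[VID_COUNT];	/* VLANs configured on the bridge port */
	bool hw_filter[VID_COUNT];	/* VLANs the hardware RX filter accepts */
	uint16_t pvid;			/* 0 means "no PVID" */
};

/* Simplified software bridge with vlan_filtering 1, for a tagged frame:
 * VID 0 (priority-tagged) is classified to the PVID, or dropped when the
 * port has no PVID. */
static bool sw_bridge_accepts(const struct port_model *p, uint16_t vid)
{
	uint16_t v;

	assert(vid < VID_COUNT && p->pvid < VID_COUNT);
	v = vid ? vid : p->pvid;
	return v != 0 && p->bridge_vlans[v];
}

/* Assumed hardware behaviour: the tag's VID is looked up in the RX filter
 * without treating VID 0 specially. */
static bool naive_hw_accepts(const struct port_model *p, uint16_t vid)
{
	assert(vid < VID_COUNT);
	return p->hw_filter[vid];
}
```

With VID 0 present in hw_filter, the two rules give different answers for priority-tagged frames, which is the kind of divergence a selftest against the software data path can expose.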

Here's a selftest I posted for exactly this scenario. If you don't mind,
please run it and let me know what happens, and we'll see where to go from
there. On ocelot, which has commit 9323ac367005 ("net: mscc: ocelot:
ignore VID 0 added by 8021q module"), it passes (but fails elsewhere,
sadly; I've sent a patch for that as well).
https://lore.kernel.org/netdev/20250424223734.3096202-2-vladimir.oltean@nxp.com/T/#u

That being said, I don't think we are quite prepared to adopt my
solution (of ignoring VID 0) DSA-wide (especially not as a bug fix),
because it is driver-dependent whether VID 0 conflicts with a VLAN ID
reserved for private use or not. Even though adding VID 0 to the RX
filtering table possibly allows it to be forwarded even when it shouldn't
be (which isn't desirable), there might be benefits as well, like
permitting VID 0 reception when it _should_ be received.

It's a pretty tricky situation. I guess we should first establish which
tests need to pass, then assess on a per-driver basis where we stand. We
don't have nearly as much coverage as we would need.
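As one hypothetical starting point for such tests, the three ip link sequences from the quoted patch can be written down together with their expected VID 0 state. The sketch below is a userspace model only (struct user_port, port_up(), set_filtering() and the seq_*() helpers are made-up names). It mirrors the 8021q up/down behaviour the patch describes and the runtime flag toggle in dsa_user_manage_vlan_filtering(), with and without the emulation the patch adds. It says nothing about whether a given driver then handles VID 0 correctly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one DSA user port: admin state, the rx-vlan-filter
 * flag (NETIF_F_HW_VLAN_CTAG_FILTER) and how many VID 0 filters 8021q has
 * installed in hardware. */
struct user_port {
	bool up;
	bool filter;
	int vid0_refs;
};

/* 8021q behaviour as described in the patch: VID 0 is added on up and
 * removed on down, but only if the flag is set at that moment. */
static void port_up(struct user_port *p)
{
	assert(!p->up);
	p->up = true;
	if (p->filter)
		p->vid0_refs++;
}

static void port_down(struct user_port *p)
{
	assert(p->up);
	p->up = false;
	if (p->filter)
		p->vid0_refs--;
}

/* Runtime flag change. With 'fix' set, emulate the 8021q up/down hooks
 * for a port that is already up, which is the idea behind the patch. */
static void set_filtering(struct user_port *p, bool on, bool fix)
{
	if (p->filter == on)
		return;
	p->filter = on;
	if (fix && p->up)
		p->vid0_refs += on ? 1 : -1;
}

/* Sequence 1: sw1p1 up, then joins a vlan_filtering bridge.
 * Expected: one VID 0 filter, since the port is up with filtering on. */
static int seq_up_then_join(bool fix)
{
	struct user_port p1 = { 0 };

	port_up(&p1);			/* ip link set sw1p1 up */
	set_filtering(&p1, true, fix);	/* ip link set sw1p1 master br0 */
	return p1.vid0_refs;
}

/* Sequence 2: sw1p1 joins first, then comes up. Expected: one filter. */
static int seq_join_then_up(bool fix)
{
	struct user_port p1 = { 0 };

	set_filtering(&p1, true, fix);	/* ip link set sw1p1 master br0 */
	port_up(&p1);			/* ip link set sw1p1 up */
	return p1.vid0_refs;
}

/* Sequence 3 (vlan_filtering_is_global): sw1p1's bridge membership
 * toggles filtering on sw1p2 too. Expected: no filter left on sw1p2. */
static int seq_global_toggle(bool fix)
{
	struct user_port p1 = { 0 }, p2 = { 0 };

	set_filtering(&p1, true, fix);	/* ip link set sw1p1 master br0 */
	set_filtering(&p2, true, fix);
	port_up(&p2);			/* ip link set sw1p2 up */
	set_filtering(&p1, false, fix);	/* ip link set sw1p1 nomaster */
	set_filtering(&p2, false, fix);
	port_down(&p2);			/* ip link set sw1p2 down */
	return p2.vid0_refs;
}
```

Without the emulation, sequence 1 ends with no VID 0 filter and sequence 3 ends with one left behind; with it, all three end in the state 8021q itself would have produced.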
