[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com>
Date: Tue, 11 Jul 2023 17:44:20 -0500
From: Harry Coin <hcoin@...etfountain.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: andrew@...n.ch, netdev@...r.kernel.org
Subject: Re: llc needs namespace awareness asap, was Re: Patch fixing STP if
bridge in non-default namespace.
On 7/11/23 16:51, Kuniyuki Iwashima wrote:
> From: Harry Coin<hcoin@...etfountain.com>
> Date: Tue, 11 Jul 2023 16:40:03 -0500
>> On 7/11/23 15:44, Andrew Lunn wrote:
>>>>>>>> The current llc_rcv.c around line 166 in net/llc/llc_input.c has
>>>>>>>>
>>>>>>>> if (!net_eq(dev_net(dev), &init_net))
>>>>>>>> goto drop;
>>>>>>>>
>>>> Thank you! When you offer your patches, and you hear worries about being
>>>> 'invasive', it's worth asking 'compared to what' -- since the 'status quo'
>>>> is every bridge with STP in a non default namespace with a loop in it
>>>> somewhere will freeze every connected system more solid than ice in
>>>> Antarctica.
>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>>
>>> say:
>>>
>>> o It must be obviously correct and tested.
>>> o It cannot be bigger than 100 lines, with context.
>>> o It must fix only one thing.
>>> o It must fix a real bug that bothers people (not a, "This could be a problem..." type thing).
>>>
>>> git blame shows:
>>>
>>> commit 721499e8931c5732202481ae24f2dfbf9910f129
>>> Author: YOSHIFUJI Hideaki<yoshfuji@...ux-ipv6.org>
>>> Date: Sat Jul 19 22:34:43 2008 -0700
>>>
>>> netns: Use net_eq() to compare net-namespaces for optimization.
>>>
>>> diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
>>> index 1c45f172991e..57ad974e4d94 100644
>>> --- a/net/llc/llc_input.c
>>> +++ b/net/llc/llc_input.c
>>> @@ -150,7 +150,7 @@ int llc_rcv(struct sk_buff *skb, struct net_device *dev,
>>> int (*rcv)(struct sk_buff *, struct net_device *,
>>> struct packet_type *, struct net_device *);
>>>
>>> - if (dev_net(dev) != &init_net)
>>> + if (!net_eq(dev_net(dev), &init_net))
>>> goto drop;
>>>
>>> /*
>>>
>>> So this is just an optimization.
>>>
>>> The test itself was added in
>>>
>>> ommit e730c15519d09ea528b4d2f1103681fa5937c0e6
>>> Author: Eric W. Biederman<ebiederm@...ssion.com>
>>> Date: Mon Sep 17 11:53:39 2007 -0700
>>>
>>> [NET]: Make packet reception network namespace safe
>>>
>>> This patch modifies every packet receive function
>>> registered with dev_add_pack() to drop packets if they
>>> are not from the initial network namespace.
>>>
>>> This should ensure that the various network stacks do
>>> not receive packets in a anything but the initial network
>>> namespace until the code has been converted and is ready
>>> for them.
>>>
>>> Signed-off-by: Eric W. Biederman<ebiederm@...ssion.com>
>>> Signed-off-by: David S. Miller<davem@...emloft.net>
>>>
>>> So that was over 15 years ago.
>>>
>>> It appears it has not bothered people for over 15 years.
>>>
>>> Lets wait until we get to see the actual fix. We can then decide how
>>> simple/hard it is to back port to stable, if it fulfils the stable
>>> rules or not.
>>>
>>> Andrew
>> Andrew, fair enough. In the time until it's fixed, the kernel folks
>> should publish an advisory and block any attempt to set bridge stp state
>> to other than 0 if in a non-default namespace. The alternative is a
>> packet flood at whatever the top line speed is should there be a loop
>> somewhere in even one connected link.
> Like this ? Someone who didn't notice the issue might complain about
> it as regression.
>
> ---8<---
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 75204d36d7f9..a807996ac56b 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -201,6 +201,11 @@ int br_stp_set_enabled(struct net_bridge *br, unsigned long val,
> {
> ASSERT_RTNL();
>
> + if (!net_eq(dev_net(br->dev), &init_net)) {
> + NL_SET_ERR_MSG_MOD(extack, "STP can't be enabled in non-root netns");
> + return -EINVAL;
> + }
> +
> if (br_mrp_enabled(br)) {
> NL_SET_ERR_MSG_MOD(extack,
> "STP can't be enabled if MRP is already enabled");
> ---8<---
Something like that, but to your point about regression -- It a
statistically good bet there are many bridges with STP enabled in
non-default name spaces that simply have no loops in L2 BUT these are
'buried' inside docker images or prepackaged setups that work 'just
fine standalone' and also 'in lab namespaces (that just don't have L2
loops...) and so that don't hit the bug. These users are one "cable
click to an open port already connected somewhere they can't see" away
from bringing down every computer on their entire link (like me, been
there, it's not a happy week for anyone...), they just don't know it.
And their vendors 'trust STP, so that can't be it' --- but it is.
If the patch above gets installed-- then packagers downstream will have
to respond with effort to add code to turn off STP if finding themselves
in a namespace, and not if not. They will be displeased at having to
accommodate then de-accommodate when the fix lands. As a 'usually
downstream of the kernel' developer, I'd rather be warned than blocked.
Looking at those dates... wow! I expect other os kernels and standalone
switch vendors would see fixing this one as a removing a reliability
advantage they've had for a long time.
Perhaps a broadcast advisory "Until this is fixed, your site will have a
packet flood worse than an internal DDOS attack if there's a loop in a
link layer and if even one docker image or prepackaged project uses a
net bridge with STP enabled and is deployed in a non-default netns / net
namespace. Check with your package vendors if you're not sure. You'll
avoid this problem if your link layer layout chart is tree-and-branch
without even one crosslink." Yup, that'll be somewhat less than
popular. But better warned and awaiting a fix than blocked.
How hard can the fix be? Instead of dropping the packet if in the
non-default namespace, as each device is in a namespace it should be
fine to pass the packet only to listeners in the same namespace as the
device that received the packet. Back in the day this code was written,
it was probably 'hard to know' among the multicast subscribers what
namespace they were in.
I suspect the impact of this fix on existing code will be minor since
the only effect will be packets appearing where they were expected
before but not received.
Harry
--
Powered by blists - more mailing lists