Message-ID: <813be3bd0823bac31dc1b018750fad29d794d9c2.camel@redhat.com>
Date: Tue, 16 May 2023 10:34:11 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Taehee Yoo <ap420073@...il.com>, Nikolay Aleksandrov
 <razor@...ckwall.org>,  davem@...emloft.net, kuba@...nel.org,
 edumazet@...gle.com, jiri@...nulli.us,  j.vosburgh@...il.com,
 andy@...yhouse.net, netdev@...r.kernel.org
Cc: jarod@...hat.com, wangyufen@...wei.com, 
	syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
Subject: Re: [PATCH net] net: fix stack overflow when LRO is disabled for
 virtual interfaces

On Mon, 2023-05-15 at 18:12 +0900, Taehee Yoo wrote:
> On 5/15/23 15:24, Nikolay Aleksandrov wrote:
>  > On 15/05/2023 08:37, Taehee Yoo wrote:
>  >> When a virtual interface's features are updated, it synchronizes the
>  >> updated features to its lower interfaces.
>  >> This propagation logic should work iteratively, not recursively.
>  >> But it ends up working recursively because of the netdev notification.
>  >> This problem occurs only for the team and bonding interface types,
>  >> when LRO is disabled.
>  >>
>  >>         team0
>  >>           |
>  >>    +------+------+-----+-----+
>  >>    |      |      |     |     |
>  >> team1  team2  team3  ...  team200
>  >>
>  >> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
>  >> event for its lower interfaces (team1 ~ team200).
>  >> This is done by netdev_sync_lower_features().
>  >> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
>  >> works iteratively.
>  >> But each generated NETDEV_FEAT_CHANGE event is also sent to the upper
>  >> interface.
>  >> The upper interface (team0) then generates the NETDEV_FEAT_CHANGE event
>  >> for its lower interfaces again.
>  >> The lower and upper interfaces receive this event and generate it
>  >> again and again.
>  >> So, the stack overflow occurs.
>  >>
>  >> But it is not an infinite loop.
>  >> netdev_sync_lower_features() updates the features before
>  >> generating the NETDEV_FEAT_CHANGE event, so already synchronized
>  >> lower interfaces skip the notification logic.
>  >> So, the problem is only that the iteration unexpectedly becomes
>  >> recursion because of the notification mechanism.
>  >>
>  >> Reproducer:
>  >>
>  >> ip link add team0 type team
>  >> ethtool -K team0 lro on
>  >> for i in {1..200}
>  >> do
>  >>          ip link add team$i master team0 type team
>  >>          ethtool -K team$i lro on
>  >> done
>  >>
>  >> ethtool -K team0 lro off
>  >>
>  >> In order to fix it, the priv_notifier_ctx net_device member is introduced.
>  >> This variable can be used by each interface in its own way in the
>  >> notification context. The bonding and team interfaces are going to use it
>  >> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
>  >>
>  >> Reported-by: syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
>  >> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
>  >> Signed-off-by: Taehee Yoo <ap420073@...il.com>
>  >> ---
>  >>   drivers/net/bonding/bond_main.c | 6 +++++-
>  >>   drivers/net/team/team.c         | 6 +++++-
>  >>   include/linux/netdevice.h       | 1 +
>  >>   net/core/dev.c                  | 2 ++
>  >>   4 files changed, 13 insertions(+), 2 deletions(-)
>  >>
>  >
>  > Since you're syncing to lower devices, can't you check if the event
>  > source device is lower to the current one (i.e. reverse propagation
>  > has happened) in the affected drivers? Adding a new struct netdevice
>  > member just for this seems unnecessary to me.
>  > Especially for a setup like a bond of bonds or a team of teams, these
>  > are corner case setups that shouldn't exist in general. :)
>  >
> 
> I agree that this new variable is unnecessary right now.
> I tried to avoid introducing new variables, but unfortunately, I
> couldn't find a solution to detect duplicated notification events.
> 
> The reason why I introduced the new member of the net_device is that I
> thought there might be similar problems in the future, such as with mtu.
> So, I hoped that it could be used as a general variable to avoid similar
> problems.
> But I really agree that this new variable is over-spec.
> So, adding a new boolean variable to struct bonding and team, not
> net_device, would be reasonable if I can't find a proper solution.

I think adding a bool variable to the bonding/team priv would be better, as
it looks like the issue is specific to these kinds of devices.
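
To illustrate the idea only, here is a rough userspace C model of the
propagation described above, with and without a per-device guard flag.
This is not kernel code and not a proposed patch: every model_* name,
including the notifier_ctx flag, is hypothetical, and the real notifier
plumbing is reduced to direct function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_LOWER 200

/* Hypothetical stand-in for a team/bond device; not struct net_device. */
struct model_dev {
	struct model_dev *upper;
	bool lro;
	bool notifier_ctx;	/* hypothetical guard kept in the driver priv */
};

struct model_ctx {
	struct model_dev *lower;
	size_t n_lower;
	bool use_guard;
	unsigned int depth, max_depth;
};

static void model_update_features(struct model_dev *upper,
				  struct model_ctx *ctx);

/* Models the upper device reacting to a feature change on one lower. */
static void model_lower_event(struct model_dev *lower, struct model_ctx *ctx)
{
	struct model_dev *upper = lower->upper;

	if (!ctx->use_guard) {
		model_update_features(upper, ctx);
		return;
	}
	if (upper->notifier_ctx)
		return;		/* already handling an event: skip duplicate */
	upper->notifier_ctx = true;
	model_update_features(upper, ctx);
	upper->notifier_ctx = false;
}

/* Models the sync step: update the lower's feature first, then notify. */
static void model_update_features(struct model_dev *upper,
				  struct model_ctx *ctx)
{
	if (++ctx->depth > ctx->max_depth)
		ctx->max_depth = ctx->depth;

	for (size_t i = 0; i < ctx->n_lower; i++) {
		struct model_dev *lower = &ctx->lower[i];

		if (lower->lro == upper->lro)
			continue;	/* already synchronized */
		lower->lro = upper->lro;
		model_lower_event(lower, ctx);
	}
	ctx->depth--;
}

/* Turns LRO off on the upper device and returns the deepest nesting seen. */
unsigned int model_lro_off_depth(size_t n_lower, bool use_guard)
{
	static struct model_dev lower[MODEL_MAX_LOWER];
	struct model_dev upper = { .upper = NULL, .lro = true };
	struct model_ctx ctx = { .lower = lower, .n_lower = n_lower,
				 .use_guard = use_guard };

	assert(n_lower <= MODEL_MAX_LOWER);
	for (size_t i = 0; i < n_lower; i++)
		lower[i] = (struct model_dev){ .upper = &upper, .lro = true };

	upper.lro = false;
	model_update_features(&upper, &ctx);

	for (size_t i = 0; i < n_lower; i++)
		assert(!lower[i].lro);	/* every lower ends up synchronized */
	return ctx.max_depth;
}
```

With 200 lower devices, as in the reproducer, this model nests 201 calls
without the guard and 2 calls with it, and all lower devices end up with
LRO off in both cases.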

Thanks!

Paolo

