Message-ID: <52da9cd3-508f-eb7d-98b3-cd777acc90eb@gmail.com>
Date: Mon, 15 May 2023 18:12:52 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Nikolay Aleksandrov <razor@...ckwall.org>, davem@...emloft.net,
 kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com, jiri@...nulli.us,
 j.vosburgh@...il.com, andy@...yhouse.net, netdev@...r.kernel.org
Cc: jarod@...hat.com, wangyufen@...wei.com,
 syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
Subject: Re: [PATCH net] net: fix stack overflow when LRO is disabled for
 virtual interfaces



On 5/15/23 15:24, Nikolay Aleksandrov wrote:

Hi Nikolay,
Thank you so much for the review!

 > On 15/05/2023 08:37, Taehee Yoo wrote:
 >> When the virtual interface's feature is updated, it synchronizes the
 >> updated feature for its own lower interface.
 >> This propagation logic should work iteratively, not recursively.
 >> But it unexpectedly works recursively due to the netdev notification.
 >> This problem occurs only when LRO is disabled on the team and bonding
 >> interface types.
 >>
 >>         team0
 >>           |
 >>    +------+------+-----+-----+
 >>    |      |      |     |     |
 >> team1  team2  team3  ...  team200
 >>
 >> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
 >> event for its own lower interfaces (team1 ~ team200).
 >> This is done by netdev_sync_lower_features().
 >> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
 >> works iteratively.
 >> But the generated NETDEV_FEAT_CHANGE event is also sent to the upper
 >> interface.
 >> The upper interface (team0) generates the NETDEV_FEAT_CHANGE event for
 >> its own lower interfaces again.
 >> The lower and upper interfaces receive this event and generate this
 >> event again and again.
 >> So, the stack overflow occurs.
 >>
 >> But it is not an infinite loop issue,
 >> because netdev_sync_lower_features() updates features before
 >> generating the NETDEV_FEAT_CHANGE event.
 >> Already synchronized lower interfaces skip the notification logic.
 >> So, the problem is just that the iteration logic unexpectedly becomes
 >> recursive due to the notification mechanism.
 >>
 >> Reproducer:
 >>
 >> ip link add team0 type team
 >> ethtool -K team0 lro on
 >> for i in {1..200}
 >> do
 >>          ip link add team$i master team0 type team
 >>          ethtool -K team$i lro on
 >> done
 >>
 >> ethtool -K team0 lro off
 >>
 >> In order to fix it, the priv_notifier_ctx net_device member is introduced.
 >> This variable can be used by each interface in its own way in the
 >> notification context. The bonding and team interfaces are going to use it
 >> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
 >>
 >> Reported-by: syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
 >> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
 >> Signed-off-by: Taehee Yoo <ap420073@...il.com>
 >> ---
 >>   drivers/net/bonding/bond_main.c | 6 +++++-
 >>   drivers/net/team/team.c         | 6 +++++-
 >>   include/linux/netdevice.h       | 1 +
 >>   net/core/dev.c                  | 2 ++
 >>   4 files changed, 13 insertions(+), 2 deletions(-)
 >>
 >
 > Since you're syncing to lower devices, can't you check if the event source
 > device is lower to the current one (i.e. reverse propagation has happened)
 > in the affected drivers? Adding a new struct netdevice member just for this
 > seems unnecessary to me. Especially for a setup like a bond of bonds or a
 > team of teams, these are corner case setups that shouldn't exist in
 > general. :)
 >

I agree that this new variable is unnecessary right now.
I tried to avoid introducing new variables, but unfortunately, I
couldn't find another way to detect duplicated notification events.

The reason I introduced a new member of net_device is that I thought
similar problems might appear in the future, for example with MTU.
So, I hoped that it could be used as a general variable to avoid similar
problems.
But I really agree that this new variable is over-engineered.
So, adding a new boolean variable to struct bonding and struct team,
rather than net_device, would be reasonable if I can't find a proper
solution.
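
To make the idea of a per-upper boolean concrete, here is a toy
user-space model in Python. It is only a sketch and not kernel code:
the Dev class, the in_feat_change flag name, and all function names
are made up for illustration. It assumes the upper device re-syncs its
lower devices whenever it handles a feature-change event from one of
them, and that the flag makes it ignore events that arrive while such
a re-sync is already running.

```python
class Dev:
    """Toy stand-in for a net_device; only the LRO bit is modelled."""
    def __init__(self, name):
        self.name = name
        self.lro = True
        self.upper = None
        self.lowers = []
        self.in_feat_change = False  # hypothetical guard flag on the upper


def build(n):
    """Create team0 with n lower devices, all with LRO enabled."""
    team0 = Dev("team0")
    for i in range(1, n + 1):
        port = Dev(f"team{i}")
        port.upper = team0
        team0.lowers.append(port)
    return team0


def on_feat_change(lower, depth, stats):
    """Model of the upper device handling an event from one of its lowers."""
    upper = lower.upper
    if upper is None or upper.in_feat_change:
        return  # already handling an event: skip the duplicate
    upper.in_feat_change = True
    try:
        sync_lower_features(upper, depth + 1, stats)
    finally:
        upper.in_feat_change = False


def sync_lower_features(upper, depth, stats):
    """Disable LRO on every lower that still has it, then notify."""
    stats["max_depth"] = max(stats["max_depth"], depth)
    for lower in upper.lowers:
        if lower.lro:
            lower.lro = False  # feature is updated before the event is sent
            on_feat_change(lower, depth, stats)


def disable_lro_guarded(n):
    """Disable LRO on team0 and report the deepest nesting that was reached."""
    team0 = build(n)
    team0.lro = False
    stats = {"max_depth": 0}
    sync_lower_features(team0, 0, stats)
    stats["all_off"] = all(not d.lro for d in team0.lowers)
    return stats
```

In this model the nesting stays at one level no matter how many lower
devices exist, and every lower device still ends up with LRO disabled.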

Yes, the above interface graph is not a real-world case.
Its purpose is just to let anyone trigger the stack overflow with a
simple copy-and-paste, to make testing easy.
The problem can't be reproduced with virtual interfaces that don't
support LRO, such as dummy, VLAN, and others.
We can reproduce it with team and bonding interfaces, so I used team
over team as the reproducer.
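
As a rough illustration of why the reproducer needs many lower
interfaces, here is a toy user-space model in Python. It is a sketch
under assumptions, not kernel code: the Dev class and all function
names are hypothetical. It assumes each feature-change event from a
lower device makes the upper device sync all of its lower devices
again, and that already-synced lower devices are skipped.

```python
class Dev:
    """Toy stand-in for a net_device; only the LRO bit is modelled."""
    def __init__(self, name):
        self.name = name
        self.lro = True
        self.upper = None
        self.lowers = []


def build(n):
    """Create team0 with n lower devices, all with LRO enabled."""
    team0 = Dev("team0")
    for i in range(1, n + 1):
        port = Dev(f"team{i}")
        port.upper = team0
        team0.lowers.append(port)
    return team0


def sync_lower_features(upper, depth, stats):
    """Disable LRO on each lower; every event re-enters the upper's sync."""
    stats["max_depth"] = max(stats["max_depth"], depth)
    for lower in upper.lowers:
        if lower.lro:
            lower.lro = False  # feature is updated before the event is sent
            stats["events"] += 1
            # The event also reaches the upper, which syncs its lowers again.
            sync_lower_features(lower.upper, depth + 1, stats)


def disable_lro(n):
    """Disable LRO on team0 and report nesting depth and event count."""
    team0 = build(n)
    team0.lro = False
    stats = {"max_depth": 0, "events": 0}
    sync_lower_features(team0, 0, stats)
    return stats


if __name__ == "__main__":
    for n in (1, 10, 200):
        print(n, disable_lro(n))
```

In this model the recursion terminates, because each level disables
exactly one more lower device, but the nesting depth grows linearly
with the number of lower devices. That matches the description of a
finite but very deep call chain.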

I will send a v2 patch after spending a few days trying to find a better
solution that does not introduce a new member of net_device.
If I can't find one, v2 will introduce a new member in struct bonding
and struct team instead.
Of course, any ideas are welcome!

Thank you so much!
Taehee Yoo

 > Cheers,
 >   Nik
 >
