Message-ID: <41942532-462e-fa1d-d9a4-eeb26abc481f@gmail.com>
Date: Tue, 16 May 2023 20:29:39 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Paolo Abeni <pabeni@...hat.com>, Nikolay Aleksandrov
<razor@...ckwall.org>, davem@...emloft.net, kuba@...nel.org,
edumazet@...gle.com, jiri@...nulli.us, j.vosburgh@...il.com,
andy@...yhouse.net, netdev@...r.kernel.org
Cc: jarod@...hat.com, wangyufen@...wei.com,
syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
Subject: Re: [PATCH net] net: fix stack overflow when LRO is disabled for
virtual interfaces
On 5/16/23 17:34, Paolo Abeni wrote:
Hi Paolo,
Thank you so much for the review!
> On Mon, 2023-05-15 at 18:12 +0900, Taehee Yoo wrote:
>> On 5/15/23 15:24, Nikolay Aleksandrov wrote:
>> > On 15/05/2023 08:37, Taehee Yoo wrote:
>> >> When a virtual interface's features are updated, it synchronizes the
>> >> updated features to its own lower interfaces.
>> >> This propagation logic should work iteratively, not recursively.
>> >> But it unexpectedly works recursively because of the netdev
>> >> notification.
>> >> This problem occurs when LRO is disabled, and only for the team and
>> >> bonding interface types.
>> >>
>> >>        team0
>> >>          |
>> >>   +------+------+-----+-----+
>> >>   |      |      |     |     |
>> >> team1  team2  team3  ... team200
>> >>
>> >> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
>> >> event to its own lower interfaces (team1 ~ team200).
>> >> This is done by netdev_sync_lower_features().
>> >> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
>> >> works iteratively.
>> >> But the generated NETDEV_FEAT_CHANGE event is also sent to the upper
>> >> interface.
>> >> The upper interface (team0) generates the NETDEV_FEAT_CHANGE event for
>> >> its own lower interfaces again.
>> >> The lower and upper interfaces receive this event and generate it
>> >> again and again.
>> >> So, the stack overflow occurs.
>> >>
>> >> But it is not an infinite loop issue,
>> >> because netdev_sync_lower_features() updates features before
>> >> generating the NETDEV_FEAT_CHANGE event.
>> >> Already synchronized lower interfaces skip the notification logic.
>> >> So, the problem is just that the iteration logic unexpectedly becomes
>> >> recursive due to the notification mechanism.
>> >>
>> >> Reproducer:
>> >>
>> >> ip link add team0 type team
>> >> ethtool -K team0 lro on
>> >> for i in {1..200}
>> >> do
>> >>         ip link add team$i master team0 type team
>> >>         ethtool -K team$i lro on
>> >> done
>> >>
>> >> ethtool -K team0 lro off
>> >>
>> >> In order to fix it, the priv_notifier_ctx net_device member is
>> >> introduced.
>> >> This variable can be used by each interface in its own way in the
>> >> notification context. The bonding and team interfaces are going to use
>> >> it to avoid duplicated NETDEV_FEAT_CHANGE event handling.
>> >>
>> >> Reported-by: syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
>> >> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
>> >> Signed-off-by: Taehee Yoo <ap420073@...il.com>
>> >> ---
>> >> drivers/net/bonding/bond_main.c | 6 +++++-
>> >> drivers/net/team/team.c | 6 +++++-
>> >> include/linux/netdevice.h | 1 +
>> >> net/core/dev.c | 2 ++
>> >> 4 files changed, 13 insertions(+), 2 deletions(-)
>> >>
>> >
>> > Since you're syncing to lower devices, can't you check if the event
>> > source device is lower to the current one (i.e. reverse propagation
>> > has happened) in the affected drivers? Adding a new struct netdevice
>> > member just for this seems unnecessary to me.
>> > Especially for a setup like a bond of bonds or a team of teams, these
>> > are corner case setups that shouldn't exist in general. :)
>> >
>>
>> I agree that this new variable is unnecessary right now.
>> I tried to avoid introducing new variables, but unfortunately, I
>> couldn't find a solution to detect duplicated notification events.
>>
>> The reason why I introduced the new member of the net_device is that I
>> thought there might be similar problems in the future, such as with MTU.
>> So, I hoped that it could be used as a general variable to avoid similar
>> problems.
>> But I really agree that this new variable is over-spec.
>> So, adding a new boolean variable into the struct bonding and team, not
>> net_device would be reasonable if I can't find a proper solution.
>
> I think adding a bool variable to bonding/team priv would be better, as
> it looks like the issue is specific to these kinds of devices.
>
Thanks, I will add a bool variable to the bonding and team structs in v2.
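
To illustrate the idea, here is a minimal user-space sketch, not kernel
code and not the actual v2 patch. It models one upper device with 200
lower devices and counts how deeply the sync routine nests when LRO is
turned off, with and without a per-team bool guard. All names (toy_team,
toy_feat_change_event, toy_sync_lower_features, toy_disable_lro,
in_feat_change) are hypothetical and exist only for this example; the
real notifier path has more steps than this model shows.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LOWER 200	/* team1 ~ team200 from the reproducer */

/* Toy model: one upper device and its lower devices (hypothetical names). */
struct toy_team {
	bool lro;			/* upper device's LRO state */
	bool lower_lro[NUM_LOWER];	/* each lower device's LRO state */
	bool use_guard;			/* enable the re-entrancy guard */
	bool in_feat_change;		/* hypothetical per-team bool flag */
	int depth;			/* current nesting of the sync routine */
	int max_depth;			/* deepest nesting observed */
};

static void toy_sync_lower_features(struct toy_team *team);

/* Stand-in for the upper device handling a lower's NETDEV_FEAT_CHANGE. */
static void toy_feat_change_event(struct toy_team *team)
{
	if (!team->use_guard) {
		toy_sync_lower_features(team);	/* re-enters the sync loop */
		return;
	}
	if (team->in_feat_change)
		return;				/* already handling: skip duplicate */
	team->in_feat_change = true;
	toy_sync_lower_features(team);
	team->in_feat_change = false;
}

/* Stand-in for netdev_sync_lower_features(): update first, then notify. */
static void toy_sync_lower_features(struct toy_team *team)
{
	int i;

	if (++team->depth > team->max_depth)
		team->max_depth = team->depth;
	for (i = 0; i < NUM_LOWER; i++) {
		if (team->lower_lro[i] == team->lro)
			continue;		/* already synchronized */
		team->lower_lro[i] = team->lro;
		toy_feat_change_event(team);
	}
	team->depth--;
}

/* Turn LRO off on the upper device; return the deepest nesting reached. */
int toy_disable_lro(bool use_guard)
{
	struct toy_team team = { .lro = true, .use_guard = use_guard };
	int i;

	for (i = 0; i < NUM_LOWER; i++)
		team.lower_lro[i] = true;
	team.lro = false;
	toy_sync_lower_features(&team);
	for (i = 0; i < NUM_LOWER; i++)
		assert(!team.lower_lro[i]);	/* every lower is synced */
	return team.max_depth;
}
```

In this model, toy_disable_lro(false) returns 201 (one nested call per
lower device plus a final pass that finds nothing to do), while
toy_disable_lro(true) returns 2. Either way every lower device ends up
synchronized, which matches the point that this is a recursion depth
problem and not an infinite loop.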
Thank you so much!
Taehee Yoo
> Thanks!
>
> Paolo
>