[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2ea192a8-b437-86e4-59b3-b4d57117aee1@gmail.com>
Date: Fri, 19 May 2023 15:25:12 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, pabeni@...hat.com, edumazet@...gle.com,
jiri@...nulli.us, j.vosburgh@...il.com, andy@...yhouse.net,
netdev@...r.kernel.org, jarod@...hat.com, razor@...ckwall.org,
simon.horman@...igine.com, wangyufen@...wei.com,
syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
Subject: Re: [PATCH net v2] net: fix stack overflow when LRO is disabled for
virtual interfaces
On 5/18/23 03:45, Jakub Kicinski wrote:
> On Thu, 18 May 2023 02:28:29 +0900 Taehee Yoo wrote:
>> > Why doesn't the (already synchronized) upper not skip the update?
>>
>> The skipping logic of this is existing in the
netdev_sync_lower_features().
>> The purpose of this is to synchronize the lower interfaces, not the
>> upper interface.
>> Actually, there is no upper-only synchronization logic.
>>
>> Both bonding and team interfaces rely on notification mechanisms to work
>> their own logic such as synchronization.
>> The notification is a broadcasting mechanism.
>> So, both lower and upper receive this event, and it works its own
>> notification handling.
>
> This is all true.
>
>> But the notification mechanism currently doesn't have options such as
>> filtering and these interfaces receive this event with updated feature
>> flags.
>
> We don't have to filter notifications.
>
>> So, the upper interface can't distinguish whether the received event is
>> the first event or duplicated event.
>
> What I was thinking was basically why does __netdev_update_features()
> not return early if it made no changes? Looking thru the history this
> behavior has been created by commit e7868a85e1b26bcb2e. Can we revert
> that and fix the problem of syncing features on new ports differently?
I think this is the best approach so I tried it with existing variables
such as dev->features, dev->wanted_features, But I couldn't find to fix it.
Because the upper interface' feature flag is updated at the end of it.
So, __netdev_update_features() is called always with the old
upper-interface' features flag.
__netdev_update_features()
netdev_sync_lower_features()
__netdev_update_features()
netdev_sync_lower_features()
...
dev->features = features;
dev->features = features;
dev->features = features;
dev->features = features;
In order to return __netdev_update_features() early in duplicated call,
__netdev_update_features() should update dev->features ealier than
netdev_sync_lower_features() call.
So, the current code doesn't update dev->features early so it can't
check duplicated calls with dev->features.
I tested this approach with a revert of e7868a85e1b26bcb2e and the below
change.
diff --git a/net/core/dev.c b/net/core/dev.c
index 6b12d8a9d463..f051c293ffaa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9758,6 +9758,9 @@ int __netdev_update_features(struct net_device *dev)
return -1;
}
+ if (netif_is_bond_master(dev) || netif_is_team_master(dev))
+ dev->features = features;
+
/* some features must be disabled on lower devices when disabled
* on an upper device (think: bonding master or bridge)
*/
It fixes the stack overflow problem, but I'm not sure whether updating
it before netdev_sync_lower_features() is safe or not.
Powered by blists - more mailing lists