[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1ddf7fc8-bcb3-ab48-4894-24158e8a9d0f@huawei.com>
Date: Fri, 7 Apr 2023 15:22:36 +0800
From: wangyufen <wangyufen@...wei.com>
To: Taehee Yoo <ap420073@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Jarod Wilson <jarod@...hat.com>
CC: syzbot <syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com>,
Jiří Pírko <jiri@...nulli.us>,
<davem@...emloft.net>, <kuba@...nel.org>,
<linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>, <syzkaller-bugs@...glegroups.com>,
Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: [syzbot] kernel panic: kernel stack overflow
在 2022/10/21 19:08, Taehee Yoo 写道:
> Hi,
>
> 2022. 10. 14. 오전 12:00에 Taehee Yoo 이(가) 쓴 글:
> > Hi,
> >
> > On 10/12/22 21:19, Eric Dumazet wrote:
> > > On Wed, Oct 12, 2022 at 12:53 AM Dmitry Vyukov <dvyukov@...gle.com>
> > wrote:
> > >>
> > >> On Wed, 12 Oct 2022 at 09:48, syzbot
> > >> <syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com> wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> syzbot found the following issue on:
> > >>>
> > >>> HEAD commit: bbed346d5a96 Merge branch 'for-next/core' into
> > for-kernelci
> > >>> git tree:
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
> for-kernelci
> > >>> console output:
> > https://syzkaller.appspot.com/x/log.txt?x=14a03a2a880000
> > >>> kernel config:
> > https://syzkaller.appspot.com/x/.config?x=aae2d21e7dd80684
> > >>> dashboard link:
> > https://syzkaller.appspot.com/bug?extid=60748c96cf5c6df8e581
> > >>> compiler: Debian clang version
> > 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld
> > (GNU Binutils for Debian) 2.35.2
> > >>> userspace arch: arm64
> > >>>
> > >>> Unfortunately, I don't have any reproducer for this issue yet.
> > >>>
> > >>> Downloadable assets:
> > >>> disk image:
> >
> https://storage.googleapis.com/syzbot-assets/11078f50b80b/disk-bbed346d.raw.xz
> >
> > >>> vmlinux:
> >
> https://storage.googleapis.com/syzbot-assets/398e5f1e6c84/vmlinux-bbed346d.xz
> >
> > >>>
> > >>> IMPORTANT: if you fix the issue, please add the following tag to
> > the commit:
> > >>> Reported-by: syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
> > >>
> > >> +Jiri
> > >>
> > >> It looks like the issue is with the team device. It seems to call
> > >> itself infinitely.
> > >> team_device_event was mentioned in stack overflow bugs in the past:
> > >>
> >
> https://groups.google.com/g/syzkaller-bugs/search?q=%22team_device_event%22
> > >>
> > >
> > >
> > > Taehee Yoo, can you take a look ?
> > >
> > > Patch series of yours was supposed to limit max nest level to 8
> > >
> > >
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=65921376425fc9c8b7ce647e1f7989f7cdf5dd70
> >
> > >
> >
> > I found a reproducer.
> >
> > #test_team.sh
> > ip link add dummy0 type dummy
> > ip link set dummy0 up
> > for a1 in {0..1}
> > do
> > ip link add team$a1 type team
> > for a2 in {0..1}
> > do
> > ip link add team$a1$a2 master team$a1 type team
> > for a3 in {0..1}
> > do
> > ip link add team$a1$a2$a3 master team$a1$a2
> > type team
> > for a4 in {0..1}
> > do
> > ip link add team$a1$a2$a3$a4 master
> > team$a1$a2$a3 type team
> > for a5 in {0..1}
> > do
> > ip link add team$a1$a2$a3$a4$a5
> > master team$a1$a2$a3$a4 type team
> > for a6 in {0..1}
> > do
> > ip link add
> > team$a1$a2$a3$a4$a5$a6 master team$a1$a2$a3$a4$a5 type team
> > ip link add
> > macvlan$a1$a2$a3$a4$a5$a6 link dummy0 master team$a1$a2$a3$a4$a5$a6 type
> > macvlan
> > ip link set
> > macvlan$a1$a2$a3$a4$a5$a6 up
> > ip link set
> > team$a1$a2$a3$a4$a5$a6 up
> > done
> > ip link set
> team$a1$a2$a3$a4$a5 up
> > done
> > ip link set team$a1$a2$a3$a4 up
> > done
> > ip link set team$a1$a2$a3 up
> > done
> > ip link set team$a1$a2 up
> > done
> > ip link set team$a1 up
> > done
> >
> > #test_ethtool.sh
> > for a1 in {0..1}
> > do
> > ethtool -K team$a1 lro $1
> > for a2 in {0..1}
> > do
> > ethtool -K team$a1$a2 lro $1
> > for a3 in {0..1}
> > do
> > ethtool -K team$a1$a2$a3 lro $1
> > for a4 in {0..1}
> > do
> > ethtool -K team$a1$a2$a3$a4 lro $1
> > for a5 in {0..1}
> > do
> > ethtool -K team$a1$a2$a3$a4$a5
> > lro $1
> > for a6 in {0..1}
> > do
> > ethtool -K
> > team$a1$a2$a3$a4$a5$a6 lro $1
> > ethtool -K
> > macvlan$a1$a2$a3$a4$a5$a6 lro $1
> > done
> > done
> > done
> > done
> > done
> > done
> >
> > shell#1
> > bash test_team.sh
> > while :
> > do
> > bash test_ethtool.sh on
> > done
> > shell#2
> > while :
> > do
> > bash test_ethtool.sh off
> > done
> >
> > We can see a very similar call trace with the above reproducer.
> > I think it is the same issue.
> > Could you please test it?
> >
> > And, I found the fixed same issue too.
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0&id=dd912306ff008891c82cd9f63e8181e47a9cb2fb
> >
> > https://groups.google.com/g/syzkaller-bugs/c/-5OV1OW-dS4/m/o2Oq6AYSAwAJ
> >
>
> I found the root cause of this issue.
>
> This is simpler reproducer.
>
> ip link add team0 type team
> ethtool -K team0 lro on
> for i in {1..100}
> do
> ip link add team$i master team0 type team
> ethtool -K team$i lro on
> done
>
> ethtool -K team0 lro off
>
> The above graph is like below:
> team0
> |
> +------+------+-----+-----+
> | | | | |
> team1 team2 team3 ... team100
>
> int __netdev_update_features(struct net_device *dev)
> {
> struct net_device *upper, *lower;
> netdev_features_t features;
> struct list_head *iter;
> int err = -1;
> ...
> sync_lower:
> /* some features must be disabled on lower devices when disabled
> * on an upper device (think: bonding master or bridge)
> */
> netdev_for_each_lower_dev(dev, lower, iter)
> netdev_sync_lower_features(dev, lower, features);
> ...
>
>
> static void netdev_sync_lower_features(struct net_device *upper,
> struct net_device *lower, netdev_features_t features)
> {
> netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
> netdev_features_t feature;
> int feature_bit;
>
> for_each_netdev_feature(upper_disables, feature_bit) {
> feature = __NETIF_F_BIT(feature_bit);
> if (!(features & feature) && (lower->features &
> feature)) {
> netdev_dbg(upper, "Disabling feature %pNF on
> lower dev %s.\n",
> &feature, lower->name);
> lower->wanted_features &= ~feature;
> __netdev_update_features(lower);
>
> if (unlikely(lower->features & feature))
> netdev_WARN(upper, "failed to disable
> %pNF on %s!\n",
> &feature, lower->name);
> else
> netdev_features_change(lower);<-----HERE
> }
> }
> }
>
> void netdev_features_change(struct net_device *dev)
> {
> call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
> }
>
> The code looks like an iterator.
> But it would work recursively because of notification.
>
> When team0's feature(LRO) is changed with <ethtool -K team0 lro off>",
> __netdev_update_features(team0) is called.
> __netdev_update_features(team0) internally sends NETDEV_FEAT_CHANGE
> event to all lower interfaces(team1, team2, ... team100).
> team1 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE
> to the upper interface(team0).
> team0 will receive NETDEV_FEAT_CHANGE again, and it sends
> NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ... team100).
> (At this point, team1 flag was already set, so it will be skipped.)
> team2 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE
> to the upper interface(team0).
> team0 will receive NETDEV_FEAT_CHANGE again again, and it sends
> NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ... team100).
> (team1, team2 skipped.)
> ...
> So, if there are a few lower interfaces(roughly under 30 lower
> interfaces), it anyway works even if internally works recursively.
> But so many lower interfaces exist, stack overflow will occur.
> This is the root cause of this issue.
>
> I think synchronization direction should be one way.
> Up or Down.
> It means that if the team0 interface can send the NETDEV_FEAT_CHANGE
> notification event to the lower interface,
> the lower interfaces should be disallowed to send NETDEV_FEAT_CHANGE
> event to the upper interface.
>
> bonding has same issue.
Excuse me, is there a fix for this issue? I had the same issue with the
5.10 version of the bonding.
Powered by blists - more mailing lists