Date:   Fri, 7 Apr 2023 15:22:36 +0800
From:   wangyufen <wangyufen@...wei.com>
To:     Taehee Yoo <ap420073@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Jarod Wilson <jarod@...hat.com>
CC:     syzbot <syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com>,
        Jiří Pírko <jiri@...nulli.us>,
        <davem@...emloft.net>, <kuba@...nel.org>,
        <linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>,
        <pabeni@...hat.com>, <syzkaller-bugs@...glegroups.com>,
        Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: [syzbot] kernel panic: kernel stack overflow



On 2022/10/21 19:08, Taehee Yoo wrote:
> Hi,
> 
> On 2022-10-14 at 12:00 AM, Taehee Yoo wrote:
>  > Hi,
>  >
>  > On 10/12/22 21:19, Eric Dumazet wrote:
>  >  > On Wed, Oct 12, 2022 at 12:53 AM Dmitry Vyukov <dvyukov@...gle.com> wrote:
>  >  >>
>  >  >> On Wed, 12 Oct 2022 at 09:48, syzbot
>  >  >> <syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com> wrote:
>  >  >>>
>  >  >>> Hello,
>  >  >>>
>  >  >>> syzbot found the following issue on:
>  >  >>>
>  >  >>> HEAD commit:    bbed346d5a96 Merge branch 'for-next/core' into for-kernelci
>  >  >>> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
>  >  >>> console output: https://syzkaller.appspot.com/x/log.txt?x=14a03a2a880000
>  >  >>> kernel config:  https://syzkaller.appspot.com/x/.config?x=aae2d21e7dd80684
>  >  >>> dashboard link: https://syzkaller.appspot.com/bug?extid=60748c96cf5c6df8e581
>  >  >>> compiler:       Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
>  >  >>> userspace arch: arm64
>  >  >>>
>  >  >>> Unfortunately, I don't have any reproducer for this issue yet.
>  >  >>>
>  >  >>> Downloadable assets:
>  >  >>> disk image: https://storage.googleapis.com/syzbot-assets/11078f50b80b/disk-bbed346d.raw.xz
>  >  >>> vmlinux:    https://storage.googleapis.com/syzbot-assets/398e5f1e6c84/vmlinux-bbed346d.xz
>  >  >>>
>  >  >>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>  >  >>> Reported-by: syzbot+60748c96cf5c6df8e581@...kaller.appspotmail.com
>  >  >>
>  >  >> +Jiri
>  >  >>
>  >  >> It looks like the issue is with the team device. It seems to call
>  >  >> itself infinitely.
>  >  >> team_device_event was mentioned in stack overflow bugs in the past:
>  >  >>
>  >  >> https://groups.google.com/g/syzkaller-bugs/search?q=%22team_device_event%22
>  >  >>
>  >  >
>  >  >
>  >  > Taehee Yoo, can you take a look ?
>  >  >
>  >  > Patch series of yours was supposed to limit max nest level to 8
>  >  >
>  >  > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=65921376425fc9c8b7ce647e1f7989f7cdf5dd70
>  >  >
>  >
>  > I found a reproducer.
>  >
>  > #test_team.sh
>  > ip link add dummy0 type dummy
>  > ip link set dummy0 up
>  > for a1 in {0..1}
>  > do
>  >          ip link add team$a1 type team
>  >          for a2 in {0..1}
>  >          do
>  >                  ip link add team$a1$a2 master team$a1 type team
>  >                  for a3 in {0..1}
>  >                  do
>  >                          ip link add team$a1$a2$a3 master team$a1$a2 type team
>  >                          for a4 in {0..1}
>  >                          do
>  >                                  ip link add team$a1$a2$a3$a4 master team$a1$a2$a3 type team
>  >                                  for a5 in {0..1}
>  >                                  do
>  >                                          ip link add team$a1$a2$a3$a4$a5 master team$a1$a2$a3$a4 type team
>  >                                          for a6 in {0..1}
>  >                                          do
>  >                                                  ip link add team$a1$a2$a3$a4$a5$a6 master team$a1$a2$a3$a4$a5 type team
>  >                                                  ip link add macvlan$a1$a2$a3$a4$a5$a6 link dummy0 master team$a1$a2$a3$a4$a5$a6 type macvlan
>  >                                                  ip link set macvlan$a1$a2$a3$a4$a5$a6 up
>  >                                                  ip link set team$a1$a2$a3$a4$a5$a6 up
>  >                                          done
>  >                                          ip link set team$a1$a2$a3$a4$a5 up
>  >                                  done
>  >                                  ip link set team$a1$a2$a3$a4 up
>  >                          done
>  >                          ip link set team$a1$a2$a3 up
>  >                  done
>  >                  ip link set team$a1$a2 up
>  >          done
>  >          ip link set team$a1 up
>  > done
>  >
>  > #test_ethtool.sh
>  > for a1 in {0..1}
>  > do
>  >          ethtool -K team$a1 lro $1
>  >          for a2 in {0..1}
>  >          do
>  >                  ethtool -K team$a1$a2 lro $1
>  >                  for a3 in {0..1}
>  >                  do
>  >                          ethtool -K team$a1$a2$a3 lro $1
>  >                          for a4 in {0..1}
>  >                          do
>  >                                  ethtool -K team$a1$a2$a3$a4 lro $1
>  >                                  for a5 in {0..1}
>  >                                  do
>  >                                          ethtool -K team$a1$a2$a3$a4$a5 lro $1
>  >                                          for a6 in {0..1}
>  >                                          do
>  >                                                  ethtool -K team$a1$a2$a3$a4$a5$a6 lro $1
>  >                                                  ethtool -K macvlan$a1$a2$a3$a4$a5$a6 lro $1
>  >                                          done
>  >                                  done
>  >                          done
>  >                  done
>  >          done
>  > done
>  >
>  > shell#1
>  > bash test_team.sh
>  > while :
>  > do
>  > bash test_ethtool.sh on
>  > done
>  > shell#2
>  > while :
>  > do
>  > bash test_ethtool.sh off
>  > done
>  >
>  > We can see a very similar call trace with the above reproducer.
>  > I think it is the same issue.
>  > Could you please test it?
>  >
>  > I also found an earlier fix for the same issue:
>  > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0&id=dd912306ff008891c82cd9f63e8181e47a9cb2fb
>  >
>  > https://groups.google.com/g/syzkaller-bugs/c/-5OV1OW-dS4/m/o2Oq6AYSAwAJ
>  >
> 
> I found the root cause of this issue.
> 
> Here is a simpler reproducer:
> 
> ip link add team0 type team
> ethtool -K team0 lro on
> for i in {1..100}
> do
>          ip link add team$i master team0 type team
>          ethtool -K team$i lro on
> done
> 
> ethtool -K team0 lro off
> 
> The resulting device topology looks like this:
>         team0
>           |
>    +------+------+-----+-----+
>    |      |      |     |     |
> team1  team2  team3  ...  team100
> 
> int __netdev_update_features(struct net_device *dev)
> {
>          struct net_device *upper, *lower;
>          netdev_features_t features;
>          struct list_head *iter;
>          int err = -1;
> ...
> sync_lower:
>          /* some features must be disabled on lower devices when disabled
>           * on an upper device (think: bonding master or bridge)
>           */
>          netdev_for_each_lower_dev(dev, lower, iter)
>                  netdev_sync_lower_features(dev, lower, features);
> ...
> 
> 
> static void netdev_sync_lower_features(struct net_device *upper,
>          struct net_device *lower, netdev_features_t features)
> {
>          netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
>          netdev_features_t feature;
>          int feature_bit;
> 
>          for_each_netdev_feature(upper_disables, feature_bit) {
>                  feature = __NETIF_F_BIT(feature_bit);
>                  if (!(features & feature) && (lower->features & feature)) {
>                          netdev_dbg(upper, "Disabling feature %pNF on lower dev %s.\n",
>                                     &feature, lower->name);
>                          lower->wanted_features &= ~feature;
>                          __netdev_update_features(lower);
> 
>                          if (unlikely(lower->features & feature))
>                                  netdev_WARN(upper, "failed to disable %pNF on %s!\n",
>                                              &feature, lower->name);
>                          else
>                                  netdev_features_change(lower); /* <----- HERE */
>                  }
>          }
> }
> 
> void netdev_features_change(struct net_device *dev)
> {
>          call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
> }
> 
> The code looks like a plain iteration, but because of the notification
> it actually runs recursively.
> 
> When team0's LRO feature is changed with "ethtool -K team0 lro off",
> __netdev_update_features(team0) is called.
> __netdev_update_features(team0) internally sends the NETDEV_FEAT_CHANGE
> event to all lower interfaces (team1, team2, ... team100).
> team1 receives NETDEV_FEAT_CHANGE and sends NETDEV_FEAT_CHANGE to the
> upper interface (team0).
> team0 receives NETDEV_FEAT_CHANGE again and sends NETDEV_FEAT_CHANGE to
> all lower interfaces (team1, team2, ... team100).
> (At this point team1's feature flag has already been changed, so team1
> is skipped.)
> team2 receives NETDEV_FEAT_CHANGE and sends NETDEV_FEAT_CHANGE to the
> upper interface (team0).
> team0 receives NETDEV_FEAT_CHANGE yet again and sends NETDEV_FEAT_CHANGE
> to all lower interfaces (team1, team2, ... team100).
> (team1 and team2 are skipped.)
> ...
> So with only a few lower interfaces (roughly fewer than 30), this still
> works even though it runs recursively internally.
> But when there are many lower interfaces, the stack overflows.
> This is the root cause of this issue.
> 
> I think synchronization should go in one direction only, either up or
> down.
> That is, if the team0 interface can send the NETDEV_FEAT_CHANGE
> notification event to its lower interfaces, the lower interfaces should
> not be allowed to send the NETDEV_FEAT_CHANGE event to the upper
> interface.
> 
> Bonding has the same issue.

Excuse me, is there a fix for this issue? I hit the same problem with
bonding on a 5.10 kernel.

