Message-ID: <CAJ3xEMgqmeT7tQvJZ+5Daaz2a7wsC9rUyNstP0ZoTZkr5_p1gA@mail.gmail.com>
Date: Wed, 13 Dec 2017 16:28:15 +0200
From: Or Gerlitz <gerlitz.or@...il.com>
To: Qing Huang <qing.huang@...cle.com>
Cc: Linux Netdev List <netdev@...r.kernel.org>, Linux Kernel <linux-kernel@...r.kernel.org>,
    Jay Vosburgh <j.vosburgh@...il.com>, Veaceslav Falico <vfalico@...il.com>,
    Andy Gospodarek <andy@...yhouse.net>, Aviv Heller <avivh@...lanox.com>,
    Moni Shoua <monis@...lanox.com>
Subject: Re: Setting large MTU size on slave interfaces may stall the whole system

On Tue, Dec 12, 2017 at 5:21 AM, Qing Huang <qing.huang@...cle.com> wrote:
> Hi,
>
> We found an issue with the bonding driver when testing Mellanox devices.
> The following test commands will sometimes stall the whole system, with the
> serial console flooded with log messages from the bond_miimon_inspect()
> function. Setting the MTU to 1500 seems okay, but very rarely it may hit the
> same problem too.
>
> ip address flush dev ens3f0
> ip link set dev ens3f0 down
> ip address flush dev ens3f1
> ip link set dev ens3f1 down
> [root@...hcl629 etc]# modprobe bonding mode=0 miimon=250 use_carrier=1 updelay=500 downdelay=500
> [root@...hcl629 etc]# ifconfig bond0 up
> [root@...hcl629 etc]# ifenslave bond0 ens3f0 ens3f1
> [root@...hcl629 etc]# ip link set bond0 mtu 4500 up
>
> Serial console output:
>
> ** 4 printk messages dropped ** [ 3717.743761] bond0: link status down for
> interface ens3f0, disabling it in 500 ms
[..]
> It seems that when setting a large MTU on a RoCE interface, the RTNL mutex
> may be held too long by the slave interface, causing bond_mii_monitor() to be
> called repeatedly at an interval of 1 tick (with a 1000 HZ kernel
> configuration) and the kernel to become unresponsive.

Did you try to reproduce this with other NIC drivers as well, and did you manage to?

Or.
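
For context on the 1-tick behaviour described above: when bond_miimon_inspect()
reports a link state change but the RTNL mutex cannot be taken with
rtnl_trylock() (here, presumably because the slave's MTU change path still
holds it), the miimon work is re-queued with a one-jiffy delay instead of the
configured miimon interval. A simplified sketch of that logic, paraphrased
from the 4.x bonding driver (drivers/net/bonding/bond_main.c) rather than
quoted verbatim:

    static void bond_mii_monitor(struct work_struct *work)
    {
            struct bonding *bond = container_of(work, struct bonding,
                                                mii_work.work);
            /* normally re-arm after the configured miimon interval (250 ms here) */
            unsigned long delay = msecs_to_jiffies(bond->params.miimon);

            rcu_read_lock();
            if (bond_miimon_inspect(bond)) {     /* prints the "link status down ..." messages */
                    rcu_read_unlock();

                    /* Committing the link change needs RTNL; if another path
                     * (e.g. the slave's MTU change) holds it, back off for a
                     * single jiffy and retry.
                     */
                    if (!rtnl_trylock()) {
                            delay = 1;           /* 1 tick = 1 ms at HZ=1000 */
                            goto re_arm;
                    }
                    bond_miimon_commit(bond);
                    rtnl_unlock();
            } else {
                    rcu_read_unlock();
            }

    re_arm:
            if (bond->params.miimon)
                    queue_delayed_work(bond->wq, &bond->mii_work, delay);
    }

With miimon=250 the handler would normally run every 250 ms, so falling back
to a one-jiffy re-arm for as long as RTNL stays held would explain the flood
of bond_miimon_inspect() messages on the serial console.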