Message-ID: <20180915162641.00dd1050@xeon-e3>
Date: Sat, 15 Sep 2018 16:26:41 -0700
From: Stephen Hemminger <stephen@...workplumber.org>
To: netdev@...r.kernel.org
Subject: Fw: [Bug 201137] New: using traffic control with sfq cause kernel
crash
Begin forwarded message:
Date: Sat, 15 Sep 2018 08:43:09 +0000
From: bugzilla-daemon@...zilla.kernel.org
To: stephen@...workplumber.org
Subject: [Bug 201137] New: using traffic control with sfq cause kernel crash
https://bugzilla.kernel.org/show_bug.cgi?id=201137
Bug ID: 201137
Summary: using traffic control with sfq cause kernel crash
Product: Networking
Version: 2.5
Kernel Version: 4.18.5
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: IPV4
Assignee: stephen@...workplumber.org
Reporter: grafgrimm77@....de
Regression: No
Created attachment 278555
--> https://bugzilla.kernel.org/attachment.cgi?id=278555&action=edit
kernel config
Copying from the machine to another server (the protocol does not matter) causes
a kernel crash when an SFQ qdisc is configured with tc.
The machine has a Qualcomm Killer NIC: lspci |grep Killer
03:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet
Controller (rev 13)
I use traffic control with SFQ:
tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0
Now I try to copy a big file (124GB, an image of a partition) to an NFS share on
another Linux server (same kernel version). It does not matter whether the share
is NFS, Samba, or anything else, nor whether I use the cp or rsync command.
The target-share is for example:
grep base /proc/mounts
jaguar.grafnetz:/base /mnt/base nfs4
rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.9,local_lock=none,addr=192.168.0.7
0 0
df shows this nfs-share called base when mounted:
jaguar.grafnetz:/base 11718572032 6012592128 5705979904 52% /mnt/base
Now I use a simple cp command:
cp big-fime.dd.image /mnt/base/test_01
The machine crashes after 7833735168 bytes have reached the target server, about
7.8 GB (with G=1000^3).
I can reproduce this crash.
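For reference, the byte count can be converted to decimal gigabytes (G=1000^3) with a small helper. This is only a sketch; the function name bytes_to_gb is hypothetical and not part of any tool mentioned in this report:

```shell
#!/bin/sh
# Convert a byte count to decimal gigabytes (G = 1000^3), two decimals.
# bytes_to_gb is a hypothetical helper name used only for illustration.
bytes_to_gb() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000000 }'
}

# Byte count observed on the target server when the crash happened.
bytes_to_gb 7833735168
```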
The good thing is: I figured out that no kernel crash happens when I do not
use:
tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0
(So I commented it out of my local start script and rebooted the system.)
Result: No crash any more. Copying the big file (124GB) completed without a
kernel crash.
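The two configurations could presumably also be compared without a reboot by adding and deleting the root qdisc at runtime. A minimal sketch, assuming the interface name enp3s0 from this report; the helper names run, sfq_on and sfq_off and the DRY_RUN variable are hypothetical, and by default the commands are only printed, not executed:

```shell
#!/bin/sh
# Toggle the SFQ root qdisc on one interface (needs root when DRY_RUN=0).
# DEV and DRY_RUN are assumptions for this sketch, not settings from the report.
DEV="${DEV:-enp3s0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same command as in the start script: attach SFQ as the root qdisc.
sfq_on() {
    run tc qdisc add dev "$DEV" root handle 1: sfq
}

# Delete the root qdisc so the kernel falls back to its default qdisc.
sfq_off() {
    run tc qdisc del dev "$DEV" root
}
```

Calling sfq_off before the copy and sfq_on afterwards (with DRY_RUN=0, as root) would correspond to the "no crash" and "crash" setups described above.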
Additional Information...
NIC is configured with IPv4:
haswell ~ # ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.9 netmask 255.255.255.0 broadcast 192.168.0.255
ether d4:3d:7e:bd:89:44 txqueuelen 1000 (Ethernet)
RX packets 7399483 bytes 511559908 (487.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 91781850 bytes 47176316774 (43.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 19
ethtool enp3s0
Settings for enp3s0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Current message level: 0x000060e4 (24804)
link ifup rx_err tx_err hw wol
Link detected: yes
While copying over the Gigabit-Network, speed is near maximum:
ifstat
enp2s0
KB/s in KB/s out
0.06 0.18
8348.65 31.60
117536.2 435.11
118049.0 435.04
119100.9 434.84
118889.7 435.19
119004.1 444.53
119061.4 440.47
119102.8 444.04
119077.4 444.39
119084.1 432.32
119089.6 439.71
[...]
So, perhaps the sfq-Kernel-module has a bug. I use the vanilla kernel from
kernel.org and sfq is compiled as a module.
/usr/src/linux # grep SFQ .config
CONFIG_NET_SCH_SFQ=m
Perhaps important: the server with the target-share also uses sfq with the same
settings without a problem. It runs stable.
--
You are receiving this mail because:
You are the assignee for the bug.