netdev - Fw: [Bug 201137] New: using traffic control with sfq cause kernel crash

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <20180915162641.00dd1050@xeon-e3>
Date:   Sat, 15 Sep 2018 16:26:41 -0700
From:   Stephen Hemminger <stephen@...workplumber.org>
To:     netdev@...r.kernel.org
Subject: Fw: [Bug 201137] New: using traffic control with sfq cause kernel
 crash



Begin forwarded message:

Date: Sat, 15 Sep 2018 08:43:09 +0000
From: bugzilla-daemon@...zilla.kernel.org
To: stephen@...workplumber.org
Subject: [Bug 201137] New: using traffic control with sfq cause kernel crash


https://bugzilla.kernel.org/show_bug.cgi?id=201137

            Bug ID: 201137
           Summary: using traffic control with sfq cause kernel crash
           Product: Networking
           Version: 2.5
    Kernel Version: 4.18.5
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: stephen@...workplumber.org
          Reporter: grafgrimm77@....de
        Regression: No

Created attachment 278555
  --> https://bugzilla.kernel.org/attachment.cgi?id=278555&action=edit  
kernel config

Copying from the machine to an other server (protocol does not matter), causes
a kernel crash when using tc-setting with SFQ.

The machine has a Qualcom Killer NIC: lspci |grep Killer
03:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet
Controller (rev 13)

I use traffic control with SFQ: 
tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0

Now I try to copy a big file (124GB, an image of a partition) to another
Linux-Server (same kernel version) to a NFS-Share. It does not matter if it is
a nfs or samba or whatever-share. It also does not matter if I use cp or rsync
command. 

The target-share is for example:
grep base /proc/mounts
jaguar.grafnetz:/base /mnt/base nfs4
rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.9,local_lock=none,addr=192.168.0.7
0 0

df shows this nfs-share called base when mounted:
jaguar.grafnetz:/base   11718572032 6012592128 5705979904   52% /mnt/base

Now I use a simpe cp-command:
cp big-fime.dd.image /mnt/base/test_01
The machine crashes after 7833735168 Bytes reached the Target-Server. About 7,9
GB (with G=1000^3). 

I can reproduce this crash. 

The good thing is: I figured out that no kernel crash happens when I do not
use:
tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0
(So I commented it out from my local start-script and rebootet the system.)
Result: No crash any more. Copying the big file (124GB) completed without a
kernel crash. 

Additional Information...

NIC is configured with IPv4:
haswell ~ # ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.9  netmask 255.255.255.0  broadcast 192.168.0.255
        ether d4:3d:7e:bd:89:44  txqueuelen 1000  (Ethernet)
        RX packets 7399483  bytes 511559908 (487.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 91781850  bytes 47176316774 (43.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19  

ethtool enp3s0
Settings for enp3s0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Current message level: 0x000060e4 (24804)
                               link ifup rx_err tx_err hw wol
        Link detected: yes

While copying over the Gigabit-Network, speed is near maximum:

ifstat
      enp2s0      
 KB/s in  KB/s out
    0.06      0.18
 8348.65     31.60
117536.2    435.11
118049.0    435.04
119100.9    434.84
118889.7    435.19
119004.1    444.53
119061.4    440.47
119102.8    444.04
119077.4    444.39
119084.1    432.32
119089.6    439.71
[...]

So, perhaps the sfq-Kernel-module has a bug. I use the vanilla kernel from
kernel.org and sfq is compiled as a module. 

/usr/src/linux # grep SFQ .config
CONFIG_NET_SCH_SFQ=m

Perhaps important: the server with the target-share also uses sfq with the same
settings without a problem. It runs stable.

-- 
You are receiving this mail because:
You are the assignee for the bug.