linux-kernel - RE: Problems with ixgbe driver

Message-ID: <alpine.LRH.2.02.1306170848230.28669@praktifix.dwd.de>
Date:	Mon, 17 Jun 2013 09:11:45 +0000 (GMT)
From:	Holger Kiehl <Holger.Kiehl@....de>
To:	"Tantilov, Emil S" <emil.s.tantilov@...el.com>
cc:	"e1000-devel@...ts.sf.net" <e1000-devel@...ts.sf.net>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: Problems with ixgbe driver

Hello,

first, thank you for the quick help!

On Fri, 14 Jun 2013, Tantilov, Emil S wrote:

>> -----Original Message-----
>> From: netdev-owner@...r.kernel.org [mailto:netdev-owner@...r.kernel.org] On
>> Behalf Of Holger Kiehl
>> Sent: Friday, June 14, 2013 4:50 AM
>> To: e1000-devel@...ts.sf.net
>> Cc: linux-kernel; netdev@...r.kernel.org
>> Subject: Problems with ixgbe driver
>>
>> Hello,
>>
>> I have a dual-port 10Gb Intel network card on a 2-socket (Xeon X5690)
>> system with a total of 12 cores. Hyperthreading is enabled, so there are
>> 24 logical CPUs. The problem I have is that when other systems send large
>> amounts of data, the network with the Intel ixgbe driver gets very slow.
>> Ping times go up from 0.2ms to approx. 60ms. Some FTP connections stall
>> for more than 2 minutes. What is strange is that heartbeat is configured
>> on the system with a serial connection to another node and the kernel
>> always reports
>
> If the network slows down so much there should be some indication in dmesg. Like Tx hangs perhaps.
> Can you provide the output of dmesg and ethtool -S from the offending interface after the issue occurs?
>
No, there is absolutely no indication in dmesg or /var/log/messages. But here
is the ethtool output when ping times go up:

    root@...ena:~# ethtool -S eth6
    NIC statistics:
         rx_packets: 4410779
         tx_packets: 8902514
         rx_bytes: 2014041824
         tx_bytes: 13199913202
         rx_errors: 0
         tx_errors: 0
         rx_dropped: 0
         tx_dropped: 0
         multicast: 4245
         collisions: 0
         rx_over_errors: 0
         rx_crc_errors: 0
         rx_frame_errors: 0
         rx_fifo_errors: 0
         rx_missed_errors: 28143
         tx_aborted_errors: 0
         tx_carrier_errors: 0
         tx_fifo_errors: 0
         tx_heartbeat_errors: 0
         rx_pkts_nic: 2401276937
         tx_pkts_nic: 3868619482
         rx_bytes_nic: 868282794731
         tx_bytes_nic: 5743382228649
         lsc_int: 4
         tx_busy: 0
         non_eop_descs: 743957
         broadcast: 1745556
         rx_no_buffer_count: 0
         tx_timeout_count: 0
         tx_restart_queue: 425
         rx_long_length_errors: 0
         rx_short_length_errors: 0
         tx_flow_control_xon: 171
         rx_flow_control_xon: 0
         tx_flow_control_xoff: 277
         rx_flow_control_xoff: 0
         rx_csum_offload_errors: 0
         alloc_rx_page_failed: 0
         alloc_rx_buff_failed: 0
         lro_aggregated: 0
         lro_flushed: 0
         rx_no_dma_resources: 0
         hw_rsc_aggregated: 1153374
         hw_rsc_flushed: 129169
         fdir_match: 2424508153
         fdir_miss: 1706029
         fdir_overflow: 33
         os2bmc_rx_by_bmc: 0
         os2bmc_tx_by_bmc: 0
         os2bmc_tx_by_host: 0
         os2bmc_rx_by_host: 0
         tx_queue_0_packets: 470182
         tx_queue_0_bytes: 690123121
         tx_queue_1_packets: 797784
         tx_queue_1_bytes: 1203968369
         tx_queue_2_packets: 648692
         tx_queue_2_bytes: 950171718
         tx_queue_3_packets: 647434
         tx_queue_3_bytes: 948647518
         tx_queue_4_packets: 263216
         tx_queue_4_bytes: 394806409
         tx_queue_5_packets: 426786
         tx_queue_5_bytes: 629387628
         tx_queue_6_packets: 253708
         tx_queue_6_bytes: 371774276
         tx_queue_7_packets: 544634
         tx_queue_7_bytes: 812223169
         tx_queue_8_packets: 279056
         tx_queue_8_bytes: 407792510
         tx_queue_9_packets: 735792
         tx_queue_9_bytes: 1092693961
         tx_queue_10_packets: 393576
         tx_queue_10_bytes: 583283986
         tx_queue_11_packets: 712565
         tx_queue_11_bytes: 1037740789
         tx_queue_12_packets: 264445
         tx_queue_12_bytes: 386010613
         tx_queue_13_packets: 246828
         tx_queue_13_bytes: 370387352
         tx_queue_14_packets: 191789
         tx_queue_14_bytes: 281160607
         tx_queue_15_packets: 384581
         tx_queue_15_bytes: 579890782
         tx_queue_16_packets: 175119
         tx_queue_16_bytes: 261312970
         tx_queue_17_packets: 151219
         tx_queue_17_bytes: 220259675
         tx_queue_18_packets: 467746
         tx_queue_18_bytes: 707472612
         tx_queue_19_packets: 30642
         tx_queue_19_bytes: 44896997
         tx_queue_20_packets: 157957
         tx_queue_20_bytes: 238772784
         tx_queue_21_packets: 287819
         tx_queue_21_bytes: 434965075
         tx_queue_22_packets: 269298
         tx_queue_22_bytes: 407637986
         tx_queue_23_packets: 102344
         tx_queue_23_bytes: 145542751
         rx_queue_0_packets: 219438
         rx_queue_0_bytes: 273936020
         rx_queue_1_packets: 398269
         rx_queue_1_bytes: 52080243
         rx_queue_2_packets: 285870
         rx_queue_2_bytes: 102299543
         rx_queue_3_packets: 347238
         rx_queue_3_bytes: 145830086
         rx_queue_4_packets: 118448
         rx_queue_4_bytes: 17515218
         rx_queue_5_packets: 228029
         rx_queue_5_bytes: 114142681
         rx_queue_6_packets: 94285
         rx_queue_6_bytes: 107618165
         rx_queue_7_packets: 289615
         rx_queue_7_bytes: 168428647
         rx_queue_8_packets: 109288
         rx_queue_8_bytes: 35178080
         rx_queue_9_packets: 393061
         rx_queue_9_bytes: 377122152
         rx_queue_10_packets: 155004
         rx_queue_10_bytes: 66560302
         rx_queue_11_packets: 381580
         rx_queue_11_bytes: 182550920
         rx_queue_12_packets: 140681
         rx_queue_12_bytes: 44514373
         rx_queue_13_packets: 127091
         rx_queue_13_bytes: 18524907
         rx_queue_14_packets: 92548
         rx_queue_14_bytes: 34725166
         rx_queue_15_packets: 199612
         rx_queue_15_bytes: 66689821
         rx_queue_16_packets: 90018
         rx_queue_16_bytes: 29206483
         rx_queue_17_packets: 81277
         rx_queue_17_bytes: 55206035
         rx_queue_18_packets: 224446
         rx_queue_18_bytes: 14869858
         rx_queue_19_packets: 16975
         rx_queue_19_bytes: 48400959
         rx_queue_20_packets: 80806
         rx_queue_20_bytes: 5398100
         rx_queue_21_packets: 146815
         rx_queue_21_bytes: 9796087
         rx_queue_22_packets: 136018
         rx_queue_22_bytes: 9023369
         rx_queue_23_packets: 54781
         rx_queue_23_bytes: 34724433

This was with the 3.15.1 driver, with the combined queue count set to 24 via
ethtool, as you suggested below.
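
A minimal sketch for reading a dump like the one above: it parses `ethtool -S` text into a dictionary, lists the non-zero counters whose names hint at trouble, and extracts the per-queue packet counts. The function names and the `WATCH` substring list are assumptions made for illustration, not part of ethtool or the ixgbe driver; the sample values are copied from the output above.

```python
import re

# Substrings that mark counters worth a look. This list is an assumption,
# not an official classification from ethtool or ixgbe.
WATCH = ("error", "missed", "dropped", "overflow",
         "flow_control", "restart", "timeout")

QUEUE_RE = re.compile(r"^(rx|tx)_queue_(\d+)_packets$")


def parse_ethtool_stats(text):
    """Turn `ethtool -S` output into {counter_name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skips the "NIC statistics:" header, which has no numeric value.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_watched(stats):
    """Return the watched counters that are not zero."""
    return {n: v for n, v in stats.items()
            if v and any(key in n for key in WATCH)}


def queue_packets(stats, direction):
    """Return {queue_index: packets} for direction 'rx' or 'tx'."""
    result = {}
    for name, value in stats.items():
        match = QUEUE_RE.match(name)
        if match and match.group(1) == direction:
            result[int(match.group(2))] = value
    return result


# A few lines copied from the eth6 output above.
SAMPLE = """\
NIC statistics:
     rx_errors: 0
     rx_missed_errors: 28143
     tx_restart_queue: 425
     tx_flow_control_xon: 171
     tx_flow_control_xoff: 277
     fdir_overflow: 33
     rx_queue_1_packets: 398269
     rx_queue_1_bytes: 52080243
     rx_queue_19_packets: 16975
"""

stats = parse_ethtool_stats(SAMPLE)
for name, value in sorted(nonzero_watched(stats).items()):
    print(f"{name}: {value}")
rx = queue_packets(stats, "rx")
print("rx queues seen:", sorted(rx))
```

Run on the full dump, this would single out rx_missed_errors, tx_restart_queue, the tx_flow_control counters and fdir_overflow as the only non-zero entries on the watch list. Taking two dumps a few seconds apart and comparing them shows which of these are still increasing while ping times are high.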

>>
>>     ttyS0: 4 input overrun(s)
>>
>> when a lot of data is sent and the ping time goes up.
>>
>> On the network there are three VLANs configured. The network is bonded
>> (active-backup) together with another HP NC523SFP 10Gb 2-port Server
>> Adapter. When I switch the network to this card the problem goes away.
>> The ttyS0 input overruns disappear as well. Note that both network cards
>> are connected to the same switch.
>>
>> The system uses Scientific Linux 6.4 with kernel.org kernel. I noticed
>> this behavior with kernel 3.9.5 and 3.9.6-rc1. Before I did not notice
>> it because traffic always went over the HP NC523SFP qlcnic card.
>>
>> In search of a solution to the problem I found a newer ixgbe driver,
>> 3.15.1 (3.9.6-rc1 has 3.11.33-k), and tried that. But it has the same
>> problem. However, when I load the module as follows:
>>
>>     modprobe ixgbe RSS=8,8
>>
>> the problem goes away. The kernel.org ixgbe driver does not offer this
>> option. Why? It seems that both drivers have problems on systems with
>
> If you are using a newer kernel and ethtool version you can use `ethtool -L ethX combined Y` to control the number of queues per interface.
>
Okay, thank you! I did not know this.
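
As a small sketch of the suggestion above, the following builds the `ethtool -L <ifname> combined <N>` command line and, if asked, runs it. The helper names are hypothetical; the interface name eth6 and the value 8 are taken from this thread (8 mirrors the RSS=8,8 module option), and actually running the command requires root and a driver that supports changing channels.

```python
import subprocess


def set_combined_cmd(ifname, queues):
    """Build the argv for `ethtool -L <ifname> combined <queues>`."""
    if not ifname:
        raise ValueError("interface name must not be empty")
    if queues < 1:
        raise ValueError("queue count must be at least 1")
    return ["ethtool", "-L", ifname, "combined", str(queues)]


def apply_combined(ifname, queues):
    """Run the command; raises CalledProcessError if ethtool fails."""
    subprocess.run(set_combined_cmd(ifname, queues), check=True)


# Only print the command here; call apply_combined() to really change it.
print(" ".join(set_combined_cmd("eth6", 8)))
```

The current and maximum channel counts can be read back with `ethtool -l <ifname>` (lowercase l) before and after the change.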

>> 24 CPUs. But I cannot believe that I am the only one who noticed this,
>> since ixgbe is widely used.
>
> We run traffic with multiple queues all the time and I don't think what you are reporting is a generic issue. Most likely it's something related to your setup/system.
>
Yes, I think so too. But what could it be? Please just ask what other
information I could provide. As I already mentioned, the ixgbe card is
bonded with a QLogic NIC and I have two (not three) VLANs configured over
this bond. Maybe the following is useful (eth6 uses the ixgbe driver):

    root@...ena:~# ethtool -k eth6
    Features for eth6:
    rx-checksumming: on
    tx-checksumming: on
            tx-checksum-ipv4: on
            tx-checksum-ip-generic: off [fixed]
            tx-checksum-ipv6: on
            tx-checksum-fcoe-crc: off [fixed]
            tx-checksum-sctp: on
    scatter-gather: on
            tx-scatter-gather: on
            tx-scatter-gather-fraglist: off [fixed]
    tcp-segmentation-offload: on
            tx-tcp-segmentation: on
            tx-tcp-ecn-segmentation: off [fixed]
            tx-tcp6-segmentation: on
    udp-fragmentation-offload: off [fixed]
    generic-segmentation-offload: on
    generic-receive-offload: on
    large-receive-offload: on
    rx-vlan-offload: on
    tx-vlan-offload: on
    ntuple-filters: off
    receive-hashing: on
    highdma: on [fixed]
    rx-vlan-filter: on [fixed]
    vlan-challenged: off [fixed]
    tx-lockless: off [fixed]
    netns-local: off [fixed]
    tx-gso-robust: off [fixed]
    tx-fcoe-segmentation: off [fixed]
    tx-gre-segmentation: off [fixed]
    fcoe-mtu: off [fixed]
    tx-nocache-copy: on
    loopback: off [fixed]
    rx-fcs: off [fixed]
    rx-all: off [fixed]

>>
>> It would really be nice if one could set the RSS=8,8 option for the
>> kernel.org ixgbe driver too. Or if someone could tell me where I can force
>> the driver to set Receive Side Scaling to 8, even if it means editing the
>> source code.
>>
>> Below I have added some additional information. Please CC me since I
>> am not subscribed to any of these lists. And please do not hesitate
>> to ask if more information is needed.
>
> I would suggest that you open up a bug at e1000.sf.net - describe your configuration and attach the relevant info (dmesg, ethtool -S, lspci etc). This would make it easier for us to follow.
>
Sorry, but I could not find out how to open a new bug. I could only view
existing bugs. Please give me a hint as to what I need to do.

Thanks,
Holger