Message-ID: <4B5447E3.5090405@zappa.cx>
Date: Mon, 18 Jan 2010 12:37:07 +0100
From: Andreas Sundstrom <sunkan@...pa.cx>
To: netdev@...r.kernel.org
Subject: ~60k interrupts/sec for 1Gb/s iperf with r8169
Hi, first of all I want to let you know that I'm not a coder, and very
much not a kernel hacker, but I have seen some behavior that seems
strange to me.
Let me explain what I have observed and what I have tried in order to
narrow down the problem.
The "problem" is that I see that when transferring large amounts
of data many IRQs are being generated. This could be expected
behavior, and then you would not need to care much about my report
but I think that something is wrong.
Because the card is default set to have "rx checksumming" enabled
but if I re-enable it the IRQs for the same transfer goes down to
~9000 interrupts/sec. Which seems to be more like normal rate when
comparing to other NICs (eg. e1000).
I guess the NIC has some interrupt mitigation/coalescing feature so that
it does not raise an interrupt for every packet when packets are arriving
at a high rate?
Here is the relevant dmesg output for my NIC. It's the integrated
NIC on my ASUS M2A-VM HDMI motherboard.
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
r8169 0000:02:00.0: setting latency timer to 64
r8169 0000:02:00.0: irq 26 for MSI/MSI-X
eth0: RTL8168b/8111b at 0xffffc90010944000, 00:1f:c6:b3:ff:eb, XID 18000000 IRQ 26
r8169: eth0: chipset (mac_version = 12).
(I added the mac_version printout myself while troubleshooting.)
This is after a fresh boot:
sunkan@...kan:~$ sudo ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
large receive offload: off
sunkan@...kan:~$
Note that rx-checksumming is already reported as on.
Then I run iperf and watch the vmstat output like so:
sunkan@...kan:~$ iperf -c host2
------------------------------------------------------------
Client connecting to host2, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.20.1 port 52496 connected with 192.168.20.30 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
sunkan@...kan:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 817308 205932 2144676 0 0 8 9 78 76 2 2 96 0
0 0 0 817300 205932 2144676 0 0 0 0 716 913 1 0 98 0
0 0 0 817424 205932 2144676 0 0 0 0 684 928 1 0 99 0
0 0 0 817432 205932 2144676 0 0 0 0 764 1021 1 0 98 0
3 0 0 816888 205940 2144672 0 0 0 16 4437 1234 1 1 97 0
0 0 0 814632 205940 2144784 0 0 108 20 58918 2159 2 12 85 0
0 0 0 814476 205940 2144784 0 0 0 0 59634 1787 1 13 86 0
0 0 0 814436 205940 2144784 0 0 0 0 59496 1843 2 12 86 0
2 0 0 814676 205940 2144784 0 0 0 0 59657 1745 1 12 87 0
1 0 0 814756 205940 2144784 0 0 0 0 59790 1808 1 13 86 0
0 0 0 814540 205948 2144784 0 0 0 16 59564 1797 1 12 87 0
0 0 0 814540 205948 2144784 0 0 0 0 60349 3695 1 14 84 0
0 0 0 814476 205948 2144784 0 0 0 0 59649 1699 1 12 87 0
1 0 0 814580 205948 2144784 0 0 0 0 59449 1664 1 12 87 0
1 0 0 816988 205948 2144784 0 0 0 0 54518 1635 1 11 88 0
0 0 0 817236 205948 2144784 0 0 0 0 790 1056 2 1 98 0
1 0 0 817236 205948 2144784 0 0 0 0 714 975 1 0 99 0
^C
Note that "in" is ~60k
If I then re-enable rx-checksumming like so:
sunkan@...kan:~$ sudo ethtool -K eth0 rx on
And run the test again:
sunkan@...kan:~$ iperf -c host2
------------------------------------------------------------
Client connecting to host2, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.20.1 port 56916 connected with 192.168.20.30 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
sunkan@...kan:~$
sunkan@...kan:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
2 0 0 816788 206124 2145004 0 0 0 0 742 982 1 0 98 0
0 0 0 816676 206124 2145004 0 0 0 0 727 984 1 0 98 0
1 0 0 816788 206128 2145004 0 0 0 28 689 915 1 0 99 0
0 0 0 816788 206128 2145004 0 0 0 0 798 1080 2 0 98 0
0 0 0 816788 206128 2145004 0 0 0 8 767 1008 1 1 98 0
0 0 0 816788 206128 2145004 0 0 0 0 681 952 1 0 99 0
3 0 0 816788 206128 2145004 0 0 0 0 952 1235 4 1 95 0
0 0 0 816804 206128 2145008 0 0 0 0 2595 3978 12 3 86 0
0 0 0 816788 206128 2145036 0 0 0 0 860 1131 2 0 98 0
2 0 0 814572 206136 2145032 0 0 0 48 5413 2179 2 5 93 0
0 0 0 814388 206144 2145032 0 0 0 40 7587 2219 2 17 81 1
2 0 0 814676 206144 2145036 0 0 0 0 10459 3345 6 13 81 0
1 0 0 814396 206144 2145036 0 0 0 0 9396 2283 2 12 86 0
0 0 0 814108 206144 2145036 0 0 0 0 9424 1852 1 10 89 0
2 0 0 814516 206144 2145036 0 0 0 0 9150 1934 2 10 88 0
0 0 0 814548 206144 2145036 0 0 0 0 8984 1986 2 12 86 0
1 0 0 814092 206144 2145036 0 0 0 0 9425 2053 1 12 88 0
0 0 0 814476 206144 2145036 0 0 0 0 8195 2090 1 15 84 0
0 0 0 814508 206144 2145036 0 0 0 0 8759 2034 1 12 87 0
0 0 0 816744 206144 2145036 0 0 0 0 4813 1435 1 5 94 0
0 0 0 816800 206144 2145036 0 0 0 0 771 1049 1 0 98 0
^C
Here you can see in the latter part that "in" is registering ~9000
interrupts/sec, which seems more "normal" to me (compared to other NICs).
Originally I was playing with jumbo frames and stumbled upon this at that
time; I have issues with jumbo frames too, but I will write a separate
email about those.
What I noticed was that changing the MTU made the interrupt rate go back
to the state it was in when the machine was freshly booted:
sunkan@...kan:~$ sudo ip link set eth0 mtu 1501
sunkan@...kan:~$ sudo ip link set eth0 mtu 1500
sunkan@...kan:~$ iperf -c host2
------------------------------------------------------------
Client connecting to host2, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.20.1 port 34021 connected with 192.168.20.30 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
sunkan@...kan:~$
sunkan@...kan:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 815740 206616 2145072 0 0 8 9 83 78 2 2 96 0
0 0 0 815732 206616 2145072 0 0 0 0 718 935 2 1 98 0
0 0 0 815732 206616 2145072 0 0 0 0 638 888 1 0 98 0
0 0 0 815732 206616 2145072 0 0 0 0 803 1108 1 0 99 0
3 0 0 813144 206616 2145072 0 0 0 0 40334 2159 1 7 91 0
1 0 0 813072 206620 2145072 0 0 0 16 59725 1901 1 13 86 0
1 0 0 813152 206620 2145072 0 0 0 56 59505 1844 2 12 86 0
2 0 0 813000 206620 2145072 0 0 0 0 59871 1848 1 12 86 0
0 0 0 813000 206620 2145072 0 0 0 0 59169 1773 1 11 88 0
3 0 0 812748 206620 2145072 0 0 0 0 59157 1848 3 11 86 0
1 0 0 812804 206628 2145072 0 0 0 16 59805 3600 1 12 87 0
2 0 0 812764 206632 2145068 0 0 0 20 59740 1734 1 12 86 1
1 0 0 812660 206632 2145072 0 0 0 0 59694 1836 1 13 86 0
2 0 0 812660 206632 2145072 0 0 0 0 59521 1901 1 13 86 0
0 0 0 815380 206632 2145072 0 0 0 0 19719 1250 1 4 94 0
0 0 0 815412 206632 2145072 0 0 0 0 731 984 1 0 98 0
0 0 0 815412 206632 2145072 0 0 0 0 675 928 1 0 98 0
0 0 0 815412 206640 2145064 0 0 0 16 739 1004 1 0 98 0
^C
sunkan@...kan:~$
~60k interrupts/sec again.
So, looking at the rtl8169_change_mtu function, I noticed that the NIC
is brought down and up again for a brief period.
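As far as I can tell the sequence in there is roughly this (heavily
trimmed by me just to show the down/up part; the rtl8169_down() name and
the elided steps are from my reading of the source and may be off):

static int rtl8169_change_mtu(struct net_device *dev, int new_mtu)
{
	struct rtl8169_private *tp = netdev_priv(dev);

	/* ... MTU range check and "interface not running" shortcut elided ... */

	rtl8169_down(dev);		/* NIC is taken down ...           */

	/* ... rx buffer size / descriptor ring re-init elided ...         */

	rtl_hw_start(dev);		/* ... and brought back up again   */
	rtl8169_request_timer(dev);

	return 0;
}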
Then I looked in the rtl8169_set_rx_csum function for something that could
be "fixing" the IRQ issue, and it seems to be this write:
"RTL_W16(CPlusCmd, tp->cp_cmd);"
Now, I really have no clue what that write actually does, but I tried
adding it to the change_mtu function to see how it changed the behavior.
Adding it after "rtl_hw_start(dev);" has the effect that a change of MTU
also fixes the interrupt rate (just like re-enabling rx-checksumming):
//RTL_W16(CPlusCmd, tp->cp_cmd); // This does not work
rtl_hw_start(dev);
RTL_W16(CPlusCmd, tp->cp_cmd); // This works
rtl8169_request_timer(dev);
So then I went to the rtl_hw_start function and added the write in a few
places there to see where it would have an effect. I have now narrowed it
down to "tp->hw_start(dev);": if I add the write right after that call,
the card generates ~9000 interrupts/sec during iperf tests even after a
fresh boot.
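To make the placement concrete, this is roughly where the extra write
ended up (again paraphrased by me; apart from tp->hw_start(dev) and the
RTL_W16 line, the surrounding lines are from memory and may differ in
your tree):

static void rtl_hw_start(struct net_device *dev)
{
	struct rtl8169_private *tp = netdev_priv(dev);
	void __iomem *ioaddr = tp->mmio_addr;

	/* ... chip soft reset and wait loop elided ... */

	tp->hw_start(dev);		/* per-chip hardware setup          */

	RTL_W16(CPlusCmd, tp->cp_cmd);	/* my experiment: re-writing CPlusCmd
					 * here gives ~9000 ints/sec even
					 * after a fresh boot               */

	netif_start_queue(dev);
}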
I tried to dig further, but since I don't really know what I'm doing I
thought it would be better to report my findings and see if someone can
help find a proper solution, or at least explain the behavior if it is
expected.
This is a long email so I'll stop here; let me know if I can provide more info.
P.S. I'm currently not subscribed so please CC me on any posts regarding
this.
/Andreas Sundstrom