Date:   Mon, 19 Aug 2019 18:04:38 +0200
From:   Juliana Rodrigueiro <juliana.rodrigueiro@...ra2net.com>
To:     Heiner Kallweit <hkallweit1@...il.com>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Subject: Re: r8169: Performance regression and latency instability

Hi!

First of all: Thank you everyone for the input.

Here is some more info about my NIC (using the latest ethtool):

# ethtool -i eth0 ; ifconfig eth0
driver: r8169
version:
firmware-version: rtl8168h-2_0.0.2 02/26/15
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
eth0      Link encap:Ethernet  HWaddr <hidden>
           inet addr:<hidden>  Bcast:<hidden>  Mask:255.255.0.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:27392501 errors:0 dropped:0 overruns:0 frame:0
           TX packets:24647212 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:33702173568 (31.3 GiB)  TX bytes:35865124147 (33.4 GiB)


On 8/16/19 9:12 PM, Heiner Kallweit wrote:

> Indeed, here we're talking about changes in linux-next, and Juliana's issue is
> with 4.19. However I'd appreciate if Juliana could test with linux-next and
> different combinations of the NETIF_F_xxx features.

I also tested the latest linux-next (20190819) and the results did not
improve for me, unfortunately. They are about the same as with all the
kernel versions I tested from 4.17 onwards.

# netperf -v 2 -P 0 -H <netserver-ip>,4 -I 99,5 -t omni -l 1 -- -O STDDEV_LATENCY -m 64K -d Send
627.99

Running linux-next I have the following defaults (shortened for simplicity):

# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
         tx-checksum-ipv4: on
         tx-checksum-ip-generic: off [fixed]
         tx-checksum-ipv6: on
         tx-checksum-fcoe-crc: off [fixed]
         tx-checksum-sctp: off [fixed]
scatter-gather: on
         tx-scatter-gather: on
         tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
         tx-tcp-segmentation: on
         tx-tcp-ecn-segmentation: off [fixed]
         tx-tcp-mangleid-segmentation: off
         tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
... (all off from here)


There are quite a few possible combinations to go through. I executed my
test with SG, TSO, GSO, RX, TX individually disabled, but the results
were all the same or slightly worse.
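
To go beyond the individually disabled features, the remaining on/off
combinations could be swept with a small script. The following is only a
rough sketch, not what I actually ran: the helper names
(feature_combinations, ethtool_command, run_sweep) are made up for
illustration, the interface name is assumed to be eth0, and the netperf
arguments are copied from the command above with the server address left
as a placeholder. It defaults to a dry run that only prints the
"ethtool -K" commands.

```python
import itertools
import shlex
import subprocess

# Short feature names accepted by `ethtool -K`; the five mentioned above.
FEATURES = ["sg", "tso", "gso", "rx", "tx"]

# netperf invocation copied from this mail; the server address is a
# placeholder and must be filled in before a real run.
NETSERVER = "<netserver-ip>"
NETPERF = [
    "netperf", "-v", "2", "-P", "0", "-H", NETSERVER + ",4",
    "-I", "99,5", "-t", "omni", "-l", "1", "--",
    "-O", "STDDEV_LATENCY", "-m", "64K", "-d", "Send",
]


def feature_combinations(features):
    """Yield one {feature: "on" | "off"} dict per on/off combination."""
    for states in itertools.product(("on", "off"), repeat=len(features)):
        yield dict(zip(features, states))


def ethtool_command(iface, settings):
    """Build the `ethtool -K` argument list for one combination."""
    cmd = ["ethtool", "-K", iface]
    for name, state in settings.items():
        cmd += [name, state]
    return cmd


def run_sweep(iface="eth0", dry_run=True):
    """Apply each combination and collect the raw netperf output.

    With dry_run=True nothing is executed; the commands are only printed.
    """
    results = []
    for settings in feature_combinations(FEATURES):
        cmd = ethtool_command(iface, settings)
        if dry_run:
            print(shlex.join(cmd))
            continue
        # The kernel may reject or adjust a combination because some
        # features depend on others, so a failure here is not fatal.
        subprocess.run(cmd, check=False)
        out = subprocess.run(NETPERF, capture_output=True, text=True)
        results.append((settings, out.stdout.strip()))
    return results


if __name__ == "__main__":
    run_sweep()
```

Five features give 32 combinations, so with "-l 1" a full sweep stays
short. After each change it is worth checking "ethtool -k eth0" to see
which settings actually took effect.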

Until I find the root cause, we will have to keep the "tcp: switch to
GSO being always on" patch reverted for production, which is not ideal.

Any other ideas on how I could debug this issue?


Best regards,
Juliana.
