Date:   Mon, 7 Jun 2021 22:54:10 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Johannes Brandstätter <jbrandst@....eu>
Cc:     netdev@...r.kernel.org
Subject: Re: Load on RTL8168g/8111g stalls network for multiple seconds

On 07.06.2021 22:39, Heiner Kallweit wrote:
> On 07.06.2021 15:11, Johannes Brandstätter wrote:
>> Hi,
>>
>> just the other day I wanted to set up a bridge between an external 2.5G
>> RTL8156 USB Ethernet adapter (using r8152) and the built in dual
>> RTL8168g/8111g Ethernet chip (using r8169).
>> I compiled kernel 5.13.0-rc4 because its r8152 driver supports
>> the RTL8156.
>> I used the Debian kernel config of 5.10.0-4 as a base and left
>> everything else at the defaults.
>>
>> So this setup was working the way I wanted it to, but unfortunately
>> when running iperf3 against the machine it would rather quickly stall
>> all communications on the internal RTL8168g.
>> I was still able to communicate fine over the external RTL8156 link
>> with the machine.
>> Even without the generated network load, it would occasionally become
>> stalled.
>>
>> The only information I could really gather was that the rx_missed
>> counter was going up, and this kernel message appeared some time after
>> the stall:
>>
>> [81853.129107] r8169 0000:02:00.0 enp2s0: rtl_rxtx_empty_cond == 0
>> (loop: 42, delay: 100).
>>
>> This apparently has to do with the wait for an empty FIFO within the
>> r8169 driver.
>>
>> Until then, the machine (an UP² board) using the RTL8168g had run
>> without any issues for multiple years in different configurations.
>> Only bridging showed the issue, immediately, when given enough network
>> load.
>>
>> After many hours of trying out different things, none of which
>> made any difference whatsoever, I tried replacing the internal
>> RTL8168g with an additional external USB Ethernet adapter that I had
>> lying around, which has an RTL8153 inside.
>>
>> Once the RTL8168g was removed and the RTL8153 added to the bridge, I
>> was unable to reproduce the issue.
>> Of course I'd rather make use of the two internal Ethernet
>> ports if I somehow can.
>>
>> So is there anything I could try to do?
>>
> Do you have flow control enabled? As of 5.13-rc, r8169 supports adjusting
> pause settings via ethtool. You could play with the settings to see
> whether it makes a difference.
> Next thing you could check is whether the issue persists when using
> the r8168 vendor driver.
> 
> However, I'm not an expert in bridging and don't know what difference
> it could make whether a NIC is operated standalone or as part of a bridge.
> 
>> Next I'm eyeing a regression test on the kernel's r8169 driver,
>> though I don't know whether there ever was a working version.
>> As this is a rather large task and my time is limited, I wanted to seek
>> out some help before I go down that route.
>>
>> Maybe you could point me in the right direction as to what to try
>> next.
>>
>> Thanks and best regards,
>> Johannes
>>
> Heiner
> 
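
For the flow-control suggestion quoted above, a minimal sketch of how the
pause settings could be inspected and changed. The helper function names
are made up for illustration, and enp2s0 is simply the interface name from
the kernel message; `ethtool -a`, `-A` and `-S` are the standard options.

```shell
#!/bin/sh
# Interface name taken from the kernel message quoted above; adjust as needed.
IFACE="${IFACE:-enp2s0}"

# Hypothetical helper: print the current pause (flow control) parameters.
show_pause() {
    ethtool -a "$IFACE"
}

# Hypothetical helper: set RX/TX pause. $1 = rx on|off, $2 = tx on|off
set_pause() {
    ethtool -A "$IFACE" rx "$1" tx "$2"
}

# Hypothetical helper: list driver statistics whose name mentions "miss"
# (such as the rx_missed counter mentioned in the report).
missed_counters() {
    ethtool -S "$IFACE" | grep -i miss
}
```

One way to use it: run `show_pause`, then `set_pause off off` (as root),
repeat the iperf3 run and compare `missed_counters` before and after.
Whether this changes the stall at all is untested.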

Something else you could test: I run my interfaces with the following
settings (as a replacement for traditional interrupt coalescing).

echo 20000 > /sys/class/net/enp2s0/gro_flush_timeout
echo 1 > /sys/class/net/enp2s0/napi_defer_hard_irqs
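
The two writes above could also be wrapped in a small function that reads
the values back afterwards. This is only a sketch: the function name and the
SYSFS_NET override are assumptions added for illustration (the override makes
it possible to try the function on a scratch directory). gro_flush_timeout
is given in nanoseconds; napi_defer_hard_irqs is a count of empty polls
before hard IRQs are re-enabled.

```shell
#!/bin/sh
# Assumed override so the function can be exercised outside of /sys.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Hypothetical helper.
# $1 = interface, $2 = gro_flush_timeout (ns), $3 = napi_defer_hard_irqs
set_napi_deferral() {
    dev="$SYSFS_NET/$1"
    [ -d "$dev" ] || { echo "no such interface: $1" >&2; return 1; }
    echo "$2" > "$dev/gro_flush_timeout" || return 1
    echo "$3" > "$dev/napi_defer_hard_irqs" || return 1
    # Read the values back to confirm that they were accepted.
    echo "$1: gro_flush_timeout=$(cat "$dev/gro_flush_timeout") napi_defer_hard_irqs=$(cat "$dev/napi_defer_hard_irqs")"
}
```

Run as root, `set_napi_deferral enp2s0 20000 1` applies the same values as
the two echo commands. The settings do not survive a reboot.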
