Message-ID: <68c989ff-835a-d9fb-e95a-9536dab1e341@gmail.com>
Date: Thu, 31 Jan 2019 07:49:41 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: David Chang <dchang@...e.com>
Cc: Peter Ceiley <peter@...ley.net>,
Realtek linux nic maintainers <nic_swsd@...ltek.com>,
netdev@...r.kernel.org
Subject: Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
And one more question:
So far I have read about the issue only in combination with NFS.
Does the issue also occur with iperf or some other type of
high network load?
Heiner
On 31.01.2019 07:35, Heiner Kallweit wrote:
> Hi David, two more things:
>
> 1. Could you please test a recent linux-next kernel?
> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> and compare them.
>
> Heiner
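To compare the two register dumps mentioned above, it helps to list only the bytes that differ between 4.18 and 4.19. This is a minimal Python sketch that assumes "ethtool -d <if>" prints its generic hex dump (lines such as "0x0000:  f8 b1 56 fe ..."); parse_dump, diff_dumps and format_diff are hypothetical helper names, not part of ethtool.

```python
import re

# Assumed shape of a generic `ethtool -d` hex-dump line:
#   "0x0010:\t\t00 11 22 33 ..."  (hex offset, colon, space-separated bytes)
LINE_RE = re.compile(r"^\s*0x([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2}\s*)+)$")


def parse_dump(text):
    """Map byte offset -> byte value for every hex-dump line in text."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip headers such as "Offset  Values" and "------"
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            regs[base + i] = int(byte, 16)
    return regs


def diff_dumps(old_text, new_text):
    """Return sorted (offset, old, new) tuples for bytes that differ.

    A byte present in only one dump is reported with None on the other side.
    """
    old, new = parse_dump(old_text), parse_dump(new_text)
    return [
        (off, old.get(off), new.get(off))
        for off in sorted(old.keys() | new.keys())
        if old.get(off) != new.get(off)
    ]


def format_diff(diffs):
    """Render diff_dumps() output as 'offset: old -> new' lines."""
    def fmt(v):
        return "--" if v is None else f"{v:02x}"
    return "\n".join(f"0x{off:04x}: {fmt(a)} -> {fmt(b)}" for off, a, b in diffs)
```

Usage, with hypothetical file names: save the "ethtool -d enp3s0" output under each kernel as regs-4.18.txt and regs-4.19.txt, then print format_diff(diff_dumps(Path("regs-4.18.txt").read_text(), Path("regs-4.19.txt").read_text())) after importing Path from pathlib.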
>
>
> On 31.01.2019 07:21, Heiner Kallweit wrote:
>> David, thanks for the link to the bug ticket.
>> I think only a proper bisect can help to find the offending commit.
>>
>> Heiner
>>
>>
>> On 31.01.2019 03:32, David Chang wrote:
>>> Hi,
>>>
>>> We have a similar case here.
>>> - Realtek r8169 receive performance regression in kernel 4.19
>>> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>
>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>> The main symptom is a high rx_missed count.
>>>
>>>
>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>> Hi Peter,
>>>>
>>>> recently I had a case where pcie_aspm=off, for whatever reason, didn't
>>>> do the trick. Can you also check with pcie_aspm.policy=performance?
>>>
>>> We will give it a try later.
>>>
>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>> show a significant number of errors.
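One way to spot "a significant number of errors" is to parse the "ethtool -S <if>" output, keep only non-zero error-like counters, and diff snapshots taken before and after a slow operation. A minimal Python sketch; the ERROR_KEYWORDS list is an assumption (r8169 exposes counters such as rx_missed, the one reported in the SUSE ticket), and the function names are hypothetical.

```python
# Substrings that mark a counter as error-like. This list is an assumption;
# counter names vary by driver (r8169 uses e.g. rx_missed, tx_aborted).
ERROR_KEYWORDS = ("err", "missed", "abort", "underrun", "drop")


def parse_stats(text):
    """Parse the 'name: value' lines of `ethtool -S <if>` into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips the "NIC statistics:" header
            stats[name.strip()] = int(value)
    return stats


def nonzero_errors(stats):
    """Keep only error-like counters with a non-zero value."""
    return {
        name: value
        for name, value in sorted(stats.items())
        if value and any(word in name for word in ERROR_KEYWORDS)
    }


def counter_delta(before, after):
    """Counters that changed between two parse_stats() snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in sorted(after)
        if after[name] != before.get(name, 0)
    }
```

Taking one snapshot before and one after listing an NFS directory, counter_delta() shows whether rx_missed grows with the slow operation rather than over time in general.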
>>>>
>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>
>>> We tried falling back to the driver as of each of the following earlier
>>> commits, but with no luck:
>>>
>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>
>>> Thanks,
>>> David Chang
>>>
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>> and this made no difference.
>>>>>
>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>> shares operated as expected.
>>>>>
>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>> 4.19 onwards?
>>>>>
>>>>> To answer your last questions:
>>>>>
>>>>> Base Board Information
>>>>> Manufacturer: Alienware
>>>>> Product Name: 0PGRP5
>>>>> Version: A02
>>>>>
>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>> A few older systems seem to have issues with ASPM. What kind of
>>>>>> system / mainboard are you using? Is the RTL8168 the onboard
>>>>>> network chip?
>>>>>>
>>>>>> Rgds, Heiner
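Disabling ASPM via sysfs goes through the pcie_aspm module parameter: its "policy" file lists the available policies with the active one in brackets, and selecting "performance" disables ASPM (the same policy as the pcie_aspm.policy=performance boot parameter). A minimal Python sketch, assuming a kernel built with CONFIG_PCIEASPM so that /sys/module/pcie_aspm/parameters/policy exists; writing requires root, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the kernel is built with CONFIG_PCIEASPM, so this file exists.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def available_policies(text):
    """List every policy named in the file, without brackets."""
    return [word.strip("[]") for word in text.split()]


def active_policy(text):
    """Return the bracketed entry, e.g. 'default' for
    '[default] performance powersave powersupersave'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def set_policy(name, path=POLICY_PATH):
    """Switch the ASPM policy at runtime (needs root).

    'performance' disables ASPM, matching pcie_aspm.policy=performance.
    """
    choices = available_policies(path.read_text())
    if name not in choices:
        raise ValueError(f"{name!r} is not one of {choices}")
    path.write_text(name)
    return active_policy(path.read_text())
```

Reading active_policy(POLICY_PATH.read_text()) before and after set_policy("performance") confirms that the change actually took effect before re-testing the NFS shares.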
>>>>>>
>>>>>>
>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>> a good idea.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> at first glance it doesn't look like a typical driver issue.
>>>>>>>> What you could do:
>>>>>>>>
>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>
>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>
>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
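Bisecting between 4.18 and 4.19 is a binary search over the commit history: each step builds and tests the midpoint, so about log2(N) test kernels identify the first bad commit. The Python sketch below shows the idea under a simplifying assumption of a linear, oldest-to-newest commit list (git bisect itself handles merge history, so for real work use "git bisect start v4.19 v4.18"); first_bad and is_bad are hypothetical names, and is_bad stands for the manual build, boot and NFS 'ls' timing test.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an oldest-to-newest commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once
    a commit is bad every later commit is bad too (the premise of bisection).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In practice: build and boot commits[mid], then time 'ls' on the share.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]
```

For example, a range of 1000 commits needs about 10 test builds, since each test halves the remaining range.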
>>>>>>>>
>>>>>>>> Any specific reason why you think the root cause is in the driver and not
>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>> Hi Heiner,
>>>>>>>>>
>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>
>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>
>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>> mounted NFS shares where it takes seconds (up to tens of seconds on
>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>
>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>
>>>>>>>>> netstat -s | grep retransmitted
>>>>>>>>>
>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>> contents of a remote directory. For example, running 'ls' on a
>>>>>>>>> directory containing 345 media files using kernel 4.19.18
>>>>>>>>> increased the retransmitted segments by 21, and the 'time' command
>>>>>>>>> showed the following:
>>>>>>>>> real 0m19.867s
>>>>>>>>> user 0m0.012s
>>>>>>>>> sys 0m0.036s
>>>>>>>>>
>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>> real 0m0.300s
>>>>>>>>> user 0m0.004s
>>>>>>>>> sys 0m0.007s
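The before/after comparison above can be scripted by taking two "netstat -s" snapshots around the 'ls' and subtracting the retransmission counters. A minimal Python sketch, assuming the Linux net-tools wording "N segments retransmitted" (the text the grep above matches); snapshot() requires netstat to be installed, and all names are hypothetical.

```python
import re
import subprocess

# Matches the Linux net-tools line, e.g. "    21 segments retransmitted".
RETRANS_RE = re.compile(r"^\s*(\d+) segments retransmitted\b", re.MULTILINE)


def retransmitted(netstat_output):
    """Extract the TCP 'segments retransmitted' counter from `netstat -s`."""
    m = RETRANS_RE.search(netstat_output)
    if m is None:
        raise ValueError("no 'segments retransmitted' line found")
    return int(m.group(1))


def snapshot():
    """Run `netstat -s` (net-tools must be installed) and return its output."""
    return subprocess.run(
        ["netstat", "-s"], capture_output=True, text=True, check=True
    ).stdout


def retrans_delta(before, after):
    """Retransmissions that happened between two snapshots."""
    return retransmitted(after) - retransmitted(before)
```

Usage: before = snapshot(), run 'ls' on the NFS mount, after = snapshot(), then retrans_delta(before, after). With the timings above, the 'ls' took 19.867 s on 4.19.18 versus 0.300 s on 4.18.16, roughly 66 times longer.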
>>>>>>>>>
>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>>
>>>>>>>>> dmesg XID:
>>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g, f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>
>>>>>>>>> # lspci -vv
>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>> <512ns, L1 <64us
>>>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>> OBFF Disabled
>>>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>> Compliance De-emphasis: -6dB
>>>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>> Not readable
>>>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>>>> Status: InProgress-
>>>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>> Status: NegoPending- InProgress-
>>>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>> Max snoop latency: 71680ns
>>>>>>>>> Max no snoop latency: 71680ns
>>>>>>>>> Kernel driver in use: r8169
>>>>>>>>> Kernel modules: r8169
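The "LnkCtl" line above shows that ASPM L1 is currently enabled on this link. When comparing kernels or boot parameters, that field can be extracted automatically. A minimal Python sketch, assuming the "lspci -vv" wording shown above ("LnkCtl: ASPM L1 Enabled;" or "LnkCtl: ASPM Disabled;") and that the input is a single device's section, e.g. from "lspci -vv -s 03:00.0" run as root (otherwise the capabilities are not shown); the function names are hypothetical.

```python
import re

# Captures the ASPM field of the link-control line, e.g. "L1 Enabled".
LNKCTL_RE = re.compile(r"LnkCtl:\s+ASPM\s+([^;]+);")


def aspm_state(lspci_vv):
    """Return the ASPM field of the first LnkCtl line, or None if absent."""
    m = LNKCTL_RE.search(lspci_vv)
    return m.group(1).strip() if m else None


def aspm_enabled(lspci_vv):
    """True when the link reports any ASPM state other than 'Disabled'."""
    state = aspm_state(lspci_vv)
    return state is not None and state != "Disabled"
```

Checking aspm_enabled() after booting with pcie_aspm=off or pcie_aspm.policy=performance shows whether the setting actually reached this device.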
>>>>>>>>>
>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@...il.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>
>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>>
>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>> issue related to loading of the PHY driver; however, my symptoms
>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>>> situation.
>>>>>>>>>>>
>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>>
>>>>>>>>>>> lshw shows:
>>>>>>>>>>> description: Ethernet interface
>>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>> physical id: 0
>>>>>>>>>>> bus info: pci@...0:03:00.0
>>>>>>>>>>> logical name: enp3s0
>>>>>>>>>>> version: 0c
>>>>>>>>>>> serial:
>>>>>>>>>>> size: 1Gbit/s
>>>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>>>> width: 64 bits
>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>
>>>>>>>>>>> Peter.
>>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>
>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>>   - iperf results before and after
>>>>>>>>>>   - statistics about dropped packets (rx and/or tx)
>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>
>>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>
>>>>>>>>>> Heiner
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>