Message-ID: <20190214024527.GG7193@linux-kyyb.suse>
Date: Thu, 14 Feb 2019 10:45:27 +0800
From: David Chang <dchang@...e.com>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Realtek linux nic maintainers <nic_swsd@...ltek.com>,
netdev@...r.kernel.org, Martti Laaksonen <martti.laaksonen@....fi>
Subject: Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
Hi Heiner,
On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> Hi David,
>
> meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.
After disabling the ASPM function that you mentioned, we finally got a
positive test result, and the rx_missed errors are gone. Without the
patch, receive performance drops back to the bad level.
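
For anyone repeating this test, a minimal sketch (not part of the test run
reported here) of how the ASPM state of the link could be read out of saved
"lspci -vv" output. The helper name aspm_state is hypothetical, and the
regular expression assumes the "LnkCtl: ASPM ...;" layout that lspci prints
in the output quoted further down in this thread.

```python
import re


def aspm_state(lspci_text):
    """Return the ASPM field of the first LnkCtl line, or None if absent."""
    # Assumed layout: "LnkCtl: ASPM L1 Enabled; RCB 64 bytes ..."
    match = re.search(r"LnkCtl:\s*ASPM\s+([^;]+);", lspci_text)
    return match.group(1).strip() if match else None


# Sample line copied from the lspci output quoted in this thread.
sample = "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+"
print(aspm_state(sample))
```

Comparing the value before and after a change is a quick way to see whether
the link actually ended up with ASPM off.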
kernel: r8169: loading out-of-tree module taints kernel.
kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
kernel: libphy: r8169: probed
kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
NIC statistics:
tx_packets: 1653804
rx_packets: 1555966
tx_errors: 0
rx_errors: 0
rx_missed: 0
align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
unicast: 1555884
broadcast: 78
multicast: 4
tx_aborted: 0
tx_underrun: 0
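
The counters above can be checked automatically. The following is a minimal
sketch, assuming the "name: value" layout shown above has been saved to a
string. The helper names parse_nic_stats and missed_ratio are hypothetical,
and using rx_packets + rx_missed as the denominator is an assumption about
how the two counters relate.

```python
def parse_nic_stats(text):
    """Collect 'name: integer' lines into a dict, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def missed_ratio(stats):
    """Fraction of frames missed; assumes missed frames are not in rx_packets."""
    received = stats.get("rx_packets", 0)
    missed = stats.get("rx_missed", 0)
    total = received + missed
    return missed / total if total else 0.0


# Values copied from the statistics in this message.
sample = """NIC statistics:
tx_packets: 1653804
rx_packets: 1555966
rx_errors: 0
rx_missed: 0
"""
stats = parse_nic_stats(sample)
print(stats["rx_missed"], missed_ratio(stats))
```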
iperf receive:
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.x.x.x, port 55516
[ 5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 108 MBytes 906 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 941 Mbits/sec
[ 5] 2.00-3.00 sec 112 MBytes 940 Mbits/sec
[ 5] 3.00-4.00 sec 112 MBytes 941 Mbits/sec
[ 5] 4.00-5.00 sec 112 MBytes 941 Mbits/sec
[ 5] 5.00-6.00 sec 112 MBytes 942 Mbits/sec
[ 5] 6.00-7.00 sec 112 MBytes 939 Mbits/sec
[ 5] 7.00-8.00 sec 112 MBytes 941 Mbits/sec
[ 5] 8.00-9.00 sec 112 MBytes 938 Mbits/sec
[ 5] 9.00-10.00 sec 112 MBytes 941 Mbits/sec
[ 5] 10.00-11.00 sec 112 MBytes 941 Mbits/sec
[...]
[ 5] 50.00-51.00 sec 112 MBytes 941 Mbits/sec
[ 5] 51.00-52.00 sec 112 MBytes 941 Mbits/sec
[ 5] 52.00-53.00 sec 112 MBytes 942 Mbits/sec
[ 5] 53.00-54.00 sec 112 MBytes 941 Mbits/sec
[ 5] 54.00-55.00 sec 111 MBytes 934 Mbits/sec
[ 5] 55.00-56.00 sec 112 MBytes 942 Mbits/sec
[ 5] 56.00-57.00 sec 112 MBytes 937 Mbits/sec
[ 5] 57.00-58.00 sec 112 MBytes 941 Mbits/sec
[ 5] 58.00-59.00 sec 111 MBytes 932 Mbits/sec
[ 5] 59.00-60.00 sec 112 MBytes 942 Mbits/sec
[ 5] 60.00-60.04 sec 4.06 MBytes 939 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-60.04 sec 6.57 GBytes 940 Mbits/sec receiver
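
To compare runs with and without the patch, the per-interval bitrates can be
pulled out of a saved report. This is a minimal sketch under the assumption
that the report uses the interval line layout shown above with rates in
Mbits/sec. The name interval_bitrates is hypothetical, and the summary lines
ending in "sender" or "receiver" are skipped on purpose.

```python
import re

# Matches e.g. "0.00-1.00 sec 108 MBytes 906 Mbits/sec" and captures the rate.
INTERVAL = re.compile(
    r"\d+\.\d+-\d+\.\d+\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Mbits/sec"
)


def interval_bitrates(report):
    """Return the per-interval bitrates in Mbits/sec, without summary lines."""
    rates = []
    for line in report.splitlines():
        if line.rstrip().endswith(("sender", "receiver")):
            continue
        match = INTERVAL.search(line)
        if match:
            rates.append(float(match.group(1)))
    return rates


# Lines copied from the report in this message.
sample = """[ 5] 0.00-1.00 sec 108 MBytes 906 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 941 Mbits/sec
[ 5] 2.00-3.00 sec 112 MBytes 940 Mbits/sec
[ 5] 0.00-60.04 sec 6.57 GBytes 940 Mbits/sec receiver
"""
rates = interval_bitrates(sample)
print(min(rates), sum(rates) / len(rates))
```

A low minimum next to a normal mean would point at short stalls rather than
a uniformly slow link.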
regards,
David
>
> Heiner
>
>
> On 31.01.2019 03:32, David Chang wrote:
> > Hi,
> >
> > We had a similar case here.
> > - Realtek r8169 receive performance regression in kernel 4.19
> > https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >
> > kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > The major symptom is a high rx_missed count.
> >
> >
> > On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >> Hi Peter,
> >>
> >> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >> do the trick, can you also check with pcie_aspm.policy=performance.
> >
> > We will give it a try later.
> >
> >> And please check with "ethtool -S <if>" whether the chip statistics
> >> show a significant number of errors.
> >>
> >> If this doesn't help you may have to bisect to find the offending commit.
> >
> > We tried falling back the driver to a few previous commits, as follows,
> > but with no luck.
> >
> > 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >
> > Thanks,
> > David Chang
> >
> >>
> >> Heiner
> >>
> >>
> >> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>> and this made no difference.
> >>>
> >>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>> confirm that this immediately resolved the issue and access to the NFS
> >>> shares operated as expected.
> >>>
> >>> I presume this means it is an issue with the r8169 driver included in
> >>> 4.19 onwards?
> >>>
> >>> To answer your last questions:
> >>>
> >>> Base Board Information
> >>> Manufacturer: Alienware
> >>> Product Name: 0PGRP5
> >>> Version: A02
> >>>
> >>> ... and yes, the RTL8168 is the onboard network chip.
> >>>
> >>> Regards,
> >>>
> >>> Peter.
> >>>
> >>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> I think the vendor driver doesn't enable ASPM per default.
> >>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>> Few older systems seem to have issues with ASPM, what kind of
> >>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>> network chip?
> >>>>
> >>>> Rgds, Heiner
> >>>>
> >>>>
> >>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>> Hi Heiner,
> >>>>>
> >>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>> a good idea.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>> What you could do:
> >>>>>>
> >>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>
> >>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>
> >>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>
> >>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>> elsewhere in the network subsystem?
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
> >>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>> Hi Heiner,
> >>>>>>>
> >>>>>>> Thanks for getting back to me.
> >>>>>>>
> >>>>>>> No, I don't use jumbo packets.
> >>>>>>>
> >>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>
> >>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>> troubleshoot this issue. Running the following
> >>>>>>>
> >>>>>>> netstat -s |grep retransmitted
> >>>>>>>
> >>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>> 4.19.18:
> >>>>>>>
> >>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>> the following:
> >>>>>>> real 0m19.867s
> >>>>>>> user 0m0.012s
> >>>>>>> sys 0m0.036s
> >>>>>>>
> >>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>> 4.18.16 and 'time' showed:
> >>>>>>> real 0m0.300s
> >>>>>>> user 0m0.004s
> >>>>>>> sys 0m0.007s
> >>>>>>>
> >>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>
> >>>>>>> dmesg XID:
> >>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>
> >>>>>>> # lspci -vv
> >>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>> Latency: 0, Cache Line Size: 64 bytes
> >>>>>>> Interrupt: pin A routed to IRQ 19
> >>>>>>> Region 0: I/O ports at d000 [size=256]
> >>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>> Capabilities: [40] Power Management version 3
> >>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>> Address: 0000000000000000 Data: 0000
> >>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>> <512ns, L1 <64us
> >>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>> SlotPowerLimit 10.000W
> >>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>> OBFF Via message/WAKE#
> >>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>> OBFF Disabled
> >>>>>>> AtomicOpsCtl: ReqEn-
> >>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>> Transmit Margin: Normal Operating Range,
> >>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>> Compliance De-emphasis: -6dB
> >>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>> Vector table: BAR=4 offset=00000000
> >>>>>>> PBA: BAR=4 offset=00000800
> >>>>>>> Capabilities: [d0] Vital Product Data
> >>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>> Not readable
> >>>>>>> Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>> Capabilities: [140 v1] Virtual Channel
> >>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> >>>>>>> Ctrl: ArbSelect=Fixed
> >>>>>>> Status: InProgress-
> >>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>> Status: NegoPending- InProgress-
> >>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>> Max snoop latency: 71680ns
> >>>>>>> Max no snoop latency: 71680ns
> >>>>>>> Kernel driver in use: r8169
> >>>>>>> Kernel modules: r8169
> >>>>>>>
> >>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@...il.com> wrote:
> >>>>>>>>
> >>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>
> >>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>
> >>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>> situation.
> >>>>>>>>>
> >>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>
> >>>>>>>>> lshw shows:
> >>>>>>>>> description: Ethernet interface
> >>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>> physical id: 0
> >>>>>>>>> bus info: pci@...0:03:00.0
> >>>>>>>>> logical name: enp3s0
> >>>>>>>>> version: 0c
> >>>>>>>>> serial:
> >>>>>>>>> size: 1Gbit/s
> >>>>>>>>> capacity: 1Gbit/s
> >>>>>>>>> width: 64 bits
> >>>>>>>>> clock: 33MHz
> >>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>> resources: irq:19 ioport:d000(size=256)
> >>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>
> >>>>>>>>> Kind Regards,
> >>>>>>>>>
> >>>>>>>>> Peter.
> >>>>>>>>>
> >>>>>>>> Hi Peter,
> >>>>>>>>
> >>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>
> >>>>>>>> - Can you provide any measurements?
> >>>>>>>> - iperf results before and after
> >>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>> - Do you use jumbo packets?
> >>>>>>>>
> >>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
> >>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>
> >>>>>>>> Heiner
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>