lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMLO_R5m+tFa2yzeMbacROrFirwWN+zCUVAbDd864RHVMNe08Q@mail.gmail.com>
Date:   Thu, 31 Jan 2019 23:09:11 +1100
From:   Peter Ceiley <peter@...ley.net>
To:     David Chang <dchang@...e.com>
Cc:     Heiner Kallweit <hkallweit1@...il.com>,
        Realtek linux nic maintainers <nic_swsd@...ltek.com>,
        netdev@...r.kernel.org
Subject: Re: r8169 Driver - Poor Network Performance Since Kernel 4.19

Hi Heiner,

A quick update on my testing with different pcie_aspm settings:

pcie_aspm=off | no change
pcie_aspm.policy=default | no change
pcie_aspm.policy=performance | issue resolved
pcie_aspm.policy=powersave | issue resolved
pcie_aspm.policy=powersupersave | issue resolved

It seems the new driver does not play nicely with the default ASPM policy.

As requested, I've included an output of ethtool below when experiencing
the issue - note that no errors are recorded.

# ethtool -S enp3s0
NIC statistics:
     tx_packets: 2749
     rx_packets: 4089
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 4078
     broadcast: 9
     multicast: 2
     tx_aborted: 0
     tx_underrun: 0

David, I hope this helps for your user as well. I appreciate you sharing
the bug ticket - thanks.

Heiner, thanks very much for your help to date.

Regards,

Peter.

On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@...e.com> wrote:
>
> Hi Heiner,
>
> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> > Hi David, two more things:
> >
> > 1. Could you please test a recent linux-next kernel?
> > 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> >    and compare them.
>
> I'm sorry that I do not have the issue machine handy. I would ask
> our user to do the test. Thanks!
>
> Regards,
> David
>
> >
> > Heiner
> >
> >
> > On 31.01.2019 07:21, Heiner Kallweit wrote:
> > > David, thanks for the link to the bug ticket.
> > > I think only a proper bisect can help to find the offending commit.
> > >
> > > Heiner
> > >
> > >
> > > On 31.01.2019 03:32, David Chang wrote:
> > >> Hi,
> > >>
> > >> We had a similr case here.
> > >> - Realtek r8169 receive performance regression in kernel 4.19
> > >>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> > >>
> > >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > >> The major symptom is there are many rx_missed count.
> > >>
> > >>
> > >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> > >>> Hi Peter,
> > >>>
> > >>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> > >>> do the trick, can you also check with pcie_aspm.policy=performance.
> > >>
> > >> We will give it a try later.
> > >>
> > >>> And please check with "ethtool -S <if>" whether the chip statistics
> > >>> show a significant number of errors.
> > >>>
> > >>> If this doesn't help you may have to bisect to find the offending commit.
> > >>
> > >> We had tried fallback driver to a few previous commits as following,
> > >> but with no luck.
> > >>
> > >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> > >>
> > >> Thanks,
> > >> David Chang
> > >>
> > >>>
> > >>> Heiner
> > >>>
> > >>>
> > >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> > >>>> Hi Heiner,
> > >>>>
> > >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> > >>>> and this made no difference.
> > >>>>
> > >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> > >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> > >>>> confirm that this immediately resolved the issue and access to the NFS
> > >>>> shares operated as expected.
> > >>>>
> > >>>> I presume this means it is an issue with the r8169 driver included in
> > >>>> 4.19 onwards?
> > >>>>
> > >>>> To answer your last questions:
> > >>>>
> > >>>> Base Board Information
> > >>>>     Manufacturer: Alienware
> > >>>>     Product Name: 0PGRP5
> > >>>>     Version: A02
> > >>>>
> > >>>> ... and yes, the RTL8168 is the onboard network chip.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Peter.
> > >>>>
> > >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@...il.com> wrote:
> > >>>>>
> > >>>>> Hi Peter,
> > >>>>>
> > >>>>> I think the vendor driver doesn't enable ASPM per default.
> > >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> > >>>>> Few older systems seem to have issues with ASPM, what kind of
> > >>>>> system / mainboard are you using? The RTL8168 is the onboard
> > >>>>> network chip?
> > >>>>>
> > >>>>> Rgds, Heiner
> > >>>>>
> > >>>>>
> > >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> > >>>>>> Hi Heiner,
> > >>>>>>
> > >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> > >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> > >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> > >>>>>> a good idea.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>>
> > >>>>>> Peter.
> > >>>>>>
> > >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@...il.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi Peter,
> > >>>>>>>
> > >>>>>>> at a first glance it doesn't look like a typical driver issue.
> > >>>>>>> What you could do:
> > >>>>>>>
> > >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> > >>>>>>>
> > >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> > >>>>>>>
> > >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> > >>>>>>>
> > >>>>>>> Any specific reason why you think root cause is in the driver and not
> > >>>>>>> elsewhere in the network subsystem?
> > >>>>>>>
> > >>>>>>> Heiner
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> > >>>>>>>> Hi Heiner,
> > >>>>>>>>
> > >>>>>>>> Thanks for getting back to me.
> > >>>>>>>>
> > >>>>>>>> No, I don't use jumbo packets.
> > >>>>>>>>
> > >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> > >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> > >>>>>>>> establishing a connection and is most notable, for example, on my
> > >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> > >>>>>>>> larger directories) to list the contents of each directory. Once a
> > >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> > >>>>>>>>
> > >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> > >>>>>>>> troubleshoot this issue. Running the following
> > >>>>>>>>
> > >>>>>>>>     netstat -s |grep retransmitted
> > >>>>>>>>
> > >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> > >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> > >>>>>>>> directory containing 345 media files did the following using kernel
> > >>>>>>>> 4.19.18:
> > >>>>>>>>
> > >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> > >>>>>>>> the following:
> > >>>>>>>>     real    0m19.867s
> > >>>>>>>>     user    0m0.012s
> > >>>>>>>>     sys    0m0.036s
> > >>>>>>>>
> > >>>>>>>> The same command shows no retransmitted segments running kernel
> > >>>>>>>> 4.18.16 and 'time' showed:
> > >>>>>>>>     real    0m0.300s
> > >>>>>>>>     user    0m0.004s
> > >>>>>>>>     sys    0m0.007s
> > >>>>>>>>
> > >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> > >>>>>>>>
> > >>>>>>>> dmesg XID:
> > >>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> > >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> > >>>>>>>>
> > >>>>>>>> # lspci -vv
> > >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> > >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> > >>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> > >>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> > >>>>>>>>     Interrupt: pin A routed to IRQ 19
> > >>>>>>>>     Region 0: I/O ports at d000 [size=256]
> > >>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> > >>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> > >>>>>>>>     Capabilities: [40] Power Management version 3
> > >>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> > >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> > >>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> > >>>>>>>>         Address: 0000000000000000  Data: 0000
> > >>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> > >>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> > >>>>>>>> <512ns, L1 <64us
> > >>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> > >>>>>>>> SlotPowerLimit 10.000W
> > >>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > >>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > >>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> > >>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > >>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> > >>>>>>>> Latency L0s unlimited, L1 <64us
> > >>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> > >>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > >>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> > >>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> > >>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > >>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> > >>>>>>>> OBFF Via message/WAKE#
> > >>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> > >>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> > >>>>>>>> OBFF Disabled
> > >>>>>>>>              AtomicOpsCtl: ReqEn-
> > >>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> > >>>>>>>>              Transmit Margin: Normal Operating Range,
> > >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> > >>>>>>>>              Compliance De-emphasis: -6dB
> > >>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> > >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> > >>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > >>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> > >>>>>>>>         Vector table: BAR=4 offset=00000000
> > >>>>>>>>         PBA: BAR=4 offset=00000800
> > >>>>>>>>     Capabilities: [d0] Vital Product Data
> > >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> > >>>>>>>>         Not readable
> > >>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> > >>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> > >>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> > >>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> > >>>>>>>> ECRCChkCap+ ECRCChkEn-
> > >>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> > >>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> > >>>>>>>>     Capabilities: [140 v1] Virtual Channel
> > >>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> > >>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> > >>>>>>>>         Ctrl:    ArbSelect=Fixed
> > >>>>>>>>         Status:    InProgress-
> > >>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > >>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> > >>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> > >>>>>>>>             Status:    NegoPending- InProgress-
> > >>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> > >>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> > >>>>>>>>         Max snoop latency: 71680ns
> > >>>>>>>>         Max no snoop latency: 71680ns
> > >>>>>>>>     Kernel driver in use: r8169
> > >>>>>>>>     Kernel modules: r8169
> > >>>>>>>>
> > >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> > >>>>>>>>
> > >>>>>>>> Thanks!
> > >>>>>>>>
> > >>>>>>>> Peter.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@...il.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> > >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> > >>>>>>>>>>
> > >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> > >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> > >>>>>>>>>> 4.20.4 & 4.19.18).
> > >>>>>>>>>>
> > >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> > >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> > >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> > >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> > >>>>>>>>>> reload the driver on a running system, but this does not improve the
> > >>>>>>>>>> situation.
> > >>>>>>>>>>
> > >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> > >>>>>>>>>>
> > >>>>>>>>>> lshw shows:
> > >>>>>>>>>>        description: Ethernet interface
> > >>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> > >>>>>>>>>>        physical id: 0
> > >>>>>>>>>>        bus info: pci@...0:03:00.0
> > >>>>>>>>>>        logical name: enp3s0
> > >>>>>>>>>>        version: 0c
> > >>>>>>>>>>        serial:
> > >>>>>>>>>>        size: 1Gbit/s
> > >>>>>>>>>>        capacity: 1Gbit/s
> > >>>>>>>>>>        width: 64 bits
> > >>>>>>>>>>        clock: 33MHz
> > >>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> > >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> > >>>>>>>>>> 1000bt-fd autonegotiation
> > >>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> > >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> > >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> > >>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> > >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> > >>>>>>>>>>
> > >>>>>>>>>> Kind Regards,
> > >>>>>>>>>>
> > >>>>>>>>>> Peter.
> > >>>>>>>>>>
> > >>>>>>>>> Hi Peter,
> > >>>>>>>>>
> > >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> > >>>>>>>>>
> > >>>>>>>>> - Can you provide any measurements?
> > >>>>>>>>> - iperf results before and after
> > >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> > >>>>>>>>> - Do you use jumbo packets?
> > >>>>>>>>>
> > >>>>>>>>> Also help would be a "lspci -vv" output for the network card and
> > >>>>>>>>> the dmesg output line with the chip XID.
> > >>>>>>>>>
> > >>>>>>>>> Heiner
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ