Message-ID: <CAA7C2qit32YSTycxXsZhj97kcxZAKkhj9ciU+CO=zbknSnRNzg@mail.gmail.com>
Date:   Wed, 13 Mar 2019 20:04:29 -0700
From:   VDR User <user.vdr@...il.com>
To:     Heiner Kallweit <hkallweit1@...il.com>
Cc:     netdev@...r.kernel.org
Subject: Re: r8169 driver from kernel 5.0 crashing

Hi Heiner,

Thanks for your response. The requested info follows.

> > Hi, after updating to kernel 5.0, the nic driver (r8169) has been
> > crashing whenever I start using heavy traffic on it (for example,
> > xferring large files to the box across my lan). The destination
> > harddrive may be sleeping and need to spin-up, or not, but the box
> > itself does not suspend/hibernate. The nic becomes completely
> > unresponsive and all connections to the box drop. After what I think
> > is several minutes, the connection comes back to life. The problem
> > happens consistently but seemingly not consistently at the same point.
> > For example, I can xfer a few 4gb files and it will crash at around
> > 2-3gb on the first file. The next time it might not crash until 2-3gb
> > on the second file. Prior to kernel 5.0 I was using 4.19.12 and this
> > problem didn't occur. I have since downgraded back to 4.19.12 pending
> > what response this post gets.
> >
> Thanks for the report. Helpful would be:
> - full dmesg output

Added as attachment.

> - "lspci -vv" output (as root) for the network card

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
        Subsystem: Elitegroup Computer Systems RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at c000 [size=256]
        Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
                Not readable
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 40-01-00-00-68-4c-e0-00
        Kernel driver in use: r8169

> - ethtool -S <if> output

Unfortunately I just realized I did this _after_ ifdown/ifup'ing the
nic to get it back online so this is probably useless but I'll include
it anyway. If I get it to crash again I'll try to remember to get this
before restarting the nic:

NIC statistics:
     tx_packets: 8844844
     rx_packets: 23550316
     tx_errors: 0
     rx_errors: 0
     rx_missed: 13
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 23544796
     broadcast: 4420
     multicast: 1100
     tx_aborted: 0
     tx_underrun: 0

> Can you test a recent 4.20 kernel? This would narrow down the number
> of potentially problematic patches.

I compiled and tested 4.20.15 and didn't experience any crashing. I then
switched back to 5.0.0 and this time I had to transfer significantly
more data before the crash occurred. I'm not sure, but it seems like the
crashes happen when there's both outgoing & incoming traffic
simultaneously. Is the dmesg crash info helpful at all?

Thanks,
Derek

Download attachment "crash.dmesg.log" of type "application/octet-stream" (57368 bytes)
