lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180821165858.GA1507@maindev>
Date:   Tue, 21 Aug 2018 09:58:58 -0700
From:   "Nikita V. Shirokov" <tehnerd@...nerd.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     netdev@...r.kernel.org, jeffrey.t.kirsher@...el.com
Subject: Re: ixgbe hangs when XDP_TX is enabled

On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:
> On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov <tehnerd@...nerd.com> wrote:
> >
> > we are getting such errors:
> >
> > [  408.737313] ixgbe 0000:03:00.0 eth0: Detected Tx Unit Hang (XDP)
> >                  Tx Queue             <46>
> >                  TDH, TDT             <0>, <2>
> >                  next_to_use          <2>
> >                  next_to_clean        <0>
> >                tx_buffer_info[next_to_clean]
> >                  time_stamp           <0>
> >                  jiffies              <1000197c0>
> > [  408.804438] ixgbe 0000:03:00.0 eth0: tx hang 1 detected on queue 46, resetting adapter
> > [  408.804440] ixgbe 0000:03:00.0 eth0: initiating reset due to tx timeout
> > [  408.817679] ixgbe 0000:03:00.0 eth0: Reset adapter
> > [  408.866091] ixgbe 0000:03:00.0 eth0: TXDCTL.ENABLE for one or more queues not cleared within the polling period
> > [  409.345289] ixgbe 0000:03:00.0 eth0: detected SFP+: 3
> > [  409.497232] ixgbe 0000:03:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> >
> > while running XDP prog on ixgbe nic.
> > right now i'm seing this on bpfnext kernel
> > (latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
> > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> >
> > looks like this is the same issue as reported by Brenden in
> > https://www.spinics.net/lists/netdev/msg439438.html
> >
> > --
> > Nikita V. Shirokov
> 
> Could you provide some additional information about your setup.
> Specifically useful would be "ethtool -i", "ethtool -l", and lspci
> -vvv info for your device. The total number of CPUs on the system
> would be useful to know as well. In addition could you try
> reproducing
sure:

ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:             0
TX:             0
Other:          1
Combined:       63
Current hardware settings:
RX:             0
TX:             0
Other:          1
Combined:       48

# ethtool -i eth0
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x800006f1
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


# nproc
48

lspci:

03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Intel Corporation Device 000d
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 30
        NUMA node: 0
        Region 0: Memory at c7d00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at 6000 [size=32]
        Region 4: Memory at c7e80000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at c7e00000 [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00002000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend+
                LnkCap: Port #2, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Device Serial Number 90-e2-ba-ff-ff-b6-b2-60
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 128, stride: 2, Device ID: 10ed
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000000c7c00000 (64-bit, prefetchable)
                Region 3: Memory at 00000000c7b00000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Kernel driver in use: ixgbe




workaround for now is to do the same, as Brenden did in his original
finding: make sure that combined + xdp queues < max_tx_queues
(e.g. w/ combined == 14 the issue goes away).

> the issue with one of the sample XDP programs provided with the kernel
> such as the xdp2 which I believe uses the XDP_TX function. We need to
> try and create a similar setup in our own environment for
> reproduction and debugging.

will try but this could take a while, because i'm not sure that we have
ixgbe in our test lab (and it would be hard to run such test in prod)

> 
> Thanks.
> 
> - Alex

--
Nikita V. Shirokov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ