Message-ID: <20180821165858.GA1507@maindev>
Date: Tue, 21 Aug 2018 09:58:58 -0700
From: "Nikita V. Shirokov" <tehnerd@...nerd.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: netdev@...r.kernel.org, jeffrey.t.kirsher@...el.com
Subject: Re: ixgbe hangs when XDP_TX is enabled
On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:
> On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov <tehnerd@...nerd.com> wrote:
> >
> > we are getting such errors:
> >
> > [ 408.737313] ixgbe 0000:03:00.0 eth0: Detected Tx Unit Hang (XDP)
> > Tx Queue <46>
> > TDH, TDT <0>, <2>
> > next_to_use <2>
> > next_to_clean <0>
> > tx_buffer_info[next_to_clean]
> > time_stamp <0>
> > jiffies <1000197c0>
> > [ 408.804438] ixgbe 0000:03:00.0 eth0: tx hang 1 detected on queue 46, resetting adapter
> > [ 408.804440] ixgbe 0000:03:00.0 eth0: initiating reset due to tx timeout
> > [ 408.817679] ixgbe 0000:03:00.0 eth0: Reset adapter
> > [ 408.866091] ixgbe 0000:03:00.0 eth0: TXDCTL.ENABLE for one or more queues not cleared within the polling period
> > [ 409.345289] ixgbe 0000:03:00.0 eth0: detected SFP+: 3
> > [ 409.497232] ixgbe 0000:03:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> >
> > while running XDP prog on ixgbe nic.
> > right now i'm seeing this on the bpf-next kernel
> > (latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
> > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> >
> > looks like this is the same issue as reported by Brenden in
> > https://www.spinics.net/lists/netdev/msg439438.html
> >
> > --
> > Nikita V. Shirokov
>
> Could you provide some additional information about your setup?
> Specifically useful would be "ethtool -i", "ethtool -l", and lspci
> -vvv info for your device. The total number of CPUs on the system
> would be useful to know as well. In addition could you try
> reproducing
sure:
# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 1
Combined: 63
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 48
# ethtool -i eth0
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x800006f1
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
# nproc
48
lspci:
03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Subsystem: Intel Corporation Device 000d
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 30
NUMA node: 0
Region 0: Memory at c7d00000 (64-bit, non-prefetchable) [size=1M]
Region 2: I/O ports at 6000 [size=32]
Region 4: Memory at c7e80000 (64-bit, non-prefetchable) [size=16K]
Expansion ROM at c7e00000 [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend+
LnkCap: Port #2, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 <8us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Device Serial Number 90-e2-ba-ff-ff-b6-b2-60
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
IOVSta: Migration-
Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
VF offset: 128, stride: 2, Device ID: 10ed
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 00000000c7c00000 (64-bit, prefetchable)
Region 3: Memory at 00000000c7b00000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Kernel driver in use: ixgbe
the workaround for now is the same one Brenden used in his original
report: make sure that combined + xdp queues < max_tx_queues
(e.g. with combined == 14 the issue goes away).
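
to make the arithmetic explicit, here is a rough sketch of that check. it
assumes the driver allocates one XDP Tx queue per CPU, and it takes the queue
limit as a parameter; the value 64 in the usage below is only an illustrative
assumption that happens to match what we observe, not a number taken from the
driver. the function names are hypothetical.

```python
def xdp_tx_queues_fit(combined: int, num_cpus: int, max_tx_queues: int) -> bool:
    """Return True if regular plus XDP Tx queues stay strictly below the limit."""
    # Assumption: one XDP Tx queue is allocated per CPU.
    xdp_queues = num_cpus
    return combined + xdp_queues < max_tx_queues


def max_safe_combined(num_cpus: int, max_tx_queues: int) -> int:
    """Largest combined channel count that satisfies the inequality (0 if none)."""
    # combined + num_cpus < max_tx_queues  =>  combined <= max_tx_queues - num_cpus - 1
    return max(0, max_tx_queues - num_cpus - 1)


if __name__ == "__main__":
    cpus = 48    # from nproc on this host
    limit = 64   # illustrative assumption, see the note above
    print(xdp_tx_queues_fit(48, cpus, limit))  # current setting, combined == 48
    print(xdp_tx_queues_fit(14, cpus, limit))  # workaround setting, combined == 14
    print(max_safe_combined(cpus, limit))
```

under those assumptions the current setting (combined == 48) violates the
inequality and combined == 14 satisfies it. the channel count itself can be
lowered with "ethtool -L eth0 combined 14".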
> the issue with one of the sample XDP programs provided with the kernel
> such as the xdp2 which I believe uses the XDP_TX function. We need to
> try and create a similar setup in our own environment for
> reproduction and debugging.
will try, but this could take a while, because i'm not sure that we have
ixgbe in our test lab (and it would be hard to run such a test in prod)
>
> Thanks.
>
> - Alex
--
Nikita V. Shirokov