Message-ID: <e9762fc08e0c62c52ee5169e43e512f5d65087a2.camel@intel.com>
Date:   Wed, 22 Aug 2018 09:22:58 -0700
From:   Jeff Kirsher <jeffrey.t.kirsher@...el.com>
To:     Alexander Duyck <alexander.duyck@...il.com>, tehnerd@...nerd.com
Cc:     Netdev <netdev@...r.kernel.org>, tytus.a.wasilewski@...el.com,
        Tymoteusz Kielan <tymoteusz.kielan@...el.com>
Subject: Re: ixgbe hangs when XDP_TX is enabled

On Tue, 2018-08-21 at 11:13 -0700, Alexander Duyck wrote:
> On Tue, Aug 21, 2018 at 9:59 AM Nikita V. Shirokov <
> tehnerd@...nerd.com> wrote:
> > 
> > On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:
> > > On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov <
> > > tehnerd@...nerd.com> wrote:
> > > > 
> > > > we are getting such errors:
> > > > 
> > > > [  408.737313] ixgbe 0000:03:00.0 eth0: Detected Tx Unit Hang
> > > > (XDP)
> > > >                   Tx Queue             <46>
> > > >                   TDH, TDT             <0>, <2>
> > > >                   next_to_use          <2>
> > > >                   next_to_clean        <0>
> > > >                 tx_buffer_info[next_to_clean]
> > > >                   time_stamp           <0>
> > > >                   jiffies              <1000197c0>
> > > > [  408.804438] ixgbe 0000:03:00.0 eth0: tx hang 1 detected on
> > > > queue 46, resetting adapter
> > > > [  408.804440] ixgbe 0000:03:00.0 eth0: initiating reset due to
> > > > tx timeout
> > > > [  408.817679] ixgbe 0000:03:00.0 eth0: Reset adapter
> > > > [  408.866091] ixgbe 0000:03:00.0 eth0: TXDCTL.ENABLE for one
> > > > or more queues not cleared within the polling period
> > > > [  409.345289] ixgbe 0000:03:00.0 eth0: detected SFP+: 3
> > > > [  409.497232] ixgbe 0000:03:00.0 eth0: NIC Link is Up 10 Gbps,
> > > > Flow Control: RX/TX
> > > > 
> > > > while running XDP prog on ixgbe nic.
> > > > right now I'm seeing this on the bpf-next kernel
> > > > (latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
> > > > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> > > > 
> > > > looks like this is the same issue as reported by Brenden in
> > > > https://www.spinics.net/lists/netdev/msg439438.html
> > > > 
> > > > --
> > > > Nikita V. Shirokov
> > > 
> > > Could you provide some additional information about your setup?
> > > Specifically useful would be "ethtool -i", "ethtool -l", and
> > > "lspci -vvv" info for your device. The total number of CPUs on the
> > > system would be useful to know as well. In addition could you try
> > > reproducing
> > 
> > sure:
> > 
> > ethtool -l eth0
> > Channel parameters for eth0:
> > Pre-set maximums:
> > RX:             0
> > TX:             0
> > Other:          1
> > Combined:       63
> > Current hardware settings:
> > RX:             0
> > TX:             0
> > Other:          1
> > Combined:       48
> > 
> > # ethtool -i eth0
> > driver: ixgbe
> > version: 5.1.0-k
> > firmware-version: 0x800006f1
> > expansion-rom-version:
> > bus-info: 0000:03:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> > 
> > 
> > # nproc
> > 48
> > 
> > lspci:
> > 
> > 03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > SFI/SFP+ Network Connection (rev 01)
> >          Subsystem: Intel Corporation Device 000d
> >          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV-
> > VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> >          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
> > >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >          Latency: 0, Cache Line Size: 32 bytes
> >          Interrupt: pin A routed to IRQ 30
> >          NUMA node: 0
> >          Region 0: Memory at c7d00000 (64-bit, non-prefetchable)
> > [size=1M]
> >          Region 2: I/O ports at 6000 [size=32]
> >          Region 4: Memory at c7e80000 (64-bit, non-prefetchable)
> > [size=16K]
> >          Expansion ROM at c7e00000 [disabled] [size=512K]
> >          Capabilities: [40] Power Management version 3
> >                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >                  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1
> > PME-
> >          Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> >                  Address: 0000000000000000  Data: 0000
> >                  Masking: 00000000  Pending: 00000000
> >          Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
> >                  Vector table: BAR=4 offset=00000000
> >                  PBA: BAR=4 offset=00002000
> >          Capabilities: [a0] Express (v2) Endpoint, MSI 00
> >                  DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency
> > L0s <512ns, L1 <64us
> >                          ExtTag- AttnBtn- AttnInd- PwrInd- RBE+
> > FLReset+ SlotPowerLimit 0.000W
> >                  DevCtl: Report errors: Correctable+ Non-Fatal+
> > Fatal+ Unsupported+
> >                          RlxdOrd- ExtTag- PhantFunc- AuxPwr-
> > NoSnoop+ FLReset-
> >                          MaxPayload 256 bytes, MaxReadReq 512 bytes
> >                  DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+
> > AuxPwr+ TransPend+
> >                  LnkCap: Port #2, Speed 5GT/s, Width x8, ASPM L0s,
> > Exit Latency L0s unlimited, L1 <8us
> >                          ClockPM- Surprise- LLActRep- BwNot-
> > ASPMOptComp-
> >                  LnkCtl: ASPM Disabled; RCB 64 bytes Disabled-
> > CommClk+
> >                          ExtSynch- ClockPM- AutWidDis- BWInt-
> > AutBWInt-
> >                  LnkSta: Speed 5GT/s, Width x8, TrErr- Train-
> > SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >                  DevCap2: Completion Timeout: Range ABCD,
> > TimeoutDis+, LTR-, OBFF Not Supported
> >                  DevCtl2: Completion Timeout: 50us to 50ms,
> > TimeoutDis-, LTR-, OBFF Disabled
> >                  LnkCtl2: Target Link Speed: 5GT/s,
> > EnterCompliance- SpeedDis-
> >                           Transmit Margin: Normal Operating Range,
> > EnterModifiedCompliance- ComplianceSOS-
> >                           Compliance De-emphasis: -6dB
> >                  LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete-, EqualizationPhase1-
> >                           EqualizationPhase2-, EqualizationPhase3-, 
> > LinkEqualizationRequest-
> >          Capabilities: [100 v1] Advanced Error Reporting
> >                  UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >                  UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >                  UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >                  CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > NonFatalErr+
> >                  CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > NonFatalErr+
> >                  AERCap: First Error Pointer: 00, GenCap+ CGenEn-
> > ChkCap+ ChkEn-
> >          Capabilities: [140 v1] Device Serial Number 90-e2-ba-ff-
> > ff-b6-b2-60
> >          Capabilities: [150 v1] Alternative Routing-ID
> > Interpretation (ARI)
> >                  ARICap: MFVC- ACS-, Next Function: 0
> >                  ARICtl: MFVC- ACS-, Function Group: 0
> >          Capabilities: [160 v1] Single Root I/O Virtualization (SR-
> > IOV)
> >                  IOVCap: Migration-, Interrupt Message Number: 000
> >                  IOVCtl: Enable- Migration- Interrupt- MSE-
> > ARIHierarchy+
> >                  IOVSta: Migration-
> >                  Initial VFs: 64, Total VFs: 64, Number of VFs: 0,
> > Function Dependency Link: 00
> >                  VF offset: 128, stride: 2, Device ID: 10ed
> >                  Supported Page Size: 00000553, System Page Size:
> > 00000001
> >                  Region 0: Memory at 00000000c7c00000 (64-bit,
> > prefetchable)
> >                  Region 3: Memory at 00000000c7b00000 (64-bit,
> > prefetchable)
> >                  VF Migration: offset: 00000000, BIR: 0
> >          Kernel driver in use: ixgbe
> > 
> > 
> > 
> > 
> > the workaround for now is to do the same as Brenden did in his
> > original finding: make sure that combined + xdp queues < max_tx_queues
> > (e.g. w/ combined == 14 the issue goes away).
> > 
> > > the issue with one of the sample XDP programs provided with the
> > > kernel, such as xdp2, which I believe uses the XDP_TX function. We
> > > need to try and create a similar setup in our own environment for
> > > reproduction and debugging.
> > 
> > will try, but this could take a while, because I'm not sure that we
> > have ixgbe in our test lab (and it would be hard to run such a test
> > in prod)
> > 
> > > 
> > > Thanks.
> > > 
> > > - Alex
> > 
> > --
> > Nikita V. Shirokov
> 
> So I have been reading the datasheet
> (https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82599-10-gbe-controller-datasheet.pdf)
> and it looks like the assumption that Brenden came to in the earlier
> referenced link is probably correct. From what I can tell there is a
> limit of 64 queues in the base RSS mode of the device, so while it
> supports more than 64 queues you can only make use of 64 as per table
> 7-25.
> 
> For now I think the workaround you are using is probably the only
> viable solution. I myself don't have time to work on resolving this,
> but I am sure one of the maintainers for ixgbe will be responding
> shortly.

I have notified the 10GbE maintainers, and we are currently working to
reproduce the issue.

> 
> One possible solution we may want to look at would be to make use of
> the 32 pool/VF mode in the MTQC register. That should enable us to
> make use of all 128 queues but I am sure there would be other side
> effects such as having to set the bits in the PFVFTE register in
> order
> to enable the extra Tx queues.
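
The queue arithmetic behind the workaround quoted above can be sketched
as follows. This is only a minimal sketch, not driver code: it assumes
one XDP Tx queue per CPU (an assumption, not stated in this thread), it
uses the 64-queue limit that Alex cites from the datasheet, and it
applies the rule exactly as Nikita wrote it (combined + xdp queues <
max_tx_queues). The function names are hypothetical and do not exist in
ixgbe or ethtool.

```python
# Hypothetical helpers for illustration only; not part of ixgbe or ethtool.

# Queue limit in the base RSS mode, as cited from the 82599 datasheet
# earlier in this thread.
MAX_TX_QUEUES = 64


def fits_queue_budget(combined, xdp_queues, max_tx_queues=MAX_TX_QUEUES):
    """Apply the rule quoted in the thread: combined + xdp < max_tx_queues."""
    return combined + xdp_queues < max_tx_queues


def largest_safe_combined(xdp_queues, max_tx_queues=MAX_TX_QUEUES):
    """Largest 'combined' value that still satisfies the rule (0 if none)."""
    return max(max_tx_queues - xdp_queues - 1, 0)


if __name__ == "__main__":
    # 48 comes from the nproc output in the report. Treating it as the
    # number of XDP Tx queues is an assumption made for this sketch.
    cpus = 48
    for combined in (48, 14):
        verdict = "ok" if fits_queue_budget(combined, cpus) else "over limit"
        print(f"combined={combined} xdp={cpus}: {verdict}")
    print("largest safe combined:", largest_safe_combined(cpus))
```

Under those assumptions the reported configuration (combined 48 on 48
CPUs) needs 96 queues and exceeds the limit, while combined 14 needs 62
and fits. On a live system the channel count would be changed with
"ethtool -L eth0 combined 14".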

