Date:   Tue, 21 Aug 2018 11:13:49 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     tehnerd@...nerd.com
Cc:     Netdev <netdev@...r.kernel.org>,
        Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: Re: ixgbe hangs when XDP_TX is enabled

On Tue, Aug 21, 2018 at 9:59 AM Nikita V. Shirokov <tehnerd@...nerd.com> wrote:
>
> On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:
> > On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov <tehnerd@...nerd.com> wrote:
> > >
> > > we are getting such errors:
> > >
> > > [  408.737313] ixgbe 0000:03:00.0 eth0: Detected Tx Unit Hang (XDP)
> > >                  Tx Queue             <46>
> > >                  TDH, TDT             <0>, <2>
> > >                  next_to_use          <2>
> > >                  next_to_clean        <0>
> > >                tx_buffer_info[next_to_clean]
> > >                  time_stamp           <0>
> > >                  jiffies              <1000197c0>
> > > [  408.804438] ixgbe 0000:03:00.0 eth0: tx hang 1 detected on queue 46, resetting adapter
> > > [  408.804440] ixgbe 0000:03:00.0 eth0: initiating reset due to tx timeout
> > > [  408.817679] ixgbe 0000:03:00.0 eth0: Reset adapter
> > > [  408.866091] ixgbe 0000:03:00.0 eth0: TXDCTL.ENABLE for one or more queues not cleared within the polling period
> > > [  409.345289] ixgbe 0000:03:00.0 eth0: detected SFP+: 3
> > > [  409.497232] ixgbe 0000:03:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> > >
> > > while running XDP prog on ixgbe nic.
> > > right now i'm seeing this on bpf-next kernel
> > > (latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
> > > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> > >
> > > looks like this is the same issue as reported by Brenden in
> > > https://www.spinics.net/lists/netdev/msg439438.html
> > >
> > > --
> > > Nikita V. Shirokov
> >
> > Could you provide some additional information about your setup?
> > Specifically useful would be "ethtool -i", "ethtool -l", and lspci
> > -vvv info for your device. The total number of CPUs on the system
> > would be useful to know as well. In addition could you try
> > reproducing
> sure:
>
> ethtool -l eth0
> Channel parameters for eth0:
> Pre-set maximums:
> RX:             0
> TX:             0
> Other:          1
> Combined:       63
> Current hardware settings:
> RX:             0
> TX:             0
> Other:          1
> Combined:       48
>
> # ethtool -i eth0
> driver: ixgbe
> version: 5.1.0-k
> firmware-version: 0x800006f1
> expansion-rom-version:
> bus-info: 0000:03:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
>
> # nproc
> 48
>
> lspci:
>
> 03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>         Subsystem: Intel Corporation Device 000d
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 32 bytes
>         Interrupt: pin A routed to IRQ 30
>         NUMA node: 0
>         Region 0: Memory at c7d00000 (64-bit, non-prefetchable) [size=1M]
>         Region 2: I/O ports at 6000 [size=32]
>         Region 4: Memory at c7e80000 (64-bit, non-prefetchable) [size=16K]
>         Expansion ROM at c7e00000 [disabled] [size=512K]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>                 Address: 0000000000000000  Data: 0000
>                 Masking: 00000000  Pending: 00000000
>         Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
>                 Vector table: BAR=4 offset=00000000
>                 PBA: BAR=4 offset=00002000
>         Capabilities: [a0] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend+
>                 LnkCap: Port #2, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 <8us
>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>                 LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>         Capabilities: [140 v1] Device Serial Number 90-e2-ba-ff-ff-b6-b2-60
>         Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
>                 ARICap: MFVC- ACS-, Next Function: 0
>                 ARICtl: MFVC- ACS-, Function Group: 0
>         Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
>                 IOVCap: Migration-, Interrupt Message Number: 000
>                 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
>                 IOVSta: Migration-
>                 Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
>                 VF offset: 128, stride: 2, Device ID: 10ed
>                 Supported Page Size: 00000553, System Page Size: 00000001
>                 Region 0: Memory at 00000000c7c00000 (64-bit, prefetchable)
>                 Region 3: Memory at 00000000c7b00000 (64-bit, prefetchable)
>                 VF Migration: offset: 00000000, BIR: 0
>         Kernel driver in use: ixgbe
>
>
>
>
> workaround for now is to do the same, as Brenden did in his original
> finding: make sure that combined + xdp queues < max_tx_queues
> (e.g. w/ combined == 14 the issue goes away).
>
> > the issue with one of the sample XDP programs provided with the kernel
> > such as the xdp2 which I believe uses the XDP_TX function. We need to
> > try and create a similar setup in our own environment for
> > reproduction and debugging.
>
> will try but this could take a while, because i'm not sure that we have
> ixgbe in our test lab (and it would be hard to run such test in prod)
>
> >
> > Thanks.
> >
> > - Alex
>
> --
> Nikita V. Shirokov

So I have been reading the datasheet
(https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82599-10-gbe-controller-datasheet.pdf)
and it looks like the conclusion Brenden came to in the earlier
referenced link is probably correct. From what I can tell there is a
limit of 64 queues in the base RSS mode of the device, so while it
supports more than 64 queues you can only make use of 64 as per table
7-25.
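
To make the arithmetic concrete, here is a rough sketch (not taken from
the driver source) of the check implied by the workaround quoted above.
It assumes one XDP Tx queue is allocated per CPU, which is consistent
with the numbers in this thread (48 CPUs, hang at combined == 48, no
hang at combined == 14) but is an assumption on my part. The function
names are made up for illustration, and the strict "<" follows the
workaround as Nikita stated it.

```python
# Sketch only: helper names are hypothetical, and "one XDP Tx queue
# per CPU" is an assumption, not something confirmed in this thread.

RSS_TX_QUEUE_LIMIT = 64  # queue limit cited for the base RSS mode


def total_tx_queues(combined: int, ncpus: int) -> int:
    """Regular (combined) queues plus one assumed XDP Tx queue per CPU."""
    return combined + ncpus


def fits_limit(combined: int, ncpus: int,
               limit: int = RSS_TX_QUEUE_LIMIT) -> bool:
    """True if combined + XDP queues stays strictly below the limit."""
    return total_tx_queues(combined, ncpus) < limit


def max_combined(ncpus: int, limit: int = RSS_TX_QUEUE_LIMIT) -> int:
    """Largest combined value passing fits_limit(); 0 means none does."""
    return max(limit - ncpus - 1, 0)


if __name__ == "__main__":
    ncpus = 48  # from the nproc output above
    for combined in (48, 14):
        print(combined, total_tx_queues(combined, ncpus),
              fits_limit(combined, ncpus))
    print("max combined:", max_combined(ncpus))
```

Under that assumption, the reported configuration asks for 48 + 48 = 96
queues, well past 64, while combined == 14 gives 62. The channel count
itself would presumably be lowered with "ethtool -L eth0 combined 14".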

For now I think the workaround you are using is probably the only
viable solution. I myself don't have time to work on resolving this,
but I am sure one of the maintainers for ixgbe will be responding
shortly.

One possible solution we may want to look at would be to make use of
the 32 pool/VF mode in the MTQC register. That should enable us to
make use of all 128 queues but I am sure there would be other side
effects such as having to set the bits in the PFVFTE register in order
to enable the extra Tx queues.

Thanks.

- Alex
