Date:   Fri, 24 Aug 2018 16:25:48 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Cc:     brouer@...hat.com, Alexander Duyck <alexander.duyck@...il.com>,
        tehnerd@...nerd.com, Netdev <netdev@...r.kernel.org>,
        tytus.a.wasilewski@...el.com,
        Tymoteusz Kielan <tymoteusz.kielan@...el.com>,
        John Fastabend <john.fastabend@...il.com>,
        Daniel Borkmann <borkmann@...earbox.net>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: ixgbe hangs when XDP_TX is enabled


On Wed, 22 Aug 2018 09:22:58 -0700 Jeff Kirsher <jeffrey.t.kirsher@...el.com> wrote:
> On Tue, 2018-08-21 at 11:13 -0700, Alexander Duyck wrote:
> > On Tue, Aug 21, 2018 at 9:59 AM Nikita V. Shirokov <tehnerd@...nerd.com> wrote:
> > > 
> > > On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:  
> > > > On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov <tehnerd@...nerd.com> wrote:
> > > > > 
> > > > > we are getting errors like these:
> > > > > 
> > > > > [  408.737313] ixgbe 0000:03:00.0 eth0: Detected Tx Unit Hang (XDP)
> > > > >                   Tx Queue             <46>
> > > > >                   TDH, TDT             <0>, <2>
> > > > >                   next_to_use          <2>
> > > > >                   next_to_clean        <0>
> > > > >                 tx_buffer_info[next_to_clean]
> > > > >                   time_stamp           <0>
> > > > >                   jiffies              <1000197c0>
> > > > > [  408.804438] ixgbe 0000:03:00.0 eth0: tx hang 1 detected on queue 46, resetting adapter
> > > > > [  408.804440] ixgbe 0000:03:00.0 eth0: initiating reset due to tx timeout
> > > > > [  408.817679] ixgbe 0000:03:00.0 eth0: Reset adapter
> > > > > [  408.866091] ixgbe 0000:03:00.0 eth0: TXDCTL.ENABLE for one or more queues not cleared within the polling period
> > > > > [  409.345289] ixgbe 0000:03:00.0 eth0: detected SFP+: 3
> > > > > [  409.497232] ixgbe 0000:03:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> > > > > 
> > > > > while running an XDP prog on an ixgbe NIC.
> > > > > right now I'm seeing this on a bpf-next kernel
> > > > > (latest commit from Wed Aug 15 15:04:25 2018 -0700;
> > > > > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> > > > > 
> > > > > looks like this is the same issue as reported by Brenden in
> > > > > https://www.spinics.net/lists/netdev/msg439438.html
> > > > > 
> > > > [...] The total number of CPUs on the system
> > > > would be useful to know as well.
[...]
> > > # nproc
> > > 48
> > > 
[...]
> > > ethtool -l eth0
> > > Channel parameters for eth0:
> > > Pre-set maximums:
> > > RX:             0
> > > TX:             0
> > > Other:          1
> > > Combined:       63
> > > Current hardware settings:
> > > RX:             0
> > > TX:             0
> > > Other:          1
> > > Combined:       48
[...]

> > > 
> > > the workaround for now is to do the same as Brenden did in his
> > > original finding: make sure that combined + xdp queues < max_tx_queues
> > > (e.g. with combined == 14 the issue goes away).
> > >   
> > > > the issue with one of the sample XDP programs provided with the
> > > > kernel, such as xdp2, which I believe uses the XDP_TX
> > > > function. We need to try to create a similar setup in our own
> > > > environment for reproduction and debugging.
> > > 
> > > will try, but this could take a while, because I'm not sure that we
> > > have ixgbe in our test lab (and it would be hard to run such a test
> > > in prod)

Notice that to reproduce this you need a system with 48 cores.  (I
predict that with fewer than 33 cores it will not show up, and that
above 48 cores the XDP prog should be rejected at load time.)
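
To make the arithmetic concrete, here is a hypothetical helper (not
actual ixgbe code) spelling out the queue budget behind Brenden's
workaround; the 64-queue figure is the base-RSS limit Alex quotes from
the datasheet below:

/*
 * Hypothetical sketch, not driver code: the regular "combined" RX/TX
 * queues plus one XDP TX queue per CPU core must fit within the 64
 * queues usable in the 82599's base RSS mode (table 7-25).
 *
 * Reporter's box: 48 combined + 48 cores = 96 queues wanted -> hang.
 * With combined = 14: 14 + 48 = 62 < 64 -> the hang goes away.
 */
#include <linux/types.h>

#define MY_MAX_TX_QUEUES	64	/* base RSS mode limit (table 7-25) */

static bool my_xdp_tx_queues_fit(unsigned int combined, unsigned int nr_cpus)
{
	/* one XDP TX queue is allocated per core */
	return combined + nr_cpus < MY_MAX_TX_QUEUES;
}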


> > 
> > So I have been reading the datasheet
> > (
> > https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82599-10-gbe-controller-datasheet.pdf
> > )
> > and it looks like the assumption that Brenden came to in the earlier
> > referenced link is probably correct. From what I can tell there is a
> > limit of 64 queues in the base RSS mode of the device, so while it
> > supports more than 64 queues you can only make use of 64 as per table
> > 7-25.
> > 

As far as I can remember, the driver code assumes up to 96 queues are
available.  It sounds like the driver XDP code that allocates 'one XDP
TX-queue per core' in the system is what is causing this.

I have previously complained that the ixgbe driver will not be able to
enable XDP on machines with many CPU cores, due to the 'one XDP
TX-queue per core' design requirement.
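
To illustrate what that design requirement buys, here is a rough
sketch of the 'one XDP TX-queue per core' idea; every my_* name is
made up for the example, this is not the actual ixgbe code:

/*
 * Hypothetical sketch: each CPU owns a dedicated XDP TX ring, so
 * XDP_TX can transmit from any core without taking a lock -- at the
 * cost of needing one HW TX queue per possible CPU.
 */
#include <linux/smp.h>
#include <net/xdp.h>

struct my_tx_ring;			/* HW descriptor ring state */
int my_xmit_one(struct my_tx_ring *ring, struct xdp_frame *xdpf);

struct my_adapter {
	struct my_tx_ring *xdp_ring[NR_CPUS];	/* one ring per possible CPU */
};

static void my_xdp_tx(struct my_adapter *adapter, struct xdp_frame *xdpf)
{
	/* lockless: only this CPU ever touches its own ring */
	struct my_tx_ring *ring = adapter->xdp_ring[smp_processor_id()];

	my_xmit_one(ring, xdpf);
}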


> > For now I think the workaround you are using is probably the only
> > viable solution. I myself don't have time to work on resolving this,
> > but I am sure one of the maintainers for ixgbe will be responding
> > shortly.  
> 
> I have notified the 10GbE maintainers, and we are working to reproduce
> the issue currently.

For reproducers, notice the correlation with the number of cores the
system has.

 
> > One possible solution we may want to look at would be to make use of
> > the 32 pool/VF mode in the MTQC register. That should enable us to
> > make use of all 128 queues but I am sure there would be other side
> > effects such as having to set the bits in the PFVFTE register in
> > order to enable the extra Tx queues.  
 
Getting access to more queues is of course good, as it moves the bar
for how many cores a system can have before XDP will no longer work
with the ixgbe driver.

An alternative solution is also possible, but there will be a
performance trade-off.  Since merge commit 10f678683e4 ("Merge branch
'xdp_xmit-bulking'"), ndo_xdp_xmit() receives bulks of up to 16 frames
(the limit within the devmap).  Thus, systems that cannot allocate a
NIC HW queue for each CPU core could instead use locked XDP TX
queue(s), where the locking cost is amortized thanks to the bulking.
This mode will be slower, so the question is how we "warn" the user
that the driver is operating in a slightly less optimal XDP-TX mode
(would a simple pr_info be enough, like when there is insufficient
PCIe bandwidth?).
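
As a rough sketch of what such a shared, locked XDP TX queue could
look like (the ndo_xdp_xmit() signature and the 16-frame devmap
bulking are what the kernel provides at this point; every my_* name
is hypothetical, not ixgbe code):

#include <linux/netdevice.h>
#include <linux/spinlock.h>
#include <net/xdp.h>

struct my_xdp_ring {
	spinlock_t lock;		/* ring shared by several CPU cores */
	/* ... HW descriptor ring state ... */
};

/* hypothetical helpers: pick a shared ring, post one frame, ring doorbell */
struct my_xdp_ring *my_pick_shared_ring(struct net_device *dev);
int my_xmit_frame(struct my_xdp_ring *ring, struct xdp_frame *xdpf);
void my_tail_bump(struct my_xdp_ring *ring);

static int my_ndo_xdp_xmit(struct net_device *dev, int n,
			   struct xdp_frame **frames, u32 flags)
{
	struct my_xdp_ring *ring = my_pick_shared_ring(dev);
	int i, drops = 0;

	/*
	 * One lock/unlock per bulk of up to 16 frames from the devmap,
	 * so the locking cost is amortized over the whole bulk.
	 */
	spin_lock(&ring->lock);
	for (i = 0; i < n; i++) {
		if (my_xmit_frame(ring, frames[i])) {
			/* frame could not be queued: drop and free it */
			xdp_return_frame_rx_napi(frames[i]);
			drops++;
		}
	}
	if (flags & XDP_XMIT_FLUSH)
		my_tail_bump(ring);	/* hypothetical doorbell write */
	spin_unlock(&ring->lock);

	return n - drops;
}

Whether that is one global shared ring or a small pool of rings shared
between groups of CPUs is a separate trade-off; the point is only that
the per-frame locking overhead stays low because the lock is taken once
per bulk.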

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
