netdev - Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"

Message-ID: <1421256052.11734.22.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Wed, 14 Jan 2015 09:20:52 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Thomas Jarosch <thomas.jarosch@...ra2net.com>
Cc:	'Linux Netdev List' <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	e1000-devel <e1000-devel@...ts.sourceforge.net>
Subject: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"

On Wed, 2015-01-14 at 16:32 +0100, Thomas Jarosch wrote:
> Hello,
> 
> after updating a good bunch of production level machines
> from kernel 3.4.101 to kernel 3.14.25, a few of them started
> to show serious trouble when there was a lot of network traffic.
> 
> ---------------------------------------------------------------
> Jan 14 11:14:57 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
> Jan 14 11:14:57 intrartc kernel:  TDH                  <3b>
> Jan 14 11:14:57 intrartc kernel:  TDT                  <76>
> Jan 14 11:14:57 intrartc kernel:  next_to_use          <76>
> Jan 14 11:14:57 intrartc kernel:  next_to_clean        <31>
> Jan 14 11:14:57 intrartc kernel: buffer_info[next_to_clean]:
> Jan 14 11:14:57 intrartc kernel:  time_stamp           <ffff328c>
> Jan 14 11:14:57 intrartc kernel:  next_to_watch        <3b>
> Jan 14 11:14:57 intrartc kernel:  jiffies              <ffff33b9>
> Jan 14 11:14:57 intrartc kernel:  next_to_watch.status <0>
> Jan 14 11:14:57 intrartc kernel: MAC Status             <40080083>
> Jan 14 11:14:57 intrartc kernel: PHY Status             <796d>
> Jan 14 11:14:57 intrartc kernel: PHY 1000BASE-T Status  <3800>
> Jan 14 11:14:57 intrartc kernel: PHY Extended Status    <3000>
> Jan 14 11:14:57 intrartc kernel: PCI Status             <10>
> Jan 14 11:14:59 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
> ..
> ---------------------------------------------------------------
> 
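A minimal sketch for reading the ring indices in the hang dump above. It assumes the usual Intel convention that the hardware advances TDH while the driver writes TDT. It also assumes a 256-entry TX ring, since the ring size is not shown in the log. The helpers ring_distance() and jiffies_delta() are hypothetical, not e1000e functions.

```c
#include <assert.h>
#include <stdint.h>

/* Entries from index `from` up to (not including) `to` in a ring of
 * `size` descriptors; adding size first keeps the result right after
 * the indices wrap past the end of the ring. */
static unsigned ring_distance(unsigned from, unsigned to, unsigned size)
{
    assert(size > 0 && from < size && to < size);
    return (to + size - from) % size;
}

/* Jiffies elapsed between two 32-bit samples; unsigned subtraction
 * stays correct even if the counter wrapped between them. */
static uint32_t jiffies_delta(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

Here is what the logged values give:

* ring_distance(0x3b, 0x76, 256) is 59 descriptors queued between TDH and TDT.
* ring_distance(0x31, 0x76, 256) is 69 descriptors between next_to_clean and next_to_use.
* jiffies_delta(0xffff328c, 0xffff33b9) is 301 jiffies. That is about 0.3 s at HZ=1000 or about 1.2 s at HZ=250; the HZ of the affected kernels is not stated.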
> All of those troubled machines use an Intel DH61CR board and
> are driven by the e1000e driver. Kernels 3.7.0 to 3.19-rc4 are affected.
> 
> The problem vanishes when you disable TSO. This is the
> recommended "solution" on serverfault and others.
> http://ehc.ac/p/e1000/bugs/378/
> http://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
> 
> I have a test setup that can trigger the problem within seconds
> and bisected it down to this commit (hi Eric!):
> ---------------------------------------------------------------
> commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db
> Author: Eric Dumazet <edumazet@...gle.com>
> Date:   Wed Sep 26 06:46:57 2012 +0000
> 
>     net: use bigger pages in __netdev_alloc_frag
> 
>     We currently use percpu order-0 pages in __netdev_alloc_frag
>     to deliver fragments used by __netdev_alloc_skb()
> 
>     Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
>     be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096
> 
>     Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :
> 
>     - Better filling of space (the ending hole overhead is less an issue)
> 
>     - Less calls to page allocator or accesses to page->_count
> 
>     - Could allow struct skb_shared_info futures changes without major
>     performance impact.
> 
>     This patch implements a transparent fallback to smaller
>     pages in case of memory pressure.
> 
>     It also uses a standard "struct page_frag" instead of a custom one.
> 
>     Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>     Cc: Alexander Duyck <alexander.h.duyck@...el.com>
>     Cc: Benjamin LaHaise <bcrl@...ck.org>
>     Signed-off-by: David S. Miller <davem@...emloft.net>
> ---------------------------------------------------------------
> 
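A user-space sketch of the "bigger pages with transparent fallback" idea in the commit message. It refills a per-cache chunk at the largest order first (order 3, i.e. 32768 bytes with 4096-byte pages). If that allocation fails, it steps down one order at a time, then carves fixed-size fragments from the chunk.

This is a simulation, not the kernel's __netdev_alloc_frag():

* The sim_* names are hypothetical.
* The one-order-at-a-time fallback is an assumption about how the fallback proceeds.
* Memory pressure is modeled by an avail_order parameter.
* No memory is actually allocated.
* Page refcounting and recycling are left out.

```c
#include <assert.h>

#define SIM_PAGE_SIZE      4096u
#define SIM_FRAG_MAX_ORDER 3   /* 4096 << 3 = 32768 bytes, as in the commit text */

/* Hypothetical per-cpu fragment cache: only sizes and offsets are tracked. */
struct sim_frag_cache {
    unsigned size;     /* bytes in the current chunk (0 = no chunk yet) */
    unsigned offset;   /* bytes already handed out from the current chunk */
    unsigned refills;  /* number of chunks "allocated" so far */
};

/* Stand-in for a page allocation: succeeds only up to avail_order,
 * which is how this sketch models memory pressure. */
static int sim_alloc_ok(int order, int avail_order)
{
    return order <= avail_order;
}

/* Returns the offset of a fragsz-byte fragment inside the current chunk,
 * or -1 if no chunk could be obtained or the fragment does not fit. */
static long sim_alloc_frag(struct sim_frag_cache *nc, unsigned fragsz, int avail_order)
{
    assert(fragsz > 0);
    if (nc->size == 0 || nc->offset + fragsz > nc->size) {
        int order;

        /* Try the largest order first, then fall back one order at a time. */
        for (order = SIM_FRAG_MAX_ORDER; order >= 0; order--)
            if (sim_alloc_ok(order, avail_order))
                break;
        if (order < 0)
            return -1;                 /* even a single page "failed" */

        nc->size = SIM_PAGE_SIZE << order;
        nc->offset = 0;
        nc->refills++;
    }
    if (fragsz > nc->size)
        return -1;                     /* fragment larger than any chunk */

    long off = (long)nc->offset;
    nc->offset += fragsz;
    return off;
}
```

For example, carving 42 fragments of 1536 bytes with avail_order = 3 needs 2 refills, because 21 fragments fit in each 32 KB chunk. With avail_order = 0 the cache falls back to 4 KB chunks holding 2 fragments each, and needs 21 refills.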
> Reverting the commit (e.g. in kernel 3.7.0) solves the issue.
> I've done some more tests:
> 
>     3.18.0 32bit + PAE: broken
>     3.6.0 32bit + PAE: works
>     3.7.0 32bit + PAE: broken
>     3.7.0 32bit + PAE + revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db -> works
> 
>     3.7.0 32bit (without PAE) -> broken
>     3.7.0 32bit + "GFP_COMP" flag removed in __netdev_alloc_frag(): broken
>     3.7.0 32bit + "GFP_COMP" flag replaced with
>                               "GFP_DMA" in __netdev_alloc_frag(): works!
>     3.7.0 32bit + "GFP_COMP" flag + "GFP_DMA" flag: broken
>     3.19-rc4 32bit: broken
> 
> 
> The problem is triggered only when the traffic is forwarded to another client
> (this client is behind NAT). Generating traffic directly
> on the system did not trigger the issue.
> 
> To me it looks like Eric's change uncovered a memory allocation
> issue in the e1000e driver: It probably uses a memory address
> unsuitable for DMA or so. This is just a guess though.
> 
> Funny fact: I have another Intel DH61CR board that does not show the problem.
> I've borrowed (...) the mainboard from one affected box for my bisect test setup.
> 
> Please CC: comments. Thanks.

I would try putting less data in each TX descriptor (txd). I am not sure
24KB per descriptor is really supported.

(Check commit d821a4c4d11ad160925dab2bb009b8444beff484 for details.)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index e14fd85f64eb..8d973f7edfbd 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3897,7 +3897,7 @@ void e1000e_reset(struct e1000_adapter *adapter)
 	 * limit of 24KB due to receive synchronization limitations.
 	 */
 	adapter->tx_fifo_limit = min_t(u32, ((er32(PBA) >> 16) << 10) - 96,
-				       24 << 10);
+				       8 << 10);
 
 	/* Disable Adaptive Interrupt Moderation if 2 full packets cannot
 	 * fit in receive buffer.
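A small sketch of what the diff changes. It assumes, as the note above suggests, that tx_fifo_limit caps the bytes placed in one TX descriptor, and that a buffer is split by rounding up.

* sim_tx_fifo_limit() reproduces the min_t() expression from the patch, with the cap passed in.
* sim_descs_for_len() is a hypothetical helper.
* The PBA values used below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the patched line:
 *   min_t(u32, ((er32(PBA) >> 16) << 10) - 96, cap)
 * where cap is 24 << 10 before the patch and 8 << 10 after it. */
static uint32_t sim_tx_fifo_limit(uint32_t pba, uint32_t cap)
{
    assert((pba >> 16) != 0);          /* avoid unsigned underflow in this sketch */
    uint32_t from_pba = ((pba >> 16) << 10) - 96;
    return from_pba < cap ? from_pba : cap;
}

/* Descriptors needed for one buffer of len bytes if each descriptor
 * carries at most limit bytes (round up, like DIV_ROUND_UP). */
static uint32_t sim_descs_for_len(uint32_t len, uint32_t limit)
{
    assert(limit > 0);
    return (len + limit - 1) / limit;
}
```

Take a hypothetical PBA of 0x00200000 (upper half 32). With the old 24KB cap, the cap wins: sim_tx_fifo_limit(0x00200000, 24 << 10) is 24576. A single 64KB buffer (hypothetical size) then needs sim_descs_for_len(65536, 24576) = 3 descriptors. With the patched 8KB cap, the limit is 8192 and the same buffer needs 8 descriptors, so no descriptor carries more than 8KB.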


