lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a8426bc8-aaba-6c07-53d5-ff65db6984f6@gmail.com>
Date:   Sun, 17 Mar 2019 12:35:45 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     VDR User <user.vdr@...il.com>
Cc:     Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        netdev@...r.kernel.org
Subject: Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb

On 16.03.2019 15:38, VDR User wrote:
>> Part of the issue though is that we don't know how reliable that test
>> was. I believe Derek he hasn't had any crashes, but he wasn't confident
>> that it had actually resolved the issue.
> 
> Previously I thought I could easily & consistently reproduce the crash
> but the more testing I did, the more I realized that wasn't the case.
> That's why my confidence was low in that reversing commit 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
> needed to do a lot more testing over the weekend to be sure. But, I
> can now confirm that reversing that commit did not solve the problem.
> I didn't ifdown/ifup after the crash so the nic eventually recovered
> on its own I guess. The `ethtool -S` output is:
> 
> NIC statistics:
>      tx_packets: 5370650
>      rx_packets: 57340787
>      tx_errors: 0
>      rx_errors: 0
>      rx_missed: 26
>      align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 57332905
>      broadcast: 6409
>      multicast: 1473
>      tx_aborted: 0
>      tx_underrun: 0
> 
[...]
> 
> Please let me know if there's anything I can do to help.
> Derek
> 
Below are two patches. The first removes an extra PCI register read in the
interrupt handler, the second one just adds some tracing for debugging.

Interesting would be whether patch 1 has an impact on the issue, and the
trace output of patch 2 after the issue occurred (w/ or w/o patch 1).
Trace output you find in /sys/kernel/debug/tracing/trace

Heiner

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 761097710..46a4dc888 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -679,6 +679,7 @@ struct rtl8169_private {
 		struct work_struct work;
 	} wk;
 
+	unsigned irq_enabled:1;
 	unsigned supports_gmii:1;
 	dma_addr_t counters_phys_addr;
 	struct rtl8169_counters *counters;
@@ -1294,6 +1295,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
 static void rtl_irq_disable(struct rtl8169_private *tp)
 {
 	RTL_W16(tp, IntrMask, 0);
+	tp->irq_enabled = 0;
 }
 
 #define RTL_EVENT_NAPI_RX	(RxOK | RxErr)
@@ -1302,6 +1304,7 @@ static void rtl_irq_disable(struct rtl8169_private *tp)
 
 static void rtl_irq_enable(struct rtl8169_private *tp)
 {
+	tp->irq_enabled = 1;
 	RTL_W16(tp, IntrMask, tp->irq_mask);
 }
 
@@ -6521,9 +6524,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct rtl8169_private *tp = dev_instance;
 	u16 status = RTL_R16(tp, IntrStatus);
-	u16 irq_mask = RTL_R16(tp, IntrMask);
 
-	if (status == 0xffff || !(status & irq_mask))
+	if (!tp->irq_enabled || status == 0xffff || !(status & tp->irq_mask))
 		return IRQ_NONE;
 
 	if (unlikely(status & SYSErr)) {
-- 
2.21.0







diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 46a4dc888..5a40fa6f8 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6258,6 +6258,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 * not miss a ring update when it notices a stopped queue.
 		 */
 		smp_wmb();
+		trace_printk("stopping tx queue\n");
 		netif_stop_queue(dev);
 		/* Sync with rtl_tx:
 		 * - publish queue status and cur_tx ring index (write barrier)
@@ -6267,8 +6268,10 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 * can't.
 		 */
 		smp_mb();
-		if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS))
+		if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+			trace_printk("waking tx queue\n");
 			netif_wake_queue(dev);
+		}
 	}
 
 	return NETDEV_TX_OK;
@@ -6376,6 +6379,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
 		smp_mb();
 		if (netif_queue_stopped(dev) &&
 		    rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+			trace_printk("waking tx queue\n");
 			netif_wake_queue(dev);
 		}
 		/*
-- 
2.21.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ