netdev - RE: [PATCH RFC 24/30] net: fec: better implementation of iMX6 ERR006358 quirk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 22 Jun 2014 08:38:49 +0000
From:	"fugang.duan@...escale.com" <fugang.duan@...escale.com>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
CC:	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH RFC 24/30] net: fec: better implementation of iMX6
	ERR006358 quirk

From: Russell King - ARM Linux <linux@....linux.org.uk> Data: Sunday, June 22, 2014 4:13 PM
>To: Duan Fugang-B38611
>Cc: linux-arm-kernel@...ts.infradead.org; netdev@...r.kernel.org
>Subject: Re: [PATCH RFC 24/30] net: fec: better implementation of iMX6
>ERR006358 quirk
>
>On Sun, Jun 22, 2014 at 07:49:11AM +0000, fugang.duan@...escale.com wrote:
>> From: Russell King <rmk@....linux.org.uk> Data: Friday, June 20, 2014
>> 8:14 PM
>> >To: linux-arm-kernel@...ts.infradead.org
>> >Cc: Duan Fugang-B38611; netdev@...r.kernel.org
>> >Subject: [PATCH RFC 24/30] net: fec: better implementation of iMX6
>> >ERR006358 quirk
>> >
>> >Using a (delayed) workqueue for ERR006358 is not correct - a work
>> >queue is a single-trigger device.  Once the work queue has been
>> >scheduled, it can't be re-scheduled until it has been run.  This can
>> >cause problems - with an appropriate packet timing, we can end up
>> >with packets queued, but not sent by the hardware, resulting in the
>transmit timeout firing.
>> >
>> >Re-implement this as per the workaround detailed in the ERR006358
>> >documentation - if there are packets waiting to be sent when we
>> >service the transmit ring, and we see that the transmitter is not
>> >running, kick the transmitter to run the pending entries in the ring.
>> >
>> >Testing here with a 10Mbit half duplex link sees the resulting iperf
>> >TCP bandwidth increase from between 1 to 2Mbps to between 8 to 9Mbps.
>> >
>> >Signed-off-by: Russell King <rmk+kernel@....linux.org.uk>
>> >---
>> > drivers/net/ethernet/freescale/fec.h      |  1 -
>> > drivers/net/ethernet/freescale/fec_main.c | 31
>> >+++++---------------------
>> >-----
>> > 2 files changed, 5 insertions(+), 27 deletions(-)
>> >
>> >diff --git a/drivers/net/ethernet/freescale/fec.h
>> >b/drivers/net/ethernet/freescale/fec.h
>> >index 96d2a18f1b99..17e294970207 100644
>> >--- a/drivers/net/ethernet/freescale/fec.h
>> >+++ b/drivers/net/ethernet/freescale/fec.h
>> >@@ -259,7 +259,6 @@ struct bufdesc_ex {  struct fec_enet_delayed_work
>> >{
>> > 	struct delayed_work delay_work;
>> > 	bool timeout;
>> >-	bool trig_tx;
>> > };
>> >
>> > /* The FEC buffer descriptors track the ring buffers.  The
>> >rx_bd_base and diff --git a/drivers/net/ethernet/freescale/fec_main.c
>> >b/drivers/net/ethernet/freescale/fec_main.c
>> >index 62120e47ef5a..957bb98add5a 100644
>> >--- a/drivers/net/ethernet/freescale/fec_main.c
>> >+++ b/drivers/net/ethernet/freescale/fec_main.c
>> >@@ -342,22 +342,6 @@ fec_enet_clear_csum(struct sk_buff *skb, struct
>> >net_device *ndev)
>> > 	return 0;
>> > }
>> >
>> >-static void
>> >-fec_enet_submit_work(struct bufdesc *bdp, struct fec_enet_private
>> >*fep) - {
>> >-	const struct platform_device_id *id_entry =
>> >-				platform_get_device_id(fep->pdev);
>> >-	struct bufdesc *bdp_pre;
>> >-
>> >-	bdp_pre = fec_enet_get_prevdesc(bdp, fep);
>> >-	if ((id_entry->driver_data & FEC_QUIRK_ERR006358) &&
>> >-	    !(bdp_pre->cbd_sc & BD_ENET_TX_READY)) {
>> >-		fep->delay_work.trig_tx = true;
>> >-		schedule_delayed_work(&(fep->delay_work.delay_work),
>> >-					msecs_to_jiffies(1));
>> >-	}
>> >-}
>> >-
>> > static int
>> > fec_enet_txq_submit_frag_skb(struct sk_buff *skb, struct net_device
>> >*ndev) { @@ -545,8 +529,6 @@ static int
>> >fec_enet_txq_submit_skb(struct sk_buff *skb, struct net_device *ndev)
>> > 	status |= (BD_ENET_TX_READY | BD_ENET_TX_TC);
>> > 	bdp->cbd_sc = status;
>> >
>> >-	fec_enet_submit_work(bdp, fep);
>> >-
>> > 	/* If this was the last BD in the ring, start at the beginning again.
>> >*/
>> > 	bdp = fec_enet_get_nextdesc(last_bdp, fep);
>> >
>> >@@ -735,8 +717,6 @@ static int fec_enet_txq_submit_tso(struct sk_buff
>> >*skb, struct net_device *ndev)
>> > 	/* Save skb pointer */
>> > 	fep->tx_skbuff[index] = skb;
>> >
>> >-	fec_enet_submit_work(bdp, fep);
>> >-
>> > 	skb_tx_timestamp(skb);
>> > 	fep->cur_tx = bdp;
>> >
>> >@@ -1065,11 +1045,6 @@ static void fec_enet_work(struct work_struct
>*work)
>> > 		}
>> > 		rtnl_unlock();
>> > 	}
>> >-
>> >-	if (fep->delay_work.trig_tx) {
>> >-		fep->delay_work.trig_tx = false;
>> >-		writel(0, fep->hwp + FEC_X_DES_ACTIVE);
>> >-	}
>> > }
>> >
>> > static void
>> >@@ -1166,7 +1141,11 @@ fec_enet_tx(struct net_device *ndev)
>> > 				netif_wake_queue(ndev);
>> > 		}
>> > 	}
>> >-	return;
>> >+
>> >+	/* ERR006538: Keep the transmitter going */
>> >+	if (fep->dirty_tx != fep->cur_tx &&
>> >+	    readl(fep->hwp + FEC_X_DES_ACTIVE) == 0)
>> >+		writel(0, fep->hwp + FEC_X_DES_ACTIVE);
>> > }
>> >
>> If fep->dirty_tx == fep->cur_tx, which means tx bd entry is empty, it
>> doesn't need to trigger TDAR.
>
>I think you're wrong.  Upon initialisation (which is the empty case
>because at initialisation time, the tx ring will be empty), fep->cur_tx is
>set to the first tx bd entry, and fep->dirty_tx is set to the /preceding/
>entry.
>
>When a packet is loaded to be transmitted, it is placed in the fep->cur_tx
>slot.  In other words, fep->cur_tx points at the tx bd entry which takes
>the _next_ packet to be loaded into the ring.
>
>When packets are reaped, the first thing the reaping function does after
>reading fep->dirty_tx is increment to the next ring entry.
>
>So, the "ring full" case is fep->dirty_tx == fep->cur_tx, and the "ring
>empty" case is next(fep->dirty_tx) == fep->cur_tx.
>
>You've made this mistake in the SG patch, where you miscalculate the ring
>free size when fep->dirty_tx == fep->cur_tx, indicating that the ring is
>entirely free, when in fact it has no free entries.
>
Agree. Sorry, my mistake.

>> The issue is the last tx frame cannot be trasmited by uDMA since uDMA
>> had cleared the TDAR after user write TDAR.
>> User					uDMA
>> 					Process the previous frame, detect the next
>BD ready bit,
>> 					The next BD ready bit is not set
>> Set the next BD ready bit
>> Set TDAR
>> 					Since the next BD ready bit detected as not
>set, start to clear TDAR.
>
>Yes, I'm well aware of the mechanism that causes this problem.
>
>> The issue has happened customer product, and was fixed by previous
>> workaround.  The issue only happened at little packets transmission.
>> Special for command communicate protocol to the other end.
>
>So, please explain to me why, with the current solution, I get:
>
>1. half duplex performance is abismal, to the point that the ethernet
>   only achieves about 12-25% of the bandwidth, and jumps massively
>   using the solution in this patch (which is as per the workaround
>   documented in the errata document.)
>
My understand is that you run tcp test bandwidth, so you may catch below cases:
Transmit Packet 1, 2, 3,.... n,  wait tcp ack from the other end, and then transmit packet n+1, n+2,.....
uDMA bandwidth is enough fast for 10Mbps, maybe the number "n" packet is not transmited since the errata ERR006358.
So there introduce much latency for transmit flow.

So I guess that if you use UDP bandwidth, maybe there have no diff with your patch.

>2. transmit timeouts when operating in half-duplex mode.
>
>> If you move the extra trigger to tx clean function, you must check:
>> bdp_pre = fec_enet_get_prevdesc(fep->cur_tx, fep); bdp_next =
>> fec_enet_get_nextdesc(fep->dirty_tx, fep); if (bdp_next == bdp_pre &&
>> 	bdp_pre->cbd_sc & BD_ENET_TX_READY &&
>> 	readl(fep->hwp + FEC_X_DES_ACTIVE) == 0)
>> 	writel(0, fep->hwp + FEC_X_DES_ACTIVE);
>
>While I agree that we can read back to check whether the device indicates
>that transmit is active, there's no point to the other tests.  If there
>are entries in the transmit ring but the transmitter indicates that it is
>not active, then it is obvious that the bug has been hit.  This is exactly
>what my implementation above does.
>
The condition "fep->dirty_tx != fep->cur_tx" is not only limited for the errata. 
I mean only add extra trigger TDAR for the issue.

>--
>FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
>improving, and getting towards what was expected from it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html