[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200419204722.GQ836632@lunn.ch>
Date: Sun, 19 Apr 2020 22:47:22 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Chris Healy <cphealy@...il.com>
Cc: Andy Duan <fugang.duan@....com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Florian Fainelli <f.fainelli@...il.com>,
Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: [EXT] [PATCH net-next v2 1/3] net: ethernet: fec: Replace
interrupt driven MDIO with polled IO
On Sat, Apr 18, 2020 at 03:39:17PM -0700, Chris Healy wrote:
> I did some profiling using an oscilloscope with my NXP Vybrid based
> platform to see what different "sleep_us" values resulted in for start
> of MDIO to start of MDIO transaction times. Here's what I found:
>
> 0 - ~38us to ~40us
> 1 - ~48us to ~64us
> 2 - ~48us to ~64us
> 3 - ~48us to ~64us
> 4 - ~48us to ~64us
> 4 - ~48us to ~64us
> 5 - ~48us to ~64us
> 6 - ~48us to ~64us
> 7 - ~48us to ~64us
> 8 - ~48us to ~64us
> 9 - ~56us to ~88us
> 10 - ~56us to ~112us
>
> Basically, with the "sleep_us" value set to 0, I would get the
> shortest inter transaction times with a very low variance. Once I
> went to a non-zero value, the inter transaction time went up, as well
> as the variance, which I suppose makes sense....
Hi All
I dug into this, adding some instrumentation. I've been testing on a
Vybrid platform. During boot it does over 6500 transactions in order
to configure 3 switches and one PHY.
With a delay of 0, it polls an average of 62 times, and it needs 29us
to exit the loop. This means one poll takes 0.5uS.
With a delay of 1uS, it polls on average 2 times, and takes on average
45uS to exit the loop. Which suggests the delay takes around 22uS,
despite the request for only 1uS. Looking at the code, it is actually
performing a usleep_range(1, 1). So i'm assuming sleeping is very
expensive, maybe it is using a timer? So we end up with just as much
interrupt work as waiting for the MDIO interrupt?
By swapping to readl_poll_timeout_atomic() i got much better
behaviour. This uses udelay(). Using a delay of 2uS, it loops polling
the completion bit an average of 10 times. It takes 29uS to exist the
loop. This suggests that udelay(2) is pretty accurate.
This seems like a reasonable compromise. The bus load has been reduced
from 62 to 10 poll loops, without increasing the time it takes to exit
the loop by much. And 10 polls allows for significantly faster
completions when using a faster bus clock and preamble suppression.
So i plan to resubmit using readl_poll_timeout_atomic().
Andrew
Powered by blists - more mailing lists