Message-ID: <20200419204722.GQ836632@lunn.ch>
Date:   Sun, 19 Apr 2020 22:47:22 +0200
From:   Andrew Lunn <andrew@...n.ch>
To:     Chris Healy <cphealy@...il.com>
Cc:     Andy Duan <fugang.duan@....com>,
        David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: [EXT] [PATCH net-next v2 1/3] net: ethernet: fec: Replace
 interrupt driven MDIO with polled IO

On Sat, Apr 18, 2020 at 03:39:17PM -0700, Chris Healy wrote:
> I did some profiling using an oscilloscope with my NXP Vybrid based
> platform to see what inter-transaction times (start of one MDIO
> transaction to the start of the next) different "sleep_us" values
> resulted in.  Here's what I found:
> 
> 0  - ~38us to ~40us
> 1  - ~48us to ~64us
> 2  - ~48us to ~64us
> 3  - ~48us to ~64us
> 4  - ~48us to ~64us
> 5  - ~48us to ~64us
> 6  - ~48us to ~64us
> 7  - ~48us to ~64us
> 8  - ~48us to ~64us
> 9  - ~56us to ~88us
> 10 - ~56us to ~112us
> 
> Basically, with the "sleep_us" value set to 0, I would get the
> shortest inter-transaction times with a very low variance.  Once I
> went to a non-zero value, the inter-transaction time went up, as well
> as the variance, which I suppose makes sense...

Hi All

I dug into this, adding some instrumentation. I've been testing on a
Vybrid platform. During boot it does over 6500 transactions in order
to configure 3 switches and one PHY.

With a delay of 0, it polls an average of 62 times and needs 29us to
exit the loop. That means one poll takes roughly 0.5us.

With a delay of 1us, it polls an average of 2 times and takes on
average 45us to exit the loop. That suggests each delay actually takes
around 22us, despite the request for only 1us. Looking at the code, it
ends up performing a usleep_range(1, 1). So I'm assuming sleeping is
very expensive, maybe because it uses a timer? In that case we end up
with just as much interrupt work as waiting for the MDIO interrupt.
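
To make that concrete, the sleeping wait looks roughly like this (a
sketch only; the register offset, bit name and timeout are assumptions
for illustration, not the exact fec_main.c code):

#include <linux/iopoll.h>

#define FEC_IEVENT      0x004     /* interrupt event register (assumed offset) */
#define FEC_ENET_MII    BIT(23)   /* "MDIO transfer done" event bit (assumed) */

/* Sleeping wait: sleep_us = 1, timeout_us = 30000 (values assumed) */
static int fec_mdio_wait_sleeping(void __iomem *hwp)
{
        u32 ievent;

        /*
         * With sleep_us = 1, readl_poll_timeout() calls
         * usleep_range(1, 1) between reads, and that sleep is
         * what costs ~22us per iteration on this platform.
         */
        return readl_poll_timeout(hwp + FEC_IEVENT, ievent,
                                  ievent & FEC_ENET_MII, 1, 30000);
}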

By swapping to readl_poll_timeout_atomic() I got much better
behaviour. That variant uses udelay(). With a delay of 2us, it loops
polling the completion bit an average of 10 times and takes 29us to
exit the loop. This suggests that udelay(2) is pretty accurate.
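
For comparison, the atomic variant of the same wait would look
something like this (again a sketch, reusing the assumed register
offset and bit from above):

/* Busy-waiting wait: delay_us = 2, timeout_us = 30000 (values assumed) */
static int fec_mdio_wait_atomic(void __iomem *hwp)
{
        u32 ievent;

        /*
         * readl_poll_timeout_atomic() uses udelay() between reads,
         * so the requested 2us really is about 2us and no sleep or
         * timer machinery gets involved.
         */
        return readl_poll_timeout_atomic(hwp + FEC_IEVENT, ievent,
                                         ievent & FEC_ENET_MII, 2, 30000);
}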

This seems like a reasonable compromise. The bus load has been reduced
from 62 to 10 polls of the completion bit, without increasing the time
it takes to exit the loop by much. And a 2us polling interval still
allows significantly faster completions when using a faster bus clock
and preamble suppression.
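
As a back-of-the-envelope check (standard clause 22 numbers, not
measured here): a full MDIO frame is 64 bits, so at the usual 2.5MHz
MDC it takes 25.6us, and with preamble suppression the remaining 32
bits take 12.8us. At a faster MDC that drops to a few microseconds,
and a poll granularity of a couple of microseconds can catch such a
completion almost immediately, where a ~22us sleep could not.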

So I plan to resubmit using readl_poll_timeout_atomic().

   Andrew
