lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 30 Sep 2009 22:33:31 +0200
From:	Bernhard Kaindl <bernhard.kaindl@....net>
To:	"David S. Miller" <davem@...emloft.net>,
	Bruce Allan <bruce.w.allan@...el.com>
CC:	Jeff Kirsher <jeffrey.t.kirsher@...el.com>, netdev@...r.kernel.org
Subject: [PATCH] Regression: e100_phy_init() isolates even selected PHY, causes
 10 seconds boot delay

Dear David, Dear Bruce,

The current e100.c:e100_phy_init() electrically isolates all
the PHYs (even the selected PHY -- for a short time!) from the MII.

This happens only for a short duration before the isolation
of the selected PHY is reverted, but it's enough to cause a
major disturbance in the startup of our e100-based cards:

On a number of Embedded/Industry Pentium boards which are in use,
the result is that the initial DHCP negotiation takes more
than 10 seconds to complete with 2.6.30 and .31, while it's
done in a fraction of a second with 2.6.29 and earlier
(kernels tested with no delay range from 2.6.23 to 2.6.29)

That regression was introduced on March 31 in the by a patch
from Bruce which first appeared in 2.6.30-rc3:

http://marc.info/?l=linux-netdev&m=123766715429780&w=2
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b55de80e49892002a1878013ab9aee1a30970be6

> From: Bruce Allan <bruce.w.allan@...el.com>
>
> This patch enables support for the new Intel 82552 adapter (new PHY paired
> with the existing MAC in the ICH7 chipset).  No new features are added to
> the driver, however there are minor changes due to updated registers and a
> few workarounds for hardware errata.

In the middle, the patch has two changes for e100_phy_init()

The first one looks like a code optimization and does not appear to matching
the criteria described by Bruce in his submission, so assume it made it into
the patch submission by accident:

> @@ -1276,16 +1294,12 @@ static int e100_phy_init(struct nic *nic)
> 	if (addr == 32)
> 		return -EAGAIN;
>
> -	/* Selected the phy and isolate the rest */
> -	for (addr = 0; addr < 32; addr++) {
> -		if (addr != nic->mii.phy_id) {
> -			mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);
> -		} else {
> -			bmcr = mdio_read(netdev, addr, MII_BMCR);
> -			mdio_write(netdev, addr, MII_BMCR,
> -				bmcr & ~BMCR_ISOLATE);
> -		}
> -	}
> +	/* Isolate all the PHY ids */
> +	for (addr = 0; addr < 32; addr++)
> +		mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);
> +	/* Select the discovered PHY */
> +	bmcr &= ~BMCR_ISOLATE;
> +	mdio_write(netdev, nic->mii.phy_id, MII_BMCR, bmcr);
>
> 	/* Get phy ID */
> 	id_lo = mdio_read(netdev, nic->mii.phy_id, MII_PHYSID1);

If this was meant as a workaround for a hardware errata, it should
have been mentioned to ensure that no one undoes the change without
knowing.

Anyway:

What this does is that it removes the 2.6.23-2.6.29 PHY isolation loop
which ensured that *ONLY* PHY addresses which *do not* match the selected
PHY address are electrically isolated:

> -	/* Selected the phy and isolate the rest */
> -	for (addr = 0; addr < 32; addr++) {
> -             if (addr != nic->mii.phy_id) {
> -                     mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);

The 2.6.29 loop cleared the isolate bit of the discovered PHY in the
else clause of this if statement:

> -		} else {
> -			bmcr = mdio_read(netdev, addr, MII_BMCR);
> -			mdio_write(netdev, addr, MII_BMCR,
> -				bmcr & ~BMCR_ISOLATE);
> -		}

This loop is then replaced with electrical isolation of *_ALL_* PHYs:

> +	/* Isolate all the PHY ids */
> +	for (addr = 0; addr < 32; addr++)
> +		mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);

Which is reverted for the discovered PHY afterwards:

> +	/* Select the discovered PHY */
> +	bmcr &= ~BMCR_ISOLATE;
> +	mdio_write(netdev, nic->mii.phy_id, MII_BMCR, bmcr);

This change resulted in a delay of the "Link Up" message from the
e100 watchdog routine and in a number of DHCP packages getting lost
for a duration of about five seconds.

I suppose that this may have powered-down our PHYs for a short moment
or at least disturbed the connection which it has with the MII and/or
the outside world.

In any case:

Reverting solely this change alone fixed the 10 second boot delay!

For more information, I attached two driver debug logs of the commands
between firmware load and "Link Up" with and without isolation of the
selected PHY.

----------------------------------------------------------------------

But indeed, there is potential for a possibly valid optimization:

The old e100_phy_init() was reading the "bmcr" variable twice.
First, when probing the PHYs are probed:

/* Discover phy addr by searching addrs in order {1,0,2,..., 31} */
for (addr = 0; addr < 32; addr++) {
	nic->mii.phy_id = (addr == 0) ? 1 : (addr == 1) ? 0 : addr;
	bmcr = mdio_read(netdev, nic->mii.phy_id, MII_BMCR);
	stat = mdio_read(netdev, nic->mii.phy_id, MII_BMSR);
	stat = mdio_read(netdev, nic->mii.phy_id, MII_BMSR);
	if (!((bmcr == 0xFFFF) || ((stat == 0) && (bmcr == 0))))
		break;
-> When the selected PHY is found, the loop aborts here, so bmcr
     has the bmcr of the selected PHY
}

and second, when the old PHY setup loop was clearing the isolate bit:

> -		} else {
> -			bmcr = mdio_read(netdev, addr, MII_BMCR);
> -			mdio_write(netdev, addr, MII_BMCR,
> -				bmcr & ~BMCR_ISOLATE);
> -		}

Reading the MII_BMCR value twice was therefore not strictly necessary,
so I think this was the optimization which Bruce had in mind, and his
other motivation may have been to simplify the PHY setup loop.

So, what should be done is to only isolate the other PHYs and just clear
the isolate bit of the selected PHY, in the simplest possible way.

The patch which we are now using with 2.6.31 to fix this regression
does the following:

* Remove the loop which isolates *ALL* PHYs and then clears of
   the isolate bit of the selected PHY.

* Keep the new code which just clears the isolate bit of the BMCR
   of the discovered PHY

* Afterwards, isolate the unused PHYs.

This essentially resembles what the 2.6.23-2.6.29 code did, only
clearing the isolate bit of the discovered PHY is explicitly done
first, while before, it was in the middle of the PHY isolation loop.

A plain revert is probably safest in case of real paranoia here.
Electrically, this should be the same what we had until 2.6.30-rc3:

Tested by me and my colleagues:

--- linux-2.6.31/drivers/net/e100.c
+++ linux-2.6.31/drivers/net/e100.c
@@ -1294,13 +1294,15 @@ static int e100_phy_init(struct nic *nic
   	if (addr == 32)
   		return -EAGAIN;

-	/* Isolate all the PHY ids */
-	for (addr = 0; addr < 32; addr++)
-		mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);
   	/* Select the discovered PHY */
   	bmcr &= ~BMCR_ISOLATE;
   	mdio_write(netdev, nic->mii.phy_id, MII_BMCR, bmcr);

+	/* Electrically isolate only the unused PHYs */
+	for (addr = 0; addr < 32; addr++)
+		if (addr != nic->mii.phy_id)
+			mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);
+
   	/* Get phy ID */
   	id_lo = mdio_read(netdev, nic->mii.phy_id, MII_PHYSID1);
   	id_hi = mdio_read(netdev, nic->mii.phy_id, MII_PHYSID2);

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@....net>

Best Regards,
Bernhard Kaindl

View attachment "e100_phy_init_without_isolation.txt" of type "text/plain" (1135 bytes)

View attachment "e100_phy_init_with_isolation.txt" of type "text/plain" (3423 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ