netdev - Re: [PATCH v3 00/10] split out emac cpdma and mdio for reuse

Date:	Sat, 11 Sep 2010 09:14:14 -0400
From:	Michael Williamson <michael.williamson@...ticallink.com>
To:	cyril@...com
CC:	Kevin Hilman <khilman@...prootsystems.com>,
	"tony@...mide.com" <tony@...mide.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"davinci-linux-open-source@...ux.davincidsp.com" 
	<davinci-linux-open-source@...ux.davincidsp.com>,
	"linux-omap@...r.kernel.org" <linux-omap@...r.kernel.org>,
	"davem@...emloft.net" <davem@...emloft.net>
Subject: Re: [PATCH v3 00/10] split out emac cpdma and mdio for reuse

Hi Cyril,

On 09/10/2010 06:59 PM, Cyril Chemparathy wrote:
> Hi Mike,
> 
> I have merged your latest two emails and responded to both here.
> 
[...]
> One of the patches posted on my repo [1] replaces the dumb mdelay() with
> a bit more logic that calculates the worst-case access time.  This
> mechanism may work a lot better for you.  Would you mind trying it out?
> 

Your patch from [1] is working much more reliably now (well, 6 for 6 boot cycles as well as
several ifup/ifdown cycles).  I do get the "resetting idled controlled" console message
every cycle; it seems that will be expected now.
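
For anyone following along, here is a rough sketch of what a worst-case access time
estimate could look like.  It is not the calculation from the patch in [1]; it only assumes
that a standard (clause 22) MDIO frame is 64 bit times on the wire (32 preamble bits plus
32 frame bits) and that the MDC clock rate is known.  The function name and the example
clock rates are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the driver: estimate how long one
 * MDIO transaction can take, in microseconds, rounded up.
 *
 * Assumption: a clause 22 MDIO frame is 64 bit times
 * (32 preamble + 2 start + 2 opcode + 5 phy + 5 reg + 2 turnaround + 16 data).
 */
#define MDIO_FRAME_BITS 64UL

static unsigned long mdio_frame_time_us(unsigned long mdc_hz)
{
	assert(mdc_hz != 0);	/* a zero clock rate would divide by zero */

	/* ceil(bits * 1e6 / mdc_hz) without floating point */
	return (MDIO_FRAME_BITS * 1000000UL + mdc_hz - 1) / mdc_hz;
}
```

With a 2.2 MHz MDC this gives 30 us per transaction, so a bounded poll for that long (plus
some margin) would replace a fixed mdelay().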

[...]

>>
>> Also, your while(1) loops with the continue conditions on the second wait_for_user_access() 
>> in the read and writes might need some consideration, i.e.:
>>
>>         while (1) {
>>                 ret = wait_for_user_access(data);
>>                 if (ret == -EAGAIN)
>>                         continue;
>>                 if (ret < 0)
>>                         break;
>>
>>                 __raw_writel(reg, &data->regs->user[0].access);
>>
>>                 ret = wait_for_user_access(data);
>>                 if (ret == -EAGAIN)
>>                         continue;
>>                         ^^^^^^^^^ <--- this will re-issue the request.... what you want?
> 
> Yes, the wait_for_user_access() would have reset the controller,
> throwing the current transaction out.  The intent here is to restart the
> current transaction with a newly initialized controller.
> 

OK.  Makes sense.  Looking at it, it felt like there was a chance for an endless spin, but
that seems unlikely given how that condition might fire.

[...]

>> Also, on the shutdown, I get a major kernel trace.  Here is the dump, as much 
>> as I could catch of it.... (I need a better terminal program)
> 
> [...]
>> WARNING: at drivers/net/davinci_emac.c:1025 __cpdma_chan_free+0xac/0xb0()
> 
> The current code spits out a huge volume of stuff as a result of a
> WARN_ON in the rx handler.  The gitweb on [1] has a patch that fixes this.
> 

Yes, these messages are no longer issued with the patches from [1].  Thanks.

> [...]
>>>> The MDIO module upgrade (rev 1.4 -> 1.5) could have something to do with
>>>> this behavior.  Even so, I can't explain why this issue wasn't seen on
>>>> da8xx prior to this series.  The original code should (at least in
>>>> theory) have sporadically locked up on emac open.
>>>>
>> I think, if I understand it correctly, that in the previous version of 
>> this code, the emac was reset *prior* to enabling, scanning, and assigning 
>> the associated phy on the MDIO bus.  The new implementation sets up and scans 
>> the MDIO bus first, then comes back around to the EMAC second... hits a reset,
>> and doesn't re-ENABLE the MDIO.
> 
> AFAICS, that isn't entirely accurate.  In the previous version, the mdio
> bus was being brought up at probe time in davinci_emac_probe().  The
> soft-reset was happening later on when the device is opened, in
> emac_hw_enable().
> 
> The difference, however, is that the original code forced an
> emac_mii_reset() immediately after the emac_hw_enable().  This is not
> being done with the separated mdio, and that is the problem.  In terms
> of behavior, with the current work around, the new and old versions
> should be close to identical.  More below...
> 
>> Also, maybe hitting the EMAC reset while the MDIO state machine is up is *bad*, I 
>> seem to recall some text in the user's guide about waiting for the state
>> machine to stop before disabling it.  I wonder if that also applies to reset?
> 
> You are correct.  EMAC soft-reset stops the MDIO mid-transaction, quite
> unlike disabling the module via the control register.  Therefore, there
> is a risk that a badly designed phy could be left hanging in an
> arbitrary state.  However, all earlier versions of the emac code have
> been exposed to this very same vulnerability (i.e. arbitrary emac
> soft-reset regardless of mdio state) all along.
> 

Thanks for straightening me out on this, Cyril.  Your patch series in [1] seems to
have resolved the issues I've been able to see on the da850-based board I'm using
here.  I appreciate your patience and quick response.  I may try to beat on it a
bit more with some network performance tests (even though that's not at all related
to the immediate problems you've fixed) -- we've had that on our list of todos
for this module anyway.

I think it would be good to float your patches over to davinci-next, if possible, before the
37 merge window opens...  Speaking, of course, from a very partial perspective.

> [1]
> http://arago-project.org/git/people/?p=cyril/linux-tnetv107x.git;a=shortlog;h=refs/heads/emac-cpdma-mdio-fixes

-Mike