netdev - Re: 2.6.29-rc3: tg3 dead after resume

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.2.00.0901301506120.3150@localhost.localdomain>
Date:	Fri, 30 Jan 2009 15:06:30 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Parag Warudkar <parag.lkml@...il.com>
cc:	Matt Carlson <mcarlson@...adcom.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Fri, 30 Jan 2009, Parag Warudkar wrote:
> 
> [  245.924484] eth0: PCI_COMMAND reg = 0x406 (bit 1 is on)
> [  245.924487] eth0: Reg value at offset 0x0 is 0xffffffff
> [  247.317971] tg3: eth0: No firmware running.
> [  258.710634] ADDRCONF(NETDEV_UP): eth0: link is not ready
> ^^^ Post-Suspend
> 
> So it looks like the memory space IO is enabled before and after suspend.
> The device/vendor id goes 0xffffffff after resume - just like before.
> Does that one matter? (Firmware may be looking at it?) 

One thing strikes me - are there any bridges between the host (CPU) and 
that tg3 device?

Because we obviously have two people who say that their tg3 suspend/resume 
works fine, so the tg3 driver is obviously not _totally_ broken. So I'm 
wondering if there is something funny in between the CPU and the tg3, like 
a hotplug bridge that needs magic to wake up properly.

Because clearly the PCI config space addresses are working fine, but the 
thing is, while PCI config space accesses are routed by the device number 
(and the bridges notion of secondary bridging), the PCI memory space 
routing is based on address. So a PCI bridge can easily get one right (in 
fact, it's really hard to get config space accesses wrong without the 
bridges being _totally_ screwed up), while not routing the other at all.

So just do that "lspci -vvxxx" for the whole box, before and after, and 
send us the "before" and the "diff -u before after" thing, and maybe that 
shows something interesting. Because some bridge chip being confused would 
also explain why a total re-init of the whole tg3 chip by a driver unload 
and reload doesn't seem to help.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html