netdev - Re: 2.6.29-rc3: tg3 dead after resume

Message-Id: <200901310059.17746.rjw@sisk.pl>
Date:	Sat, 31 Jan 2009 00:59:16 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Parag Warudkar <parag.lkml@...il.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Matt Carlson <mcarlson@...adcom.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Parag Warudkar wrote:
> 
> On Fri, 30 Jan 2009, Linus Torvalds wrote:
> 
> > 
> > Because we obviously have two people who say that their tg3 suspend/resume 
> > works fine, so the tg3 driver is obviously not _totally_ broken. So I'm 
> > wondering if there is something funny in between the CPU and the tg3, like 
> > a hotplug bridge that needs magic to wake up properly.
> > 
> > Because clearly the PCI config space addresses are working fine, but the 
> > thing is, while PCI config space accesses are routed by the device number 
> > (and the bridge's notion of secondary bridging), the PCI memory space 
> > routing is based on address. So a PCI bridge can easily get one right (in 
> > fact, it's really hard to get config space accesses wrong without the 
> > bridges being _totally_ screwed up), while not routing the other at all.
> > 
> > So just do that "lspci -vvxxx" for the whole box, before and after, and 
> > send us the "before" and the "diff -u before after" thing, and maybe that 
> > shows something interesting. Because some bridge chip being confused would 
> > also explain why a total re-init of the whole tg3 chip by a driver unload 
> > and reload doesn't seem to help.
> 
> Totally worth having this problem from a "getting an opportunity to 
> understand" standpoint. This confirms my long standing suspicion that bugs 
> in Linux kernel are merely a handiwork of few clever people to get more 
> people to understand and contribute :)
> 
> Anyhow, here is the pre-suspend lspci -vvxxx output followed by diff -u -
> 

[--snip--]

I think this is what we're looking for:

> @@ -472,7 +472,7 @@
>  f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00
>  
>  00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
> -	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> +	Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>  	Latency: 0, Cache Line Size: 64 bytes
>  	Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0

and the PCIe port driver may be at fault.

Can you try to remove the pci_save_state(dev) from pcie_port_suspend_late()
and see if that helps?
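
For anyone reading the hunk above without the PCI spec at hand, here is a
minimal sketch (not kernel code) that decodes the Command register the way
lspci labels it and reports which bits differ. The raw values 0x0507 and
0x0504 are not taken from the hex dump; they are assumptions reconstructed
from the +/- flags in the two "Control:" lines, and the helper names are
made up for illustration. On a bridge, clearing the Memory Space enable bit
stops it from forwarding memory transactions downstream, which would fit a
device whose config space still answers while its MMIO is dead.

```python
# Bit positions of the PCI Command register (config space offset 0x04),
# named the way lspci prints them on its "Control:" line.
COMMAND_BITS = [
    (0, "I/O"),
    (1, "Mem"),
    (2, "BusMaster"),
    (3, "SpecCycle"),
    (4, "MemWINV"),
    (5, "VGASnoop"),
    (6, "ParErr"),
    (7, "Stepping"),
    (8, "SERR"),
    (9, "FastB2B"),
    (10, "DisINTx"),
]


def decode_command(value):
    """Return {flag name: True/False} for a 16-bit Command register value."""
    return {name: bool(value & (1 << bit)) for bit, name in COMMAND_BITS}


def changed_bits(before, after):
    """Return {flag name: (old, new)} for every flag that differs."""
    old, new = decode_command(before), decode_command(after)
    return {
        name: (old[name], new[name])
        for _, name in COMMAND_BITS
        if old[name] != new[name]
    }


# Assumed values, rebuilt from the lspci flags in the diff (not from the dump):
BEFORE_SUSPEND = 0x0507  # I/O+ Mem+ BusMaster+ SERR+ DisINTx+
AFTER_RESUME = 0x0504    # I/O- Mem- BusMaster+ SERR+ DisINTx+

if __name__ == "__main__":
    for name, (old, new) in changed_bits(BEFORE_SUSPEND, AFTER_RESUME).items():
        print(f"{name}: {'+' if old else '-'} -> {'+' if new else '-'}")
```

Run against the two assumed values, it reports only I/O and Mem as changed,
which matches the diff for 00:1c.0.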

Rafael