lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0901281636270.3123@localhost.localdomain>
Date:	Wed, 28 Jan 2009 17:09:14 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Parag Warudkar <parag.lkml@...il.com>
cc:	netdev@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Wed, 28 Jan 2009, Parag Warudkar wrote:
>
> This is similar to the issue reported back in Jul 2007 - 
> http://kerneltrap.org/mailarchive/linux-kernel/2007/8/1/154073/thread 
> which was fixed with a patch to unconditionally save/restore pci config 
> space - that one is still in tg3.c.

In fact, the new PCI suspend/restore code should have made that 
unnecessary, since the PCI layer now makes sure that a save/restore is 
done even if the driver hadn't done it.

But at the same time, still having the driver do it certainly shouldn't 
have _hurt_ anything either. But it's quite possible that the tg3 thing is 
very sensitive to the exact order things happen in - there's a lot of 
comments about bugs in there ;)

> After resume tg3 complains that no firmware is running and eth0 is 
> non-existent. Rmmoding and modprobing tg3 again causes some timeouts and 
> errors from tg3 and the link still doesn't work.

That seems to imply that even the reset failed, which is interesting. 

But it also possibly means that the problem is not necessarily the driver 
itself, but some cached state that we keep around in "struct pci_dev" even 
across a module load/unload. 

For example, if we get the "dev->current_state" cache wrong, then we may 
not actually end up changing it when we should, because we think we 
already match the target state. I don't _think_ that is it, but that's the 
kind of thing that could happen.

Can you do a

	lspci -vvxxx -s [tg3-device]

before-and-after suspend? Is there some state that looks like it got 
corrupted?

			Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ