linux-kernel - Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200812042340.58694.rjw@sisk.pl>
Date:	Thu, 4 Dec 2008 23:40:58 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Frans Pop <elendil@...net.nl>, Greg KH <greg@...ah.com>,
	Ingo Molnar <mingo@...e.hu>, jbarnes@...tuousgeek.org,
	lenb@...nel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	tiwai@...e.de, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)

On Thursday, 4 of December 2008, Linus Torvalds wrote:
> 
> On Thu, 4 Dec 2008, Frans Pop wrote:
> 
> > On Wednesday 03 December 2008, Linus Torvalds wrote:
> > > Well, I think that what _would_ be generally correct, and actually
> > > pretty simple, is a rather different approach: just not sizing things
> > > behind a transparent bridge AT ALL, since it really shouldn't matter.
> > 
> > I've given your patch a try and the few resumes from STR I've done were 
> > all successful. That's not 100% conclusive yet, but a nice start.
> > Some info from logs etc. below.
> 
> Ok, but I thought you had a hard time reproducing this _anyway_, even with 
> just plain -rc7. No?
> 
> That said, of the various patches posted, the "don't bother allocating 
> bridging windows for transparent bridges" one is not just the simplest, 
> but the only one that actually makes sense so far.
> 
> So I'm happy it's apparently working for you, I'm just wondering about 
> whather your success means a lot. It seems that Rafael is the one who had 
> more failures?

This most probably is correct and I got a resume failure with that patch
applied, so it evidently doesn't fix the problem. :-(

> > > > Also, I would be happy to actually understand _why_ this happens.
> > >
> > > 100% agreed. I do _not_ see why it should ever matter how we set up a
> > > PCI bridging window - whether prefetchable or not - on a bridge that
> > > should be transparent. It sounds really odd. I'm wondering if there is
> > > something we're missing here.
> > 
> > The theory that it is really a resume issue and not a device layout issue 
> > sounds logical. Especially as everything always works correctly after a 
> > normal boot.
> 
> Yes, that does sound like a convincing argument. Usually real PCI resource 
> clashes result in some kind of run-time problems, and wouldn't necessarily 
> be suspend-specific per se.
> 
> That said, suspend/resume does a lot of unusual things, so it could still 
> be some odd PCI resource clash that only triggers problems in the 
> suspend/resume case. But since the exact layouts and the sizing of the 
> resources doesn't really seem to matter, a simple PCI resource clash seems 
> rather unlikely.

I agree.

That said the "don't bother allocating bridging windows for transparent
bridges" patch resulted in the following layout on my box (from /proc/iomem):

88000000-8807ffff : 0000:00:02.1
88080000-88083fff : 0000:00:1b.0
  88080000-88083fff : ICH HD audio
88084000-88087fff : 0000:03:0b.1
88088000-88088fff : 0000:03:0b.0
  88088000-88088fff : yenta_socket
88089000-880897ff : 0000:03:0b.1
  88089000-880897ff : firewire_ohci
88089800-880898ff : 0000:03:0b.3
  88089800-880898ff : mmc0
8808a000-8808afff : Intel Flush Page
8c000000-8fffffff : PCI CardBus 0000:04
90000000-93ffffff : PCI CardBus 0000:04

while my "don't allocate bridging windows for cardbus bridges behind
transparent bridges" patch I've just sent (appended for easier reference)
results in the layout:

88000000-880fffff : PCI Bus 0000:03
  88000000-88003fff : 0000:03:0b.1
  88004000-88004fff : 0000:03:0b.0
    88004000-88004fff : yenta_socket
  88005000-880057ff : 0000:03:0b.1
    88005000-880057ff : firewire_ohci
  88005800-880058ff : 0000:03:0b.3
    88005800-880058ff : mmc0
88100000-8817ffff : 0000:00:02.1
88180000-88183fff : 0000:00:1b.0
  88180000-88183fff : ICH HD audio
88184000-88184fff : Intel Flush Page
8c000000-8fffffff : PCI CardBus 0000:04
90000000-93ffffff : PCI CardBus 0000:04

where devices behind the transparent bridge (PCI Bus 0000:03) are located
_before_ ICH HD audio in the memory address space, and this one appears to
work.  So there _may_ be an effect of the layout too.

> So some kind of resume-time ordering or timing issue does seem like the 
> most likely thing. But that still leaves us not knowing what the real 
> _root_ cause of this all is - very irritating. Even if not allocating the 
> unnecessary bridging windows "fixes" things, it would be really really 
> good to know exactly what it is that causes problems.

Well, given that both affected boxes have the same chipset (945GM), I seriously
suspect a nastiness in that chipset we're not aware of.  Especially that
the problem is not reproducible without snd_hda_intel (at least on my box).

> > Below info from 3 kernels, all based on 2.6.28-rc7-91:
> > A) unpatched
> > B) with the revert/debug patch
> > C) with the oneliner "ignore transparent bridges" patch
> > 
> > AFAICT all results are probably as expected.
> > 
> > From lspci -vvxxx:
> > 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge
> > - for A)
> > 	I/O behind bridge: 00003000-00003fff
> > 	Memory behind bridge: e0100000-e03fffff
> > 	Prefetchable memory behind bridge: 0000000080000000-0000000083ffffff
> > - for B)
> > 	I/O behind bridge: 00003000-00003fff
> > 	Memory behind bridge: e0100000-e03fffff
> > - for C)
> > 	Memory behind bridge: e0100000-e03fffff
> 
> And this all makes total sense. The e0100000-e03fffff MMIO bridge range is 
> apparently set up by the firmware, which is why it shows up in all cases. 
> And the (A) case has that prefetchable memory range, because that's the 
> only case that finds - and cares about - the prefetch window for the 
> CardBus controller. 
> 
> And both (A) and (B) have the IO bridging window, because regardless of 
> whether we see a valid CardBus prefetchable memory window with good 
> alignment, we'll always see the IO ports, so we'll try to allocate that 
> bridging window, except in (C) when we decide that due to the transparent 
> nature, we simply don't care.
> 
> So the PCI resources make sense in all three cases, and we understand 
> those. The differences in the actual Cardbus ranges also all make sense. 
> So it all still boils down to the PCI layer doing everything right in 
> _all_ cases, just making slightly different - but all valid - choices 
> depending on essentially random details (eg the revert/debug patch case 
> the "random detail" is just enabling a small incorrect alignment).
> 
> IOW, it really doesn't look like a PCI resource allocator bug. Quite the 
> reverse, I'd say that in the end this whole thread points out just how 
> robust the whole PCI and cardbus resource allocation is, with the code 
> really very gracefully just adjusting in a sane manner to all these 
> different cases.
> 
> Of course, none of that helps us with any kind of idea of what the real 
> problem is. Device ordering bug in setting up PCI resources at resume? 
> Perhaps just a plain bug in PCI bridge resume code (even when you resume 
> things in the right order)?
> 
> And I still worry that perhaps it's just a timing bug, where having a PCI 
> bridging window changes timing of various PCI accesses, and the _real_ bug 
> is actually in the sound card or ethernet driver resume, which happens to 
> work with one timing and not with another.
> 
> Since it's apparently STR, has anybody gotten _anything_ sane out of 
> trying to enable PM_TRACE_RTC, and then doing that 
> 
> 	echo 1 > /sys/power/pm_trace
> 
> because even with the (very limited) set of standard trace-points, it 
> should still be able to tell which device we were trying to resume last in 
> the failure case Maybe that gives some hint?

Well, I think more fine-grained debugging will be necessary.

I've already checked the resume ordering of PCI devices on my box and it
is the following:

pci:0000:00:00.0
pci:0000:00:02.0 <- graphics
pci:0000:00:02.1 <- graphics
pci:0000:00:1b.0 <- snd_hda_intel
pci:0000:00:1c.0 <- PCI Express port 1
pci:0000:00:1c.2 <- PCI Express port 3
pci:0000:00:1d.0 <- USB UHCI
pci:0000:00:1d.1 <- USB UHCI
pci:0000:00:1d.2 <- USB UHCI
pci:0000:00:1d.3 <- USB UHCI
pci:0000:00:1d.7 <- USB EHCI
pci:0000:00:1e.0 <- transparent bridge (Intel Corporation 82801 Mobile PCI Bridge)
pci:0000:00:1f.0 <- ISA bridge
pci:0000:00:1f.2 <- SATA (ahci)
pci:0000:01:00.0 <- e1000e
No Bus:0000:01
pci:0000:02:00.0 <- wireless (iwlagn)
No Bus:0000:02
pci:0000:03:0b.0 <- cardbus bridge
pci:0000:03:0b.1 <- FireWire
pci:0000:03:0b.3 <- SD Host controller (Texas Instruments)
No Bus:0000:04
No Bus:0000:03

So, snd_hda_intel resumes before all of the bridges and the layout of devices
_behind_ the transparent bridge shouldn't affect it at all.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/