Message-Id: <200808302120.10309.rjw@sisk.pl>
Date: Sat, 30 Aug 2008 21:20:09 +0200
From: "Rafael J. Wysocki" <rjw@...k.pl>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jeff Garzik <jeff@...zik.org>, Tejun Heo <htejun@...il.com>,
Ingo Molnar <mingo@...e.hu>,
Yinghai Lu <yhlu.kernel@...il.com>,
David Witbrodt <dawitbro@...global.net>,
Andrew Morton <akpm@...ux-foundation.org>,
Kernel Testers <kernel-testers@...r.kernel.org>
Subject: Re: Linux 2.6.27-rc5: System boot regression caused by commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
On Saturday, 30 of August 2008, Linus Torvalds wrote:
>
> On Sat, 30 Aug 2008, Rafael J. Wysocki wrote:
> >
> > > And if you have the whole dmesg, that would be useful.
> >
> > dmesg from -rc5 with the offending commit reverted and with the patch
> > below applied is at:
> >
> > http://www.sisk.pl/kernel/debug/mainline/2.6.27-rc5/2.6.27-rc5-git.log
>
> Ok, the more I look at this, the more interesting it gets.
>
> In particular, this:
>
> ...
> ACPI: bus type pnp registered
> pnp 00:08: mem resource (0xfec00000-0xfec00fff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
> pnp 00:08: mem resource (0xfee00000-0xfee00fff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
> pnp 00:09: mem resource (0xffb80000-0xffbfffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
> pnp 00:09: mem resource (0xfff00000-0xffffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
> pnp 00:0b: mem resource (0xe0000000-0xefffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
> pnp 00:0c: mem resource (0xfec00000-0xffffffff) overlaps 0000:00:00.0 BAR 3 (0xe0000000-0xffffffff), disabling
> pnp: PnP ACPI: found 13 devices
> ACPI: ACPI bus type pnp unregistered
> SCSI subsystem initialized
> libata version 3.00 loaded.
> usbcore: registered new interface driver usbfs
> usbcore: registered new interface driver hub
> usbcore: registered new device driver usb
> PCI: Using ACPI for IRQ routing
> pci 0000:00:00.0: BAR 3: can't allocate resource
> ...
>
> there are a few things to note here:
>
> - the resource at 0000:00:00.0 BAR 3 is totally bogus.
>
> We know it's totally bogus because you actually have other resources in
> the 0xf....... range, and they work fine. It's also likely to be
> totally bogus because it so happens that the end-point of 0xffffffff is
> commonly something that the BIOS leaves as a "I sized this resource",
> because that's how resources are sized (you write all ones into them
> and look what you can read back).
>
> But your lspci -vxx output clearly shows that (a) MEM is enabled in
> the command word, and (b) the BAR register at 0x18 does indeed have
> value 0xe0000000. So it's just the length that is really bogus.
>
> - pnp clearly sees that bogus resource at 0xe0000000-0xffffffff
>
> - BUT: the "can't allocate resource" thing is from
> pcibios_allocate_resources(), and means that the request_resource()
> failed _despite_ the fact that you hadn't reserved the e820 resources
> yet with the new patch.
>
> The thing that seems to save you is that we've already allocated something
> in that region. There are a few things there, like:
>
> fee00000-fee00fff : Local APIC
>
> but that particular one is actually reserved much later, so that doesn't
> explain it. I think that what happens is that we have allocated the _bus_
> resources earlier in "pcibios_allocate_bus_resources()", and that means
> that we already have these resources:
>
> fe700000-fe7fffff : PCI Bus 0000:01
> fe800000-fe8fffff : PCI Bus 0000:02
> fe900000-fe9fffff : PCI Bus 0000:03
> fea00000-feafffff : PCI Bus 0000:04
> feb00000-febfffff : PCI Bus 0000:05
>
> in the resource tree, and that in turn means that when we try to allocate
> the bogus MCFG resource, it fails.
>
> Which is good - it mustn't succeed.
>
> What _broke_ for you is that the horrible patch that got reverted said
> that "if we recognize this as an MCFG resource, we will _always_ try to
> insert it", so it fundamentally broke the whole resource tree, because it
> force-inserted that totally crap resource.
Well, I thought something like this happened, but I wasn't quite sure about the
exact mechanism. Thanks for the explanation. :-)
Rafael
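
For readers following the thread, here is a minimal sketch of the mechanism described above, written in Python rather than kernel C. It is a toy model, not the kernel's resource code: the names Resource, overlaps, try_request and bar_size_from_readback are hypothetical, and the tree is flattened to a single parent. The address ranges are copied from the quoted dmesg and resource listing. It shows three things: how a size is decoded from the value read back after writing all ones to a 32-bit memory BAR, why every pnp resource in the log overlaps BAR 3, and why requesting 0xe0000000-0xffffffff fails once the PCI bus windows are already in the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    name: str
    start: int
    end: int  # inclusive, as printed in the log
    children: List["Resource"] = field(default_factory=list)


def overlaps(a: Resource, b: Resource) -> bool:
    """True if the two inclusive ranges share at least one address."""
    return a.start <= b.end and b.start <= a.end


def try_request(parent: Resource, new: Resource) -> Optional[Resource]:
    """Toy stand-in for a resource request (hypothetical, single level).

    Returns None on success, or the resource that `new` conflicts with.
    """
    if new.start < parent.start or new.end > parent.end:
        return parent
    for child in parent.children:
        if overlaps(child, new):
            return child
    parent.children.append(new)
    parent.children.sort(key=lambda r: r.start)
    return None


def bar_size_from_readback(readback: int) -> int:
    """Size of a 32-bit memory BAR from the value read after writing all ones.

    The low four bits are flag bits, so they are masked off first.
    """
    mask = readback & 0xFFFFFFF0
    return (~mask + 1) & 0xFFFFFFFF


# Assumed 32-bit address space as the single parent of everything below.
root = Resource("iomem", 0x00000000, 0xFFFFFFFF)

# Bus windows from the quoted listing, inserted first.
for i, base in enumerate(range(0xFE700000, 0xFEC00000, 0x00100000), start=1):
    try_request(root, Resource(f"PCI Bus 0000:0{i}", base, base + 0xFFFFF))

# The BAR 3 range reported in the log.
bar3 = Resource("0000:00:00.0 BAR 3", 0xE0000000, 0xFFFFFFFF)

# The pnp resources that the log reports as overlapping BAR 3.
pnp = [
    Resource("pnp 00:08", 0xFEC00000, 0xFEC00FFF),
    Resource("pnp 00:08", 0xFEE00000, 0xFEE00FFF),
    Resource("pnp 00:09", 0xFFB80000, 0xFFBFFFFF),
    Resource("pnp 00:09", 0xFFF00000, 0xFFFFFFFF),
    Resource("pnp 00:0b", 0xE0000000, 0xEFFFFFFF),
    Resource("pnp 00:0c", 0xFEC00000, 0xFFFFFFFF),
]

conflict = try_request(root, bar3)
print("BAR 3 request:", "ok" if conflict is None else f"conflicts with {conflict.name}")
print("pnp overlaps:", [overlaps(r, bar3) for r in pnp])
```

In this model the BAR 3 request is refused because the bus windows are already present, which matches the "can't allocate resource" line in the log. A forced insert that skipped the conflict check would leave a 512 MiB entry covering the bus windows and everything above them, which is the breakage attributed above to the reverted commit.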