Message-ID: <alpine.LFD.1.10.0808292216310.3290@nehalem.linux-foundation.org>
Date: Fri, 29 Aug 2008 22:52:40 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Yinghai Lu <yhlu.kernel@...il.com>
cc: "Rafael J. Wysocki" <rjw@...k.pl>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jeff Garzik <jeff@...zik.org>, Tejun Heo <htejun@...il.com>,
Ingo Molnar <mingo@...e.hu>,
David Witbrodt <dawitbro@...global.net>,
Andrew Morton <akpm@...ux-foundation.org>,
Kernel Testers <kernel-testers@...r.kernel.org>
Subject: Re: Linux 2.6.27-rc5: System boot regression caused by commit
a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
On Fri, 29 Aug 2008, Yinghai Lu wrote:
>
> if we don't add the IORESOURCE_BUSY, why bother to add these entries...
You don't understand how the resource allocator works.
IORESOURCE_BUSY is really more of a "legacy bit". It has almost no bearing
on the actual allocations.
Just grep for IORESOURCE_BUSY in kernel/resource.c. The _only_ thing that
cares about busy/non-busy is the legacy "request_region()" function. That
one isn't actually used by any core PCI code - it's more of a driver
issue to claim exclusive ownership of particular resources by inserting a
marker in that resource.
So IORESOURCE_BUSY is a red herring. The only reason I said you can clear
it is because you claimed it causes problems, but the more I look at it,
the more I think you're likely just mistaken - because IORESOURCE_BUSY
doesn't make any difference at all to normal resource handling until you
get to actual drivers.
The bigger issue is that just inserting the resource (and it really
doesn't matter if it is marked busy or not) is in itself a mark of
"there's something here". THAT is what all the resource code cares about.
The IORESOURCE_BUSY bit is almost immaterial (ie _is_ immaterial except
for some very specific cases).
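A minimal userspace sketch of that distinction, assuming a flat list instead of the kernel's resource tree. Every toy_* name and TOY_BUSY are hypothetical stand-ins for illustration, not the functions in kernel/resource.c. Plain insertion treats any overlap as a conflict and never looks at the flags; only the driver-style claim consults the busy bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUSY 0x1u   /* hypothetical stand-in for IORESOURCE_BUSY */
#define TOY_MAX  8

struct toy_res {
    uint64_t start, end;        /* inclusive range */
    unsigned int flags;
};

struct toy_map {
    struct toy_res res[TOY_MAX];
    size_t count;
};

/* True if entry r overlaps the inclusive range [start, end]. */
int toy_overlaps(const struct toy_res *r, uint64_t start, uint64_t end)
{
    return r->start <= end && start <= r->end;
}

/* Plain insertion: any overlap is a conflict, flags are never consulted. */
int toy_insert(struct toy_map *m, uint64_t start, uint64_t end,
               unsigned int flags)
{
    if (m->count == TOY_MAX)
        return -1;
    for (size_t i = 0; i < m->count; i++)
        if (toy_overlaps(&m->res[i], start, end))
            return -1;
    m->res[m->count++] = (struct toy_res){ start, end, flags };
    return 0;
}

/* Driver-style claim: only an overlapping BUSY entry locks the range away. */
int toy_request_region(const struct toy_map *m, uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < m->count; i++)
        if ((m->res[i].flags & TOY_BUSY) &&
            toy_overlaps(&m->res[i], start, end))
            return -1;
    return 0;
}
```

In this model a non-busy entry still blocks a second toy_insert() over the same range, while toy_request_region() inside it succeeds; marking the entry TOY_BUSY changes only the second outcome.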
And the reason we need to add the e820 resources is exactly so that we
don't try to allocate PCI resources on top of some system resources we
don't even know about!
> good layout from BIOS, it should only reserve mmio range is not showing in BAR.
I agree, but "good layout" and "BIOS" don't really go together. There's
too many broken BIOSes.
> if one stupid BIOS set
> 0xdc000000 - 0x100000000 for reserved.
>
> then when in insert that range late
Sure, but really, the only point of even caring about e820 resources in
the first place has really nothing to do with the BAR's we can see
(because the kernel can handle _those_ perfectly well on its own), and has
everything to do with the fact that a lot of devices have invisible
resources that we _cannot_ see (ie magic non-standard BAR's for the
motherboard chips).
And those are exactly why we want to populate the resource map with the
e820 information - to avoid having dynamic resources (like Cardbus or PCI
hotplug, or just devices that weren't set up statically by the BIOS) be
then allocated by the kernel on top of those "invisible" resources.
And the dynamic code actually doesn't care about IORESOURCE_BUSY at all:
it will avoid _any_ resource it can see. Think about it: it has to - since
existing PCI resources we have set up will _not_ have that IORESOURCE_BUSY
set.
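A rough sketch of that behaviour, under the assumption of a flat array of known ranges and a power-of-two size; toy_allocate() and struct toy_res are hypothetical names, not the kernel's allocator. The struct deliberately carries no flags field, since the search steps around every recorded range regardless of how it is marked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_res {
    uint64_t start, end;        /* inclusive range, no flags on purpose */
};

/*
 * Find the lowest size-aligned start in [min, max] such that
 * [start, start + size - 1] overlaps no recorded entry.
 * Assumes size is a non-zero power of two and min + size does not overflow.
 * Returns 0 and stores the address in *out, or -1 if nothing fits.
 */
int toy_allocate(const struct toy_res *map, size_t count,
                 uint64_t min, uint64_t max, uint64_t size, uint64_t *out)
{
    uint64_t start = (min + size - 1) & ~(size - 1);

    while (start <= max && max - start >= size - 1) {
        uint64_t end = start + size - 1;
        uint64_t next;
        size_t i;

        for (i = 0; i < count; i++)
            if (map[i].start <= end && start <= map[i].end)
                break;
        if (i == count) {       /* nothing in the way */
            *out = start;
            return 0;
        }

        /* Skip past the conflicting entry, then realign upwards. */
        if (map[i].end >= max)
            return -1;
        next = map[i].end + 1;
        if (next > max - (size - 1))
            return -1;
        start = (next + size - 1) & ~(size - 1);
    }
    return -1;
}
```

With two unflagged entries covering 0xe0000000-0xe0ffffff and 0xe1000000-0xe1ffffff, a 1 MiB request in the window starting at 0xe0000000 lands at 0xe2000000 in this model, simply because both ranges are visible in the map.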
In many ways, IORESOURCE_BUSY is pure legacy stuff, and is meant for "this
is a black hole and you must not look into it at all". It originates with
a need to lock drivers away from other drivers by
marking their resources busy - in an ISA world, where there are no other
ways of saying "I own this device".
(Yeah, yeah, PCI drivers do the same thing too - they mark their BAR's by
inserting a per-driver entry in the BAR to say 'I own this resource').
But this is where adding the e820 resources _after_ doing PCI discovery
comes in. We don't want to clash with PCI discovery per se - we just want
to make sure that later allocations don't allocate over anything that we
either saw earlier (the BAR's we found set up in regular PCI discovery)
_or_ anything that the system has said is reserved (e820 reserved
entries).
Doing it before obviously works too - in fact, it has worked for us for
years. But it does mean that we consider the e820 reserved areas _so_
reserved that we don't allow PCI BAR's in them. Which is apparently a
mistake.
We want to consider them so reserved that we don't add _new_ PCI resources
to them (and perhaps we might even want to stop regular PCI drivers from
attaching to them), but not so exclusive that we don't allow BARs that
have been set up by the BIOS in them.
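The ordering argument can be illustrated with a small model. This is only a sketch under loud assumptions: a flat list where the real kernel keeps a tree (in which a late reserved entry would become the parent of the BARs inside it), and two hypothetical operations, toy_claim() for BIOS-assigned BARs and new allocations, and toy_insert_reserved() for e820 reserved ranges. The address range is the one from the quoted example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX 8

struct toy_res {
    uint64_t start, end;        /* inclusive range */
};

struct toy_map {
    struct toy_res res[TOY_MAX];
    size_t count;
};

/* True if [start, end] overlaps anything already recorded. */
int toy_in_use(const struct toy_map *m, uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->res[i].start <= end && start <= m->res[i].end)
            return 1;
    return 0;
}

/* Strict claim: used for BIOS-assigned BARs and for new allocations. */
int toy_claim(struct toy_map *m, uint64_t start, uint64_t end)
{
    if (m->count == TOY_MAX || toy_in_use(m, start, end))
        return -1;
    m->res[m->count++] = (struct toy_res){ start, end };
    return 0;
}

/*
 * Tolerant insert: used for e820 reserved ranges. The range is recorded
 * even if earlier BARs already sit inside it, so it only constrains
 * whatever is claimed afterwards.
 */
int toy_insert_reserved(struct toy_map *m, uint64_t start, uint64_t end)
{
    if (m->count == TOY_MAX)
        return -1;
    m->res[m->count++] = (struct toy_res){ start, end };
    return 0;
}
```

Calling toy_insert_reserved() for 0xdc000000-0xffffffff first makes a later toy_claim() of a BIOS BAR at 0xdd000000 fail. Claiming the BAR first and inserting the reserved range afterwards keeps the BAR, and a new toy_claim() elsewhere in the reserved range is still refused.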
Linus