Message-ID: <alpine.LFD.1.10.0808292216310.3290@nehalem.linux-foundation.org>
Date:	Fri, 29 Aug 2008 22:52:40 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Yinghai Lu <yhlu.kernel@...il.com>
cc:	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Jeff Garzik <jeff@...zik.org>, Tejun Heo <htejun@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	David Witbrodt <dawitbro@...global.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Kernel Testers <kernel-testers@...r.kernel.org>
Subject: Re: Linux 2.6.27-rc5: System boot regression caused by commit
 a2bd7274b47124d2fc4dfdb8c0591f545ba749dd



On Fri, 29 Aug 2008, Yinghai Lu wrote:
> 
> if we don't add the IORESOURCE_BUSY, why bother to add these entries...

You don't understand how the resource allocator works.

IORESOURCE_BUSY is really more of a "legacy bit". It has almost no bearing 
on the actual allocations.

Just grep for IORESOURCE_BUSY in kernel/resource.c. The _only_ thing that 
cares about busy/non-busy is the legacy "request_region()" function. That 
one isn't actually used by any core PCI code - it's more of a driver 
issue to claim exclusive ownership of particular resources by inserting a 
marker in that resource.
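
To make the distinction concrete, here is a minimal user-space sketch. It is a toy model, not the code in kernel/resource.c: struct toy_res, TOY_BUSY, toy_range_is_free() and toy_can_claim() are made-up names, and the real tree walk is simplified to a flat array. It only shows that an allocation-style check treats any overlapping entry as a conflict, while a request_region()-style check is the one place that looks at the busy bit.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUSY 0x1u   /* hypothetical stand-in for IORESOURCE_BUSY */

struct toy_res {
    unsigned long start, end;   /* inclusive address range */
    unsigned int flags;
};

static bool toy_overlaps(const struct toy_res *r,
                         unsigned long start, unsigned long end)
{
    return r->start <= end && start <= r->end;
}

/* Allocation-style check: any overlapping entry is a conflict.
 * The flags are never consulted. */
static bool toy_range_is_free(const struct toy_res *map, size_t n,
                              unsigned long start, unsigned long end)
{
    for (size_t i = 0; i < n; i++)
        if (toy_overlaps(&map[i], start, end))
            return false;
    return true;
}

/* request_region()-style check (simplified): only an overlapping
 * entry that is marked busy refuses the claim. */
static bool toy_can_claim(const struct toy_res *map, size_t n,
                          unsigned long start, unsigned long end)
{
    for (size_t i = 0; i < n; i++)
        if (toy_overlaps(&map[i], start, end) && (map[i].flags & TOY_BUSY))
            return false;
    return true;
}
```

With a map holding one non-busy entry and one busy entry, toy_range_is_free() rejects a range inside either of them, while toy_can_claim() rejects only the range inside the busy one.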

So IORESOURCE_BUSY is a red herring. The only reason I said you can clear 
it is because you claimed it causes problems, but the more I look at it, 
the more I think you're likely just mistaken - because IORESOURCE_BUSY 
doesn't make any difference at all to normal resource handling until you 
get to actual drivers.

The bigger issue is that just inserting the resource (and it really 
doesn't matter if it is marked busy or not) is in itself a mark of 
"there's something here". THAT is what all the resource code cares about. 
The IORESOURCE_BUSY bit is almost immaterial (ie _is_ immaterial except 
for some very specific cases).

And the reason we need to add the e820 resources is exactly so that we 
don't try to allocate PCI resources on top of some system resources we 
don't even know about!

> good layout from BIOS, it should only reserve mmio range is not showing in BAR.

I agree, but "good layout" and "BIOS" don't really go together. There are 
too many broken BIOSes.

> if one stupid BIOS set
> 0xdc000000 - 0x100000000 for reserved.
> 
> then when in insert that range late

Sure, but really, the only point of even caring about e820 resources in 
the first place has really nothing to do with the BAR's we can see 
(because the kernel can handle _those_ perfectly well on its own), and has 
everything to do with the fact that a lot of devices have invisible 
resources that we _cannot_ see (ie magic non-standard BAR's for the 
motherboard chips).

And those are exactly why we want to populate the resource map with the 
e820 information - to avoid having dynamic resources (like Cardbus or PCI 
hotplug, or just devices that weren't set up statically by the BIOS) be 
then allocated by the kernel on top of those "invisible" resources.

And the dynamic code actually doesn't care about IORESOURCE_BUSY at all: 
it will avoid _any_ resource it can see. Think about it: it has to - since 
existing PCI resources we have set up will _not_ have that IORESOURCE_BUSY 
set.
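
As an illustration of that behaviour, the following is a rough user-space sketch of a first-fit search for a free range. It is an assumption-laden toy and not the kernel's allocator: toy_res and toy_alloc() are hypothetical names, the map is a sorted flat array rather than a tree, and alignment is ignored. The point is only that the search steps over every entry it finds and never reads the flags.

```c
#include <stddef.h>

struct toy_res {
    unsigned long long start, end;  /* inclusive address range */
    unsigned int flags;             /* present but never consulted */
};

/*
 * First-fit search inside the window [lo, hi] for `size` bytes.
 * `map` must be sorted by start address and non-overlapping, and
 * `size` must be non-zero. Returns 1 and stores the chosen start
 * address in *out, or returns 0 if no gap is large enough.
 */
static int toy_alloc(const struct toy_res *map, size_t n,
                     unsigned long long lo, unsigned long long hi,
                     unsigned long long size, unsigned long long *out)
{
    unsigned long long cur = lo;    /* lowest candidate start */

    for (size_t i = 0; i < n; i++) {
        if (map[i].end < cur)
            continue;               /* entry lies below the cursor */
        if (map[i].start > cur && map[i].start - cur >= size)
            break;                  /* gap before this entry fits */
        cur = map[i].end + 1;       /* skip the entry, whatever its flags */
    }

    /* The candidate must also fit below the top of the window. */
    if (cur > hi || hi - cur + 1 < size)
        return 0;

    *out = cur;
    return 1;
}
```

For example, with a BIOS-assigned BAR at 0xd0000000-0xd7ffffff and a non-busy reserved entry at 0xdc000000-0xffffffff, a 16MB request in the window 0xd0000000-0xffffffff lands at 0xd8000000, between the two, and a 128MB request fails instead of overlapping the reserved range.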

In many ways, IORESOURCE_BUSY is pure legacy stuff, and is meant for "this 
is a black hole and you must not look into it at all". It originates with 
the need to lock drivers away from other drivers by marking their 
resources busy - in an ISA world, where there are no other ways of saying 
"I own this device".

(Yeah, yeah, PCI drivers do the same thing too - they mark their BAR's by 
inserting a per-driver entry in the BAR to say 'I own this resource').

But this is where adding the e820 resources _after_ doing PCI discovery 
comes in. We don't want to clash with PCI discovery per se - we just want 
to make sure that later allocations don't allocate over anything that we 
either saw earlier (the BAR's we found set up in regular PCI discovery) 
_or_ anything that the system has said is reserved (e820 reserved 
entries).

Doing it before obviously works too - in fact, it has worked for us for 
years. But it does mean that we consider the e820 reserved areas _so_ 
reserved that we don't allow PCI BAR's in them. Which is apparently a 
mistake.

We want to consider them so reserved that we don't add _new_ PCI resources 
to them (and perhaps we might even want to stop regular PCI drivers from 
attaching to them), but not so exclusive that we don't allow BARs that 
have been set up by the BIOS in them.
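
The effect of the ordering can be sketched with a small user-space model. Everything here is hypothetical (toy_map, toy_claim() and toy_insert() are invented for illustration), and unlike the real resource tree, which nests resources, this toy just keeps a flat list. toy_claim() refuses any range that overlaps an existing entry; toy_insert() records a range unconditionally, which stands in for adding the e820 reserved entries.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 16

struct toy_res {
    unsigned long long start, end;  /* inclusive address range */
    const char *name;
};

struct toy_map {
    struct toy_res r[TOY_MAX];
    size_t n;
};

/* Strict claim: refused if the range overlaps anything in the map. */
static bool toy_claim(struct toy_map *m, unsigned long long start,
                      unsigned long long end, const char *name)
{
    if (m->n == TOY_MAX)
        return false;
    for (size_t i = 0; i < m->n; i++)
        if (m->r[i].start <= end && start <= m->r[i].end)
            return false;
    m->r[m->n++] = (struct toy_res){ start, end, name };
    return true;
}

/* Unconditional insert: records the range even if it overlaps
 * existing entries (models adding e820 reserved ranges). */
static bool toy_insert(struct toy_map *m, unsigned long long start,
                       unsigned long long end, const char *name)
{
    if (m->n == TOY_MAX)
        return false;
    m->r[m->n++] = (struct toy_res){ start, end, name };
    return true;
}
```

If the reserved range 0xdc000000-0xffffffff is inserted first, a BIOS-assigned BAR at 0xe0000000 cannot be claimed. If the BAR is claimed first and the reserved range is inserted afterwards, the BAR survives, and a later claim for a new range inside the reserved area is still refused.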

			Linus
