Message-ID: <20101219233325.GA2013@helgaas.com>
Date:	Sun, 19 Dec 2010 16:33:25 -0700
From:	Bjorn Helgaas <bjorn.helgaas@...com>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Len Brown <lenb@...nel.org>, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>, Adam Belay <abelay@....edu>
Subject: Re: [PATCH 8/9] x86: avoid E820 regions when allocating address space

On Sun, Dec 19, 2010 at 01:50:50AM -0800, Yinghai Lu wrote:
> On Thu, Dec 16, 2010 at 9:38 AM, Bjorn Helgaas <bjorn.helgaas@...com> wrote:
> >
> > When we allocate address space, e.g., to assign it to a PCI device, don't
> > allocate anything mentioned in the BIOS E820 memory map.
> >
> > On recent machines (2008 and newer), we assign PCI resources from the
> > windows described by the ACPI PCI host bridge _CRS.  On many Dell
> > machines, these windows overlap some E820 reserved areas, e.g.,
> >
> >    BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
> >    pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
> >
> > If we put devices at 0xbff00000, they don't work, probably because
> > that's really RAM, not I/O memory.  This patch prevents that by removing
> > the 0xbfe4dc00-0xbfffffff area from the "available" resource.
> >
> > I'm not very happy with this solution because Windows solves the problem
> > differently (it seems to ignore E820 reserved areas and it allocates
> > top-down instead of bottom-up; details at comment 45 of the bugzilla
> > below).  That means we're vulnerable to BIOS defects that Windows would not
> > trip over.  For example, if BIOS described a device in ACPI but didn't
> > mention it in E820, Windows would work fine but Linux would fail.
> >
> > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
> > Signed-off-by: Bjorn Helgaas <bjorn.helgaas@...com>
> > ---
> >
> >  arch/x86/kernel/resource.c |   38 +++++++++++++++++++++++++++++++++++++-
> >  1 files changed, 37 insertions(+), 1 deletions(-)
> >
> >
> > diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> > index 407a900..89638af 100644
> > --- a/arch/x86/kernel/resource.c
> > +++ b/arch/x86/kernel/resource.c
> > @@ -1,11 +1,47 @@
> >  #include <linux/ioport.h>
> >  #include <asm/e820.h>
> >
> > +static void resource_clip(struct resource *res, resource_size_t start,
> > +                         resource_size_t end)
> > +{
> > +       resource_size_t low = 0, high = 0;
> > +
> > +       if (res->end < start || res->start > end)
> > +               return;         /* no conflict */
> > +
> > +       if (res->start < start)
> > +               low = start - res->start;
> > +
> > +       if (res->end > end)
> > +               high = res->end - end;
> > +
> > +       /* Keep the area above or below the conflict, whichever is larger */
> > +       if (low > high)
> > +               res->end = start - 1;
> > +       else
> > +               res->start = end + 1;
> > +}
> > +
> > +static void remove_e820_regions(struct resource *avail)
> > +{
> > +       int i;
> > +       struct e820entry *entry;
> > +
> > +       for (i = 0; i < e820.nr_map; i++) {
> > +               entry = &e820.map[i];
> > +
> > +               resource_clip(avail, entry->addr,
> > +                             entry->addr + entry->size - 1);
> > +       }
> > +}
> > +
> >  void arch_remove_reservations(struct resource *avail)
> >  {
> > -       /* Trim out BIOS area (low 1MB) */
> > +       /* Trim out BIOS area (low 1MB) and E820 regions */
> >        if (avail->flags & IORESOURCE_MEM) {
> >                if (avail->start < BIOS_END)
> >                        avail->start = BIOS_END;
> > +
> > +               remove_e820_regions(avail);
> >        }
> >  }
> 
> That looks expensive. It will keep going through the E820 table...

It's expensive when we do it, but we do it very rarely, i.e., only
when we call allocate_resource().
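
To make the cost and the effect concrete, here is a rough userspace
sketch of the same clipping logic.  It is not the kernel code: "struct
range", range_clip(), remove_regions() and clip_dell_window() are
stand-in names I made up for illustration, and the low-RAM map entry is
a hypothetical one; only the reserved entry and the host bridge window
come from the Dell example above.  Each call is one linear pass over
the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct resource; "end" is inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Same decision logic as resource_clip() in the patch. */
void range_clip(struct range *res, uint64_t start, uint64_t end)
{
	uint64_t low = 0, high = 0;

	assert(start <= end);

	if (res->end < start || res->start > end)
		return;		/* no conflict */

	if (res->start < start)
		low = start - res->start;	/* space below the conflict */

	if (res->end > end)
		high = res->end - end;		/* space above the conflict */

	/* Keep the area above or below the conflict, whichever is larger */
	if (low > high)
		res->end = start - 1;
	else
		res->start = end + 1;
}

/* One linear pass over the map, like remove_e820_regions(). */
void remove_regions(struct range *avail, const struct range *map, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		range_clip(avail, map[i].start, map[i].end);
}

/* Clip the host bridge window from the example against a tiny map. */
struct range clip_dell_window(void)
{
	const struct range map[] = {
		{ 0x00000000ULL, 0x0009ffffULL },	/* hypothetical low RAM */
		{ 0xbfe4dc00ULL, 0xbfffffffULL },	/* reserved entry from the example */
	};
	struct range avail = { 0xbff00000ULL, 0xdfffffffULL };

	remove_regions(&avail, map, sizeof(map) / sizeof(map[0]));
	return avail;
}
```

With those inputs the window comes back as 0xc0000000-0xdfffffff, i.e.,
the part above the reserved area.  Note that in this sketch a window
that lies entirely inside a map entry ends up with start > end, so a
caller would have to treat that as "nothing available".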

> But the E820 regions should already be reserved in the resource tree...

E820 regions don't fit very well in the resource tree.  The
tree normally contains devices, which are mutually exclusive
and fit nicely in a hierarchy.

The E820 RAM regions fit that description (they can't conflict
with anything else).  But generic "reserved" regions do not.
A reserved region might cover several devices, it might cover
part of a device, it might cover a piece of RAM, or it might
not be related to a device at all.  We do try to wedge those
reservations into the resource tree, but I think that's a
mistake because it doesn't work very well.

In this case, the E820 reservation *is* in the resource tree,
but we took the reasonable 0xbfe4dc00-0xbfffffff E820 entry
and expanded it into the bogus 0xbfe4dc00-0xf7ffffff range:

  bfe4dc00-f7ffffff reserved (expanded)
    bff00000-f7ffffff PCI Bus 0000:00

More details here:

  https://bugzilla.kernel.org/show_bug.cgi?id=16228#c45

Bjorn