lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1425486216.17007.236.camel@misato.fc.hp.com>
Date:	Wed, 04 Mar 2015 09:23:36 -0700
From:	Toshi Kani <toshi.kani@...com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"hpa@...or.com" <hpa@...or.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"arnd@...db.de" <arnd@...db.de>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"dave.hansen@...el.com" <dave.hansen@...el.com>,
	"Elliott, Robert (Server Storage)" <Elliott@...com>
Subject: Re: [PATCH v3 6/6] x86, mm: Support huge KVA mappings on x86

On Wed, 2015-03-04 at 01:00 +0000, Andrew Morton wrote:
> On Tue, 03 Mar 2015 16:14:32 -0700 Toshi Kani <toshi.kani@...com> wrote:
> 
> > On Tue, 2015-03-03 at 14:44 -0800, Andrew Morton wrote:
> > > On Tue,  3 Mar 2015 10:44:24 -0700 Toshi Kani <toshi.kani@...com> wrote:
> >  :
> > > > +
> > > > +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> > > > +int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
> > > > +{
> > > > +	u8 mtrr;
> > > > +
> > > > +	/*
> > > > +	 * Do not use a huge page when the range is covered by non-WB type
> > > > +	 * of MTRRs.
> > > > +	 */
> > > > +	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
> > > > +	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
> > > > +		return 0;
> > > 
> > > It would be good to notify the operator in some way when this happens. 
> > > Otherwise the kernel will run more slowly and there's no way of knowing
> > > why.  I guess slap a pr_info() in there.  Or maybe pr_warn()?
> > 
> > We only use 4KB mappings today, so this case will not make it run
> > slowly, i.e. it will be the same as today.
> 
> Yes, but it would be slower than it would be if the operator fixed the
> mtrr settings!  How do we let the operator know this?
> 
> >  Also, adding a message here
> > can generate a lot of messages when MTRRs cover a large area.
> 
> Really?  This is only going to happen when a device driver requests a
> huge io mapping, isn't it?  That's rare.  We could emit a warning,
> return an error code and fall all the way back to the top-level ioremap
> code which can then retry with 4k mappings.  Or something similar -
> somehow record the fact that this warning has been emitted or use
> printk ratelimiting (bad option).

Yes, an IO device with a huge MMIO space that is covered by MTRRs is a
rare case.  BIOS does not need to specify how MMIO of each card needs to
be accessed with MTRRs (or BIOS should not do it since an MMIO address
is configurable on each card).

However, PCIe has the MMCONFIG space, PCIe config space, which is also
memory mapped and must be accessed with UC.  The PCI subsystem calls
ioremap_nocache() to map the entire MMCONFIG space, which covers the
PCIe config space of all possible cards.  Here are boot messages on my
test system.

  :
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcf
ffffff] (base 0xc0000000)
PCI: MMCONFIG at [mem 0xc0000000-0xcfffffff] reserved in E820
  :

And MTRRs cover this MMCONFIG space with UC to assure that the range is
always accessed with UC.

# cat /proc/mtrr
reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable

So, if we add a message into the code, it will be displayed many times
in this ioremap_nocache() call from PCI.

Ideally, pud_set_huge() and pmd_set_huge() should allow using a huge
page mapping when the entire map range is covered by a single MTRR
entry, which is the case with MMCONFIG.  But I did not include such
handling into the patch because UC map is slow by itself, MMCONFIG is
only accessed at boot-time, and mtrr_type_lookup() does not provide the
level of info necessary.

Thanks,
-Toshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ