lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120822223542.GG8107@redhat.com>
Date:	Thu, 23 Aug 2012 00:35:42 +0200
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Hillf Danton <dhillf@...il.com>, Dan Smith <danms@...ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux.com>,
	Alex Shi <alex.shi@...el.com>,
	Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Don Morris <don.morris@...com>
Subject: Re: [PATCH 33/36] autonuma: powerpc port

On Thu, Aug 23, 2012 at 08:01:47AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-08-22 at 16:59 +0200, Andrea Arcangeli wrote:
> > diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> > index 2e0e411..5f03079 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -33,10 +33,56 @@ static inline int pte_dirty(pte_t pte)		{ return pte_val(pte) & _PAGE_DIRTY; }
> >  static inline int pte_young(pte_t pte)		{ return pte_val(pte) & _PAGE_ACCESSED; }
> >  static inline int pte_file(pte_t pte)		{ return pte_val(pte) & _PAGE_FILE; }
> >  static inline int pte_special(pte_t pte)	{ return pte_val(pte) & _PAGE_SPECIAL; }
> > -static inline int pte_present(pte_t pte)	{ return pte_val(pte) & _PAGE_PRESENT; }
> > +static inline int pte_present(pte_t pte)	{ return pte_val(pte) &
> > +							(_PAGE_PRESENT|_PAGE_NUMA_PTE); }
> 
> Is this absolutely necessary ? (testing two bits). It somewhat changes
> the semantics of "pte_present" which I don't really like.

I'm actually surprised you don't already check for PROTNONE
there. Anyway yes this is necessary, the whole concept of NUMA hinting
page faults is to make the pte not present, and to set another bit (be
it a reserved bit or PROTNONE doesn't change anything in that
respect). But another bit replacing _PAGE_PRESENT must exist.

This change is zero cost at runtime, and 0x1 or 0x3 won't change a
thing for the CPU.

> >  static inline int pte_none(pte_t pte)		{ return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
> >  static inline pgprot_t pte_pgprot(pte_t pte)	{ return __pgprot(pte_val(pte) & PAGE_PROT_BITS); }
> >  
> > +#ifdef CONFIG_AUTONUMA
> > +static inline int pte_numa(pte_t pte)
> > +{
> > +       return (pte_val(pte) &
> > +               (_PAGE_NUMA_PTE|_PAGE_PRESENT)) == _PAGE_NUMA_PTE;
> > +}
> > +
> > +#endif
> 
> Why the ifdef and not anywhere else ?

The generic version is implemented in asm-generic/pgtable.h to avoid dups.

> > diff --git a/arch/powerpc/include/asm/pte-hash64-64k.h b/arch/powerpc/include/asm/pte-hash64-64k.h
> > index 59247e8..f7e1468 100644
> > --- a/arch/powerpc/include/asm/pte-hash64-64k.h
> > +++ b/arch/powerpc/include/asm/pte-hash64-64k.h
> > @@ -7,6 +7,8 @@
> >  #define _PAGE_COMBO	0x10000000 /* this is a combo 4k page */
> >  #define _PAGE_4K_PFN	0x20000000 /* PFN is for a single 4k page */
> >  
> > +#define _PAGE_NUMA_PTE 0x40000000 /* Adjust PTE_RPN_SHIFT below */
> > +
> >  /* For 64K page, we don't have a separate _PAGE_HASHPTE bit. Instead,
> >   * we set that to be the whole sub-bits mask. The C code will only
> >   * test this, so a multi-bit mask will work. For combo pages, this
> > @@ -36,7 +38,7 @@
> >   * That gives us a max RPN of 34 bits, which means a max of 50 bits
> >   * of addressable physical space, or 46 bits for the special 4k PFNs.
> >   */
> > -#define PTE_RPN_SHIFT	(30)
> > +#define PTE_RPN_SHIFT	(31)
> 
> I'm concerned. We are already running short on RPN bits. We can't spare
> more. If you absolutely need a PTE bit, we'll need to explore ways to
> free some, but just reducing the RPN isn't an option.

No way to do it without a spare bit.

Note that this is now true for sched-numa rewrite as well because it
also introduced the NUMA hinting page faults of AutoNUMA (except what
it does during the fault is different there, but the mechanism of
firing them and the need of a spare pte bit is identical).

But you must have a bit for protnone, don't you? You can implement it
with prot none, I can add the vma as parameter to some function to
achieve it if you need. It may be good idea to do anyway even if
there's no need on x86 at this point.

> Think of what happens if PTE_4K_PFN is set...

It may very well broken with PTE_4K_PFN is set, I'm not familiar with
that. If that's the case we'll just add an option to prevent
AUTONUMA=y to be set if PTE_4K_PFN is set thanks for the info.

> Also you conveniently avoided all the other pte-*.h variants meaning you
> broke the build for everything except ppc64 with 64k pages.

This can only be enabled on PPC64 in KConfig so no problem about
ppc32.

> > diff --git a/mm/autonuma.c b/mm/autonuma.c
> > index ada6c57..a4da3f3 100644
> > --- a/mm/autonuma.c
> > +++ b/mm/autonuma.c
> > @@ -25,7 +25,7 @@ unsigned long autonuma_flags __read_mostly =
> >  #ifdef CONFIG_AUTONUMA_DEFAULT_ENABLED
> >  	|(1<<AUTONUMA_ENABLED_FLAG)
> >  #endif
> > -	|(1<<AUTONUMA_SCAN_PMD_FLAG);
> > +	|(0<<AUTONUMA_SCAN_PMD_FLAG);
> 
> That changes the default accross all architectures, is that ok vs.
> Andrea ?

:) Indeed! But the next patch (34) undoes this hack. I just merged the
patch with "git am" and then introduced a proper way for the arch to
specify if the PMD scan is supported or not in an incremental
patch. Adding ppc64 support, and making the PMD scan mode arch
conditional are two separate things so I thought it was cleaner
keeping those in two separate patches but I can fold them if you
prefer.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ