Message-ID: <1303242580.11237.10.camel@mulgrave.site>
Date: Tue, 19 Apr 2011 14:49:40 -0500
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Christoph Lameter <cl@...ux.com>
Cc: Pekka Enberg <penberg@...nel.org>, Michal Hocko <mhocko@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>,
linux-parisc@...r.kernel.org, David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH v3] mm: make expand_downwards symmetrical to
expand_upwards
On Tue, 2011-04-19 at 13:35 -0500, Christoph Lameter wrote:
> On Tue, 19 Apr 2011, James Bottomley wrote:
>
> > > }
> > >
> > > How in the world did you get a zone setup in node 1 with a !NUMA config?
> >
> > I told you ... I forced an allocation into the first discontiguous
> > region. That will return 1 for page_to_nid().
>
> How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you
> cannot tell the page allocator to allocate from node 1.
Yes, it does, as I explained in the email.
> zone_to_nid is used as a fallback mechanism for page_to_nid() and as shown
> will always return 0 for !NUMA configs.
>
> page_to_nid(x) == zone_to_nid(page_zone(x)) must hold true. It is not
> here.
>
> > > The problem seems to be that the kernel seems to allow a
> > > definition of a page_to_nid() function that returns non zero in the !NUMA
> > > case.
> >
> > This is called reality, yes.
>
> There you have the bug. Fix that and things will work fine.
Why don't you file the bug against reality? I'm not sure I have enough
credibility ...
> > right, that's what I told you: slub is broken because it's making a
> > wrong assumption. Look in asm-generic/memory_model.h it shows how the
> > page_to_nid() is used in finding the pfn array. DISCONTIGMEM uses some
> > of the numa properties (including assigning zones to the discontiguous
> > regions).
>
> Bitrotted code?
Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86
all use the discontigmem memory model in some configurations.
> If it uses numa properties then it must use a zone field
> in struct zone. So DISCONTIGMEM seems to require CONFIG_NUMA.
No ... you're giving me back your assumptions. They're not based on
what the kernel does. CONFIG_NUMA may or may not be defined with
CONFIG_DISCONTIGMEM.
Of all the above, only x86 always had NUMA with DISCONTIGMEM.
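
For readers without asm-generic/memory_model.h to hand, the lookup can be
modelled in user space roughly as below. This is only a sketch: the struct
layouts, the two regions, the pfn ranges and the pfn_to_nid() helper are
hypothetical stand-ins rather than the kernel's definitions, and the node id
is stored directly in page->flags for simplicity. It shows a !NUMA style
setup with two discontiguous regions, where page_to_nid() has to return 1
for pages in the second region so that page_to_pfn() picks the right
per-region page array.

```c
/* User-space model of a DISCONTIGMEM-style lookup.
 * All names and values are illustrative assumptions, not kernel code. */
#include <assert.h>
#include <stddef.h>

struct page { unsigned long flags; };   /* model: region id in the low byte */

struct pglist_data {
    struct page *node_mem_map;      /* page array backing this region */
    unsigned long node_start_pfn;   /* first pfn covered by the region */
    unsigned long node_spanned;     /* number of pfns in the region */
};

#define MAX_REGIONS      2
#define PAGES_PER_REGION 4

/* Two discontiguous regions on one physical "NUMA node". */
static struct page region0_pages[PAGES_PER_REGION] = { {0}, {0}, {0}, {0} };
static struct page region1_pages[PAGES_PER_REGION] = { {1}, {1}, {1}, {1} };

static struct pglist_data node_data[MAX_REGIONS] = {
    { region0_pages, 0x000, PAGES_PER_REGION },
    { region1_pages, 0x100, PAGES_PER_REGION },
};

/* Region id of a page, read back from its flags. */
static int page_to_nid(const struct page *pg)
{
    return (int)(pg->flags & 0xff);
}

/* Region id of a pfn, or -1 if the pfn falls into a hole. */
static int pfn_to_nid(unsigned long pfn)
{
    for (int nid = 0; nid < MAX_REGIONS; nid++) {
        const struct pglist_data *d = &node_data[nid];
        if (pfn >= d->node_start_pfn &&
            pfn < d->node_start_pfn + d->node_spanned)
            return nid;
    }
    return -1;
}

/* pfn -> page: pick the region first, then index its page array. */
static struct page *pfn_to_page(unsigned long pfn)
{
    int nid = pfn_to_nid(pfn);
    if (nid < 0)
        return NULL;
    return node_data[nid].node_mem_map + (pfn - node_data[nid].node_start_pfn);
}

/* page -> pfn: relies on page_to_nid() naming the page's own region. */
static unsigned long page_to_pfn(const struct page *pg)
{
    const struct pglist_data *d = &node_data[page_to_nid(pg)];
    return (unsigned long)(pg - d->node_mem_map) + d->node_start_pfn;
}
```

In this model, pfn 0x102 maps to the third page of the second region, its
page_to_nid() is 1, and page_to_pfn() gives back 0x102. None of that depends
on whether anything like CONFIG_NUMA is set.
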
> > > If you think that is broken then we have brokenness all over the kernel
> > > whenever we determine the node from a page and use that to do a lookup.
> >
> > Not really. The rest of the kernel uses the proper macros. In
> > DISCONTIGMEM but !NUMA configs, the numa macros expand correctly.
> > You've cut across that with all the CONFIG_NUMA checks in slub.
>
> What are "the proper macros"? AFAICT page_to_nid() is the proper way to
> access the node of a page. If page_to_nid() returns 1 then you have a zone
> that the kernel knows of as being in node 0 having a page on a different
> node.
Well it depends what you want. If you only want the actual NUMA node,
then pfn_to_nid() probably isn't what you want, because in a
DISCONTIGMEM model, there may be multiple nids per actual numa node.
> We can likely force page_to_nid to ignore the node information that has
> been erroneously placed there but this looks like something deeper is
> wrong here. The node field in struct page is not only used for the Linux
> support of a NUMA node but also for blocks of memory. Those should be
> separate things.
Look, it's not wrong, it's by design. The assumption that non-numa
systems don't use nodes is the wrong one.
> ---
> include/linux/mm.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h 2011-04-19 13:20:20.092521248 -0500
> +++ linux-2.6/include/linux/mm.h 2011-04-19 13:21:05.962521196 -0500
> @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon
> #endif
> }
>
> +#ifdef CONFIG_NUMA
> #ifdef NODE_NOT_IN_PAGE_FLAGS
> extern int page_to_nid(struct page *page);
> #else
> @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag
> return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
> }
> #endif
> +#else
> +#define page_to_nid(x) 0
> +#endif
Don't be silly ... that breaks asm-generic/memory_model.h
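
To make the breakage concrete, here is a small user-space sketch of what
happens to a DISCONTIGMEM-style page-to-pfn conversion when the node lookup
is hard-wired to 0, as the patch above would do for !NUMA. Everything in it
is an assumption for illustration: the struct layouts, the single backing
array split into two regions, the pfn values, and the helper names
nid_from_flags(), nid_always_zero() and page_to_pfn_with() are hypothetical
and do not exist in the kernel.

```c
/* Illustrative model only; not the kernel's memory_model.h. */
#include <assert.h>

struct page { unsigned long flags; };   /* model: region id stored in flags */

struct pglist_data {
    struct page *node_mem_map;      /* start of this region's page array */
    unsigned long node_start_pfn;   /* first pfn covered by the region */
};

/* One backing array, split into two discontiguous pfn ranges. */
static struct page mem_map[8] = {
    {0}, {0}, {0}, {0},     /* region 0: pfns 0x000..0x003 */
    {1}, {1}, {1}, {1},     /* region 1: pfns 0x100..0x103 */
};

static struct pglist_data node_data[2] = {
    { &mem_map[0], 0x000 },
    { &mem_map[4], 0x100 },
};

typedef int (*nid_fn)(const struct page *);

/* Node lookup that reads the region id recorded in the page. */
static int nid_from_flags(const struct page *pg)
{
    return (int)pg->flags;
}

/* Node lookup modelled on "#define page_to_nid(x) 0". */
static int nid_always_zero(const struct page *pg)
{
    (void)pg;
    return 0;
}

/* page -> pfn, using whichever node lookup the caller supplies. */
static unsigned long page_to_pfn_with(const struct page *pg, nid_fn to_nid)
{
    const struct pglist_data *d = &node_data[to_nid(pg)];
    return (unsigned long)(pg - d->node_mem_map) + d->node_start_pfn;
}
```

With nid_from_flags(), the page at mem_map[6] converts to pfn 0x102. With
nid_always_zero() the same page converts to pfn 6, which belongs to no
region at all in this model, so any caller going from page to pfn for memory
in the second region gets a bogus answer.
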
James