Message-ID: <1303311779.2587.19.camel@mulgrave.site>
Date: Wed, 20 Apr 2011 10:02:59 -0500
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Christoph Lameter <cl@...ux.com>
Cc: Pekka Enberg <penberg@...nel.org>, Matthew Wilcox <matthew@....cx>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Michal Hocko <mhocko@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>,
linux-parisc@...r.kernel.org, David Rientjes <rientjes@...gle.com>,
Ingo Molnar <mingo@...e.hu>, x86 maintainers <x86@...nel.org>,
linux-arch@...r.kernel.org, Mel Gorman <mel@....ul.ie>
Subject: Re: [PATCH v3] mm: make expand_downwards symmetrical to
expand_upwards
On Wed, 2011-04-20 at 09:50 -0500, Christoph Lameter wrote:
> On Wed, 20 Apr 2011, James Bottomley wrote:
>
> > 1. We can look at what imposing NUMA on the DISCONTIGMEM archs
> > would do ... the embedded ones are going to be hardest hit, but
> > if it's not too much extra code, it might be palatable.
> > 2. The other is that we can audit mm to look at all the node
> > assumptions in the non-numa case. My suspicion is that
> > accidentally or otherwise, it mostly works for the normal case,
> > so there might not be much needed to pull it back to working
> > properly for DISCONTIGMEM.
>
> The older code may work. SLAB f.e. does not call page_to_nid() in the
> !NUMA case but keeps special metadata structures around in each slab page
> that records the node used for allocation. The problem is with new code
> added/revised in the last 5 years or so that uses page_to_nid() and
> allocates only a single structure for !NUMA. There are also VM_BUG_ONs in
> the page allocator that should trigger if page_to_nid() returns strange
> values. I wonder why that never occurred.
Actually, I think slab got changed when discontigmem was added ...
that's why it all works OK.
> > 3. Finally we could look at deprecating DISCONTIGMEM in favour
> > of SPARSEMEM, but we'd still need to fix -stable for that case.
> > Especially as it will take time to convert all the architectures
>
> The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We
> need an audit of the core VM before removing that or making it contingent
> on the configurations of various VM subsystems.
Don't be stupid ... that would cause six architectures to get marked
broken.
> > I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight
> > framework which allows machines with split physical memory ranges to
> > work. That's a very common case nowadays. Numa is supposed to be a
> > heavyweight framework to preserve node locality for non-uniform memory
> > access boxes (which none of the DISCONTIGMEM && !NUMA systems are).
>
> Well yes but we have SPARSE for that today. DISCONTIG with multiple per
> pgdat structures in a !NUMA case is just weird and unexpected for many who
> have done VM coding in the last years.
Look, I'm not really interested in who understands what. The fact is we
have six architectures with the possibility for DISCONTIGMEM && !NUMA,
so that's the case we need to fix in -stable.
They oops with SLUB; as far as I can tell, there are still no oops
reports with SLAB. The simplest -stable fix seems to be to mark SLUB
broken on DISCONTIGMEM && !NUMA.
James
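
The failure mode discussed in this thread can be sketched in userspace C.
This is a toy model under stated assumptions, not kernel code: struct page,
struct node_cache, lookup_node() and count_unbacked_pages() are hypothetical
stand-ins, and page_to_nid() here only reads a field. It models an allocator
that sizes its per-node array for a single node (the !NUMA assumption) while
DISCONTIGMEM hands it pages whose node ID is non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page: only the node ID. */
struct page {
	int nid;
};

/* Hypothetical per-node allocator metadata. */
struct node_cache {
	unsigned long nr_partial;
};

/* Toy version of page_to_nid(): reports which pgdat the page belongs to. */
static int page_to_nid(const struct page *page)
{
	return page->nid;
}

/*
 * Look up the per-node structure for a page. Returns NULL when the page's
 * node ID has no backing structure. An unchecked allocator would index
 * past the end of the array at this point instead.
 */
static struct node_cache *lookup_node(struct node_cache *nodes,
				      size_t nr_nodes,
				      const struct page *page)
{
	int nid = page_to_nid(page);

	assert(nodes != NULL && nr_nodes > 0);
	if (nid < 0 || (size_t)nid >= nr_nodes)
		return NULL;
	return &nodes[nid];
}

/* Count the pages whose node ID falls outside the per-node array. */
static size_t count_unbacked_pages(const struct page *pages, size_t nr_pages,
				   struct node_cache *nodes, size_t nr_nodes)
{
	size_t i, unbacked = 0;

	for (i = 0; i < nr_pages; i++)
		if (lookup_node(nodes, nr_nodes, &pages[i]) == NULL)
			unbacked++;
	return unbacked;
}
```

With a one-element nodes array and two pages carrying node IDs 0 and 1,
count_unbacked_pages() returns 1: the page from the second memory range has
no per-node structure to land in.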