[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080703050036.GD14614@csn.ul.ie>
Date: Thu, 3 Jul 2008 06:00:36 +0100
From: Mel Gorman <mel@....ul.ie>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Dan Williams <dan.j.williams@...el.com>,
Andy Whitcroft <apw@...dowen.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, NeilBrown <neilb@...e.de>,
babydr@...y-dragons.com, cl@...ux-foundation.org,
lee.schermerhorn@...com, a.beregalov@...il.com,
akpm@...ux-foundation.org
Subject: Re: [problem] raid performance loss with 2.6.26-rc8 on 32-bit x86 (bisected)
On (02/07/08 21:43), Linus Torvalds didst pronounce:
>
>
> On Thu, 3 Jul 2008, Mel Gorman wrote:
>
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc8-clean/mm/page_alloc.c linux-2.6.26-rc8-fix-kswapd-on-numa/mm/page_alloc.c
> > --- linux-2.6.26-rc8-clean/mm/page_alloc.c 2008-06-24 18:58:20.000000000 -0700
> > +++ linux-2.6.26-rc8-fix-kswapd-on-numa/mm/page_alloc.c 2008-07-02 21:14:16.000000000 -0700
> > @@ -2328,7 +2328,8 @@ static void build_zonelists(pg_data_t *p
> > static void build_zonelist_cache(pg_data_t *pgdat)
> > {
> > pgdat->node_zonelists[0].zlcache_ptr = NULL;
> > - pgdat->node_zonelists[1].zlcache_ptr = NULL;
> > + if (NUMA_BUILD)
> > + pgdat->node_zonelists[1].zlcache_ptr = NULL;
> > }
>
> This makes no sense.
>
> That whole thing is inside a
>
> #ifdef CONFIG_NUMA
> ... numa code ..
> #else
> ... this code ..
> #endif
>
> so CONFIG_NUMA will _not_ be set, and NUMA_BUILD is always 0.
>
> So why do that
>
> if (NUMA_BUILD)
> ..
>
> at all, when it is known to be false?
>
Because I'm a muppet and a bit cross-eyed from looking at this until the
problem would reveal itself. I knew this is needed fixing but choose the
most stupid way possible to fix it.
> So the patch may be correct, but wouldn't it be better to just remove the
> line entirely, instead of moving it into a conditional that cannot be
> true?
>
Yes, revised patch below. Same fix, more sensible.
> Also, I'm not quite seeing why those zonelists should be zeroed out at
> all. Shouldn't a non-NUMA setup always aim to have node_zonelists[0] ==
> node_zonelists[1] == all appropriate zones?
>
node_zonelists[1] doesn't exist at all on non-NUMA so there is only
node_zonelists[0] that is all appropriate zones.
> I have to say, the whole mmzoen thing is confusing. The code makes my eyes
> bleed. I can't really follow it.
I'm looking at this too long to comment with anything other than a blank
stare :/
====
Subject: [PATCH] Do not overwrite nr_zones on !NUMA when initialising zlcache_ptr
With the two-zonelist patches on !NUMA machines, there really is only one
zonelist as __GFP_THISNODE is meaningless. However, during initialisation, the
assumption is made that two zonelists exist when initialising zlcache_ptr. The
result is that pgdat->nr_zones is always 0. As kswapd uses this value to
determine what reclaim work is necessary, the result is that kswapd never
reclaims. This causes processes to stall frequently in low-memory situations
as they always direct reclaim. This patch initialises zlcache_ptr correctly.
Signed-off-by: Mel Gorman <mel@....ul.ie>
---
page_alloc.c | 1 -
1 file changed, 1 deletion(-)
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.26-rc8-clean/mm/page_alloc.c linux-2.6.26-rc8-fix-kswapd-on-numa/mm/page_alloc.c
--- linux-2.6.26-rc8-clean/mm/page_alloc.c 2008-06-24 18:58:20.000000000 -0700
+++ linux-2.6.26-rc8-fix-kswapd-on-numa/mm/page_alloc.c 2008-07-02 21:49:09.000000000 -0700
@@ -2328,7 +2328,6 @@ static void build_zonelists(pg_data_t *p
static void build_zonelist_cache(pg_data_t *pgdat)
{
pgdat->node_zonelists[0].zlcache_ptr = NULL;
- pgdat->node_zonelists[1].zlcache_ptr = NULL;
}
#endif /* CONFIG_NUMA */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists