Message-Id: <1225200000.6482.4.camel@mikevs-laptop>
Date: Tue, 28 Oct 2008 17:20:00 +0400
From: Miquel van Smoorenburg <mikevs@...all.net>
To: Dave Chinner <david@...morbit.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Order 0 page allocation failure under heavy I/O load
On Mon, 2008-10-27 at 09:57 +1100, Dave Chinner wrote:
> I've been running a workload in a UML recently to reproduce a
> problem, and I've been seeing all sorts of latency problems on
> the host. The host is running a standard Debian kernel:
>
> $ uname -a
> Linux disturbed 2.6.26-1-amd64 #1 SMP Wed Sep 10 15:31:12 UTC 2008 x86_64 GNU/Linux
>
> Basically, the workload running in the UML is:
>
> # fsstress -p 1024 -n 100000 -d /mnt/xfs2/fsstress.dir
>
> Which runs 1024 fsstress processes inside the indicated directory.
> Being UML, that translates to 1024 processes on the host doing I/O
> to a single file in an XFS filesystem. The problem is that this
> load appears to be triggering OOM on the host. The host filesystem
> is XFS on a 2 disk MD raid0 stripe.
>
> The host will hang for tens of seconds at a time with both CPU cores
> pegged at 100%, and eventually I get this in dmesg:
>
> [1304740.261506] linux: page allocation failure. order:0, mode:0x10000
> [1304740.261516] Pid: 10705, comm: linux Tainted: P 2.6.26-1-amd64 #1
> [1304740.261520]
> [1304740.261520] Call Trace:
> [1304740.261557] [<ffffffff802768db>] __alloc_pages_internal+0x3ab/0x3c4
> [1304740.261574] [<ffffffff80295248>] kmem_getpages+0x96/0x15f

I saw the same thing, though on i386; I never saw it on x86_64. On i386
it helped to recompile with the 2G/2G split set. But it appears that my
problem has been solved in 2.6.26.6 by the commit below. Perhaps you're
hitting something similar. Your kernel version looks like a Debian
version number, so if 2.6.26.6 fixes your problem, please file a Debian
bug report so that lenny won't get released with this bug.

commit 6b546b3dbbc51800bdbd075da923288c6a4fe5af
Author: Mel Gorman <mel@....ul.ie>
Date: Sat Sep 13 22:05:39 2008 +0000

mm: mark the correct zone as full when scanning zonelists

commit 5bead2a0680687b9576d57c177988e8aa082b922 upstream

The iterator for_each_zone_zonelist() uses a struct zoneref *z cursor when
scanning zonelists to keep track of where in the zonelist it is. The
zoneref that is returned corresponds to the next zone that is to be
scanned, not the current one. It was intended to be treated as an opaque
list.

When the page allocator is scanning a zonelist, it marks elements in the
zonelist corresponding to zones that are temporarily full. As the
zonelist is being updated, it uses the cursor here:

    if (NUMA_BUILD)
        zlc_mark_zone_full(zonelist, z);

This is intended to prevent rescanning in the near future, but the zoneref
cursor does not correspond to the zone that has been found to be full.
This is an easy misunderstanding to make, so this patch corrects the
problem by changing the zoneref cursor to be the current zone being
scanned instead of the next one.
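
To make the cursor confusion concrete, here is a minimal user-space sketch
in C. It is not the kernel code: struct entry, scan_cursor_is_next() and
scan_cursor_is_current() are hypothetical names made up for illustration,
and a plain array index stands in for the struct zoneref *z cursor. It only
shows how "mark full via the cursor" hits the wrong entry when the cursor
has already moved on to the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one zonelist entry (not the kernel's zoneref). */
struct entry {
    const char *name;
    bool has_free_pages;
    bool marked_full;
};

/*
 * Old semantics: by the time the zone is examined, the cursor already
 * refers to the NEXT entry. Marking "full" through the cursor therefore
 * flags the wrong entry. Returns the index allocated from, or -1.
 */
int scan_cursor_is_next(struct entry *list, size_t n)
{
    assert(list != NULL);
    size_t cursor = 0;
    while (cursor < n) {
        struct entry *zone = &list[cursor];
        cursor++;                          /* cursor now points past 'zone' */
        if (zone->has_free_pages)
            return (int)(cursor - 1);
        if (cursor < n)
            list[cursor].marked_full = true;   /* wrong entry gets flagged */
    }
    return -1;
}

/*
 * Fixed semantics: the cursor refers to the entry currently being
 * examined, so marking through the cursor flags the right one.
 */
int scan_cursor_is_current(struct entry *list, size_t n)
{
    assert(list != NULL);
    for (size_t cursor = 0; cursor < n; cursor++) {
        if (list[cursor].has_free_pages)
            return (int)cursor;
        list[cursor].marked_full = true;       /* the entry that was full */
    }
    return -1;
}
```

With a list whose first entry is exhausted and whose second entry has free
pages, the first routine leaves the exhausted entry unmarked and flags the
usable one as full; the second routine flags only the exhausted entry. Both
still allocate from the second entry on this pass; the damage shows up on
later scans that trust the "full" marks.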
Mike.