Message-ID: <4BDFFCC4.5000106@cn.fujitsu.com>
Date: Tue, 04 May 2010 18:53:56 +0800
From: Miao Xie <miaox@...fujitsu.com>
To: David Rientjes <rientjes@...gle.com>,
Nick Piggin <npiggin@...e.de>, Paul Menage <menage@...gle.com>,
Lee Schermerhorn <lee.schermerhorn@...com>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Linux-Kernel <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>
Subject: [PATCH -V2 0/2] fix oom happening when changing cpuset'mems(was:
[regression] cpuset,mm: update tasks' mems_allowed in time (58568d2))
Nick Piggin reported that the allocator may see an empty nodemask while cpuset's
mems is being changed[1]. This happens only on kernels that cannot store a
nodemask_t atomically (MAX_NUMNODES > BITS_PER_LONG).
However, I found that kernels that can store a nodemask_t atomically have a
problem as well: while cpuset's mems is being changed, the allocator may fail to
find any node to allocate a page from, even though there is plenty of free
memory. The reason is as follows:
(mpol: mempolicy)
task1                        task1's mpol   task2
alloc page                   1
  alloc on node0? NO         1
                             1              change mems from 1 to 0
                             1              rebind task1's mpol
                             0-1              set new bits
                             0                clear disallowed bits
  alloc on node1? NO         0
  ...
can't alloc page
  goto oom
The problem can be reproduced with the attached program by the following steps:
# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
  where <nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh
Several hours later, OOM happens even though there is plenty of free memory.
This patchset fixes the problem by expanding the node range first (setting the
newly allowed bits) and shrinking it lazily (clearing the newly disallowed
bits). A variable tells the write-side task that a read-side task is reading
the nodemask, and the write-side task clears the newly disallowed nodes only
after the read-side task has finished its current memory allocation.
Changelog since V1:
- restructure the mempolicy rebind functions and split the rebind work into
  two steps, because the old rebind functions could break the first step
  (expanding the node range).
Thanks
Miao
[1] http://lkml.org/lkml/2010/2/18/111
[PATCH 1/2] mempolicy: restructure rebinding-mempolicy functions
[PATCH 2/2] cpuset,mm: fix no node to alloc memory when changing cpuset's mems
Download attachment "reproduce_prog.tar.gz" of type "application/gzip" (1190 bytes)