lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <4BDFFCC4.5000106@cn.fujitsu.com>
Date:	Tue, 04 May 2010 18:53:56 +0800
From:	Miao Xie <miaox@...fujitsu.com>
To:	David Rientjes <rientjes@...gle.com>,
	Nick Piggin <npiggin@...e.de>, Paul Menage <menage@...gle.com>,
	Lee Schermerhorn <lee.schermerhorn@...com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Linux-Kernel <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>
Subject: [PATCH -V2 0/2] fix oom happening when changing cpuset'mems(was:
 [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2))

Nick Piggin reported that the allocator may see an empty nodemask when changing
cpuset's mems[1]. It happens only on the kernel that do not do atomic nodemask_t
stores. (MAX_NUMNODES > BITS_PER_LONG)

But I found that there is also a problem on the kernel that can do atomic
nodemask_t stores. The problem is that the allocator can't find a node to
alloc page when changing cpuset's mems though there is a lot of free memory.
The reason is like this:
(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

I can use the attached program reproduce it by the following step:
# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
   <nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh

several hours later, oom will happen though there is a lot of free memory.

This patchset fixes this problem by expanding the nodes range first(set newly
allowed bits) and shrink it lazily(clear newly disallowed bits). So we use a
variable to tell the write-side task that read-side task is reading nodemask,
and the write-side task clears newly disallowed nodes after read-side task ends
the current memory allocation.

Changelog since V1:
- restructure the mempolicy's rebind functions, and split the rebind work to
  two steps because the rebind functions may breaks the first step - expanding
  the nodes range.

Thanks
Miao

[1] http://lkml.org/lkml/2010/2/18/111

[PATCH 1/2] mempolicy: restructure rebinding-mempolicy functions
[PATCH 2/2] cpuset,mm: fix no node to alloc memory when changing cpuset's mems

Download attachment "reproduce_prog.tar.gz" of type "application/gzip" (1190 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ