linux-kernel - Re: [PATCH 0/8] avoid allocation in show_numa


Message-Id: <20110504161020.e2d0a7f2.akpm@linux-foundation.org>
Date:	Wed, 4 May 2011 16:10:20 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Stephen Wilson <wilsons@...rt.ca>
Cc:	Alexander Viro <viro@...iv.linux.org.uk>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Hugh Dickins <hughd@...gle.com>,
	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Jeremy Fitzhardinge <jeremy@...p.org>
Subject: Re: [PATCH 0/8] avoid allocation in show_numa_map()

On Wed, 27 Apr 2011 19:35:41 -0400
Stephen Wilson <wilsons@...rt.ca> wrote:

> Recently a concern was raised[1] that performing an allocation while holding a
> reference on a task's mm could lead to a stalemate in the oom killer.  The
> concern was specific to the goings-on in /proc.  Hugh Dickins stated the issue
> thusly:
> 
>     ...imagine what happens if the system is out of memory, and the mm
>     we're looking at is selected for killing by the OOM killer: while we
>     wait in __get_free_page for more memory, no memory is freed from the
>     selected mm because it cannot reach exit_mmap while we hold that
>     reference.
> 
> The primary goal of this series is to eliminate repeated allocation/free cycles
> currently happening in show_numa_maps() while we hold a reference to an mm.
> 
> The strategy is to perform the allocation once when /proc/pid/numa_maps is
> opened, before a reference on the target task's mm is taken.
> 
> Unfortunately, show_numa_maps() is implemented in mm/mempolicy.c while the
> primary procfs implementation lives in fs/proc/task_mmu.c.  This makes
> clean cooperation between show_numa_maps() and the other seq_file operations
> (start(), stop(), etc) difficult.
> 
> 
> Patches 1-5 convert show_numa_maps() to use the generic walk_page_range()
> functionality instead of the mempolicy.c specific page table walking logic.
> Also, get_vma_policy() is exported.  This makes the show_numa_maps()
> implementation independent of mempolicy.c. 
> 
> Patch 6 moves show_numa_maps() and supporting routines over to
> fs/proc/task_mmu.c.
> 
> Finally, patches 7 and 8 provide minor cleanup and eliminate the troublesome
> allocation.
> 
>  
> Please note that moving show_numa_maps() into fs/proc/task_mmu.c essentially
> reverts 1a75a6c825 and 48fce3429d.  Also, please see the discussion at [2].  My
> main justifications for moving the code back into task_mmu.c are:
> 
>   - Having the show() operation "miles away" from the corresponding
>     seq_file iteration operations is a maintenance burden. 
>     
>   - The need to export ad hoc info like struct proc_maps_private is
>     eliminated.
> 
> 
> These patches are based on v2.6.39-rc5.
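
For readers unfamiliar with the callback-driven walk that patches 1-5
convert to, here is a rough userspace sketch of the pattern.  Every name in
it (struct range_walk, walk_range(), count_page(), SKETCH_PAGE_SIZE) is
hypothetical and chosen for illustration; it is not the kernel's
walk_page_range() interface and it walks plain addresses, not page tables.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size, for the sketch only */

/* Hypothetical userspace analogue of a page-walk descriptor. */
struct range_walk {
    /* Called once per page-sized step; a nonzero return aborts the walk. */
    int (*page_entry)(unsigned long addr, struct range_walk *walk);
    void *priv; /* caller-owned state, e.g. preallocated statistics */
};

/* Visit [addr, end) in page-sized steps, passing each address to the callback. */
static int walk_range(unsigned long addr, unsigned long end,
                      struct range_walk *walk)
{
    for (; addr < end; addr += SKETCH_PAGE_SIZE) {
        int err = walk->page_entry(addr, walk);
        if (err)
            return err;
    }
    return 0;
}

/* Example callback: count visited pages in storage the caller provides. */
static int count_page(unsigned long addr, struct range_walk *walk)
{
    (void)addr; /* the address is unused in this trivial callback */
    ++*(unsigned long *)walk->priv;
    return 0;
}
```

The point of the shape is that the walker owns the iteration while the
caller owns the per-page logic and its state, so the same walking code can
serve any user.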

The patches look reasonable.  It would be nice to get some more review
happening (poke).

> 
> Please note that this series is VERY LIGHTLY TESTED.  I have been using
> CONFIG_NUMA_EMU=y thus far as I will not have access to a real NUMA system for
> another week or two.

"lightly tested" evokes fear, but the patches don't look too scary to
me.

Did you look at using apply_to_page_range()?

I'm trying to remember why we're carrying both walk_page_range() and
apply_to_page_range() but can't immediately think of a reason.

There's also an apply_to_page_range_batch() in -mm but that code is
broken on PPC and not much is happening with it, so it will probably go
away again.
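
For illustration, the strategy from the cover letter (allocate once at
open, before a reference on the mm is taken, then reuse the buffer in
show()) might look roughly like this in userspace C.  This is only a sketch
under assumptions: struct numa_maps_state, the numa_maps_*() helpers and
the holds_mm_ref flag are hypothetical stand-ins, not code from the series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-open private state; not the kernel's structure. */
struct numa_maps_state {
    unsigned long *node_counts; /* scratch statistics, allocated once at open */
    size_t nr_nodes;
    int holds_mm_ref;           /* models "a reference on the mm is held" */
};

/* open(): allocate while no mm reference is held, so a stalled
 * allocation cannot keep the target mm from being torn down. */
static struct numa_maps_state *numa_maps_open(size_t nr_nodes)
{
    struct numa_maps_state *st = calloc(1, sizeof(*st));

    if (!st)
        return NULL;
    st->node_counts = calloc(nr_nodes, sizeof(*st->node_counts));
    if (!st->node_counts) {
        free(st);
        return NULL;
    }
    st->nr_nodes = nr_nodes;
    return st;
}

/* start()/stop(): take and drop the modelled mm reference. */
static void numa_maps_start(struct numa_maps_state *st) { st->holds_mm_ref = 1; }
static void numa_maps_stop(struct numa_maps_state *st)  { st->holds_mm_ref = 0; }

/* show(): runs with the reference held and only reuses the scratch
 * buffer; it performs no allocation.  Returns the number of pages counted. */
static unsigned long numa_maps_show(struct numa_maps_state *st,
                                    const int *page_nodes, size_t nr_pages)
{
    unsigned long total = 0;

    assert(st->holds_mm_ref);
    memset(st->node_counts, 0, st->nr_nodes * sizeof(*st->node_counts));
    for (size_t i = 0; i < nr_pages; i++) {
        /* Skip node ids outside the range sized at open time. */
        if (page_nodes[i] >= 0 && (size_t)page_nodes[i] < st->nr_nodes) {
            st->node_counts[page_nodes[i]]++;
            total++;
        }
    }
    return total;
}

/* release(): free the buffer after the last reference has been dropped. */
static void numa_maps_release(struct numa_maps_state *st)
{
    free(st->node_counts);
    free(st);
}
```

A caller would run open, then start/show/stop as often as needed, then
release; the only allocations happen in the open step.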

