lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 17 May 2009 09:38:30 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Elladan <elladan@...imo.com>, Nick Piggin <npiggin@...e.de>,
	Johannes Weiner <hannes@...xchg.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Rik van Riel <riel@...hat.com>,
	"tytso@....edu" <tytso@....edu>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 2/3] vmscan: make mapped executable pages the first class 
	citizen

On Sat, May 16, 2009 at 6:28 PM, Wu Fengguang <fengguang.wu@...el.com> wrote:
> [trivial update on comment text, according to Rik's comment]
>
> --
> vmscan: make mapped executable pages the first class citizen
>
> Protect referenced PROT_EXEC mapped pages from being deactivated.
>
> PROT_EXEC(or its internal presentation VM_EXEC) pages normally belong to some
> currently running executables and their linked libraries, they shall really be
> cached aggressively to provide good user experiences.
>
> Thanks to Johannes Weiner for the advice to reuse the VMA walk in
> page_referenced() to get the PROT_EXEC bit.
>
>
> [more details]
>
> ( The consequences of this patch will have to be discussed together with
>  Rik van Riel's recent patch "vmscan: evict use-once pages first". )
>
> ( Some of the good points and insights are taken into this changelog.
>  Thanks to all the involved people for the great LKML discussions. )
>
> the problem
> -----------
>
> For a typical desktop, the most precious working set is composed of
> *actively accessed*
>        (1) memory mapped executables
>        (2) and their anonymous pages
>        (3) and other files
>        (4) and the dcache/icache/.. slabs
> while the least important data are
>        (5) infrequently used or use-once files
>
> For a typical desktop, one major problem is busty and large amount of (5)
> use-once files flushing out the working set.
>
> Inside the working set, (4) dcache/icache have already been too sticky ;-)
> So we only have to care (2) anonymous and (1)(3) file pages.
>
> anonymous pages
> ---------------
> Anonymous pages are effectively immune to the streaming IO attack, because we
> now have separate file/anon LRU lists. When the use-once files crowd into the
> file LRU, the list's "quality" is significantly lowered. Therefore the scan
> balance policy in get_scan_ratio() will choose to scan the (low quality) file
> LRU much more frequently than the anon LRU.
>
> file pages
> ----------
> Rik proposed to *not* scan the active file LRU when the inactive list grows
> larger than active list. This guarantees that when there are use-once streaming
> IO, and the working set is not too large(so that active_size < inactive_size),
> the active file LRU will *not* be scanned at all. So the not-too-large working
> set can be well protected.
>
> But there are also situations where the file working set is a bit large so that
> (active_size >= inactive_size), or the streaming IOs are not purely use-once.
> In these cases, the active list will be scanned slowly. Because the current
> shrink_active_list() policy is to deactivate active pages regardless of their
> referenced bits. The deactivated pages become susceptible to the streaming IO
> attack: the inactive list could be scanned fast (500MB / 50MBps = 10s) so that
> the deactivated pages don't have enough time to get re-referenced. Because a
> user tend to switch between windows in intervals from seconds to minutes.
>
> This patch holds mapped executable pages in the active list as long as they
> are referenced during each full scan of the active list.  Because the active
> list is normally scanned much slower, they get longer grace time (eg. 100s)
> for further references, which better matches the pace of user operations.
>
> Therefore this patch greatly prolongs the in-cache time of executable code,
> when there are moderate memory pressures.
>
>        before patch: guaranteed to be cached if reference intervals < I
>        after  patch: guaranteed to be cached if reference intervals < I+A
>                      (except when randomly reclaimed by the lumpy reclaim)
> where
>        A = time to fully scan the   active file LRU
>        I = time to fully scan the inactive file LRU
>
> Note that normally A >> I.
>
> side effects
> ------------
>
> This patch is safe in general, it restores the pre-2.6.28 mmap() behavior
> but in a much smaller and well targeted scope.
>
> One may worry about some one to abuse the PROT_EXEC heuristic.  But as
> Andrew Morton stated, there are other tricks to getting that sort of boost.
>
> Another concern is the PROT_EXEC mapped pages growing large in rare cases,
> and therefore hurting reclaim efficiency. But a sane application targeted for
> large audience will never use PROT_EXEC for data mappings. If some home made
> application tries to abuse that bit, it shall be aware of the consequences.
> If it is abused to scale of 2/3 total memory, it gains nothing but overheads.
>
> CC: Elladan <elladan@...imo.com>
> CC: Nick Piggin <npiggin@...e.de>
> CC: Johannes Weiner <hannes@...xchg.org>
> CC: Christoph Lameter <cl@...ux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
> Acked-by: Peter Zijlstra <peterz@...radead.org>
> Acked-by: Rik van Riel <riel@...hat.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>

Reviewed-by: Minchan Kim <minchan.kim@...il.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists