Message-ID: <50996B49.7070407@redhat.com>
Date: Tue, 06 Nov 2012 14:55:53 -0500
From: Rik van Riel <riel@...hat.com>
To: Mel Gorman <mgorman@...e.de>
CC: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrea Arcangeli <aarcange@...hat.com>,
Ingo Molnar <mingo@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Hugh Dickins <hughd@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 18/19] mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate
On 11/06/2012 04:14 AM, Mel Gorman wrote:
> From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
>
> Note: The scan period is much larger than it was in the original patch
>       because system CPU usage went through the roof with a sample
>       period of 100ms, and it was unacceptable for a large process to
>       stall for excessively long while updating pte_numa. This may need
>       to be tuned again if a placement policy converges too slowly.
>
> Previously, to probe the working set of a task, we'd use
> a very simple and crude method: mark all of its address
> space PROT_NONE.
>
> That method has various (obvious) disadvantages:
>
> - it samples the working set at dissimilar rates,
> giving some tasks a sampling quality advantage
> over others.
>
> - it creates performance problems for tasks with very
>    large working sets
>
> - it over-samples processes with large address spaces
>    that only very rarely execute
>
> Improve that method by keeping a rotating offset into the
> address space that marks the current position of the scan,
> and advance it at a constant rate (proportional to the CPU
> cycles the task executes). If the offset reaches the last
> mapped address of the mm, it starts over at the first
> address.
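
To make the rotating-offset idea concrete, here is a minimal
userspace sketch (not the kernel implementation, which walks the
mm's VMAs and marks ranges pte_numa; all names below are invented
for illustration):

  #include <stdio.h>

  #define SCAN_SIZE (256UL << 20)  /* bytes scanned per interval */

  struct mm_sim {
          unsigned long start;     /* first mapped address */
          unsigned long end;       /* last mapped address */
          unsigned long offset;    /* rotating scan position */
  };

  /* Advance the scan window by a constant amount, wrapping at the end. */
  static void scan_one_interval(struct mm_sim *mm)
  {
          unsigned long next;

          if (mm->offset >= mm->end)      /* reached the last mapped address */
                  mm->offset = mm->start; /* start over at the first address */

          next = mm->offset + SCAN_SIZE;
          if (next > mm->end)
                  next = mm->end;

          printf("mark [%#lx, %#lx) pte_numa\n", mm->offset, next);
          mm->offset = next;
  }

  int main(void)
  {
          /* Simulate a 1 GB mapping; six intervals shows the wrap-around. */
          struct mm_sim mm = { 0x400000UL, 0x400000UL + (1UL << 30), 0x400000UL };

          for (int i = 0; i < 6; i++)
                  scan_one_interval(&mm);
          return 0;
  }
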
>
> The per-task nature of the working set sampling functionality in this tree
> allows such constant-rate, per-task, execution-weight-proportional sampling
> of the working set, with an adaptive sampling interval/frequency that
> goes from once per 2 seconds up to just once per 32 seconds. The current
> sampling volume is 256 MB per interval.
>
> As tasks mature and their working sets converge, the sampling
> rate slows down to just a trickle: 256 MB per 32 seconds of CPU
> time executed.
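
The adaptive interval can be pictured as a simple geometric
back-off between the two bounds quoted above. The doubling policy
in this sketch is an illustrative guess, not the patch's exact
heuristic:

  #include <stdio.h>

  #define SCAN_PERIOD_MIN_MS (2 * 1000)   /* once per 2 seconds  */
  #define SCAN_PERIOD_MAX_MS (32 * 1000)  /* once per 32 seconds */

  /* Back off geometrically toward the maximum interval. */
  static unsigned int next_scan_period(unsigned int cur_ms)
  {
          unsigned int next_ms = cur_ms * 2;

          return next_ms > SCAN_PERIOD_MAX_MS ? SCAN_PERIOD_MAX_MS : next_ms;
  }

  int main(void)
  {
          unsigned int period = SCAN_PERIOD_MIN_MS;

          /* Prints 2000, 4000, 8000, 16000, 32000, 32000 ms. */
          for (int i = 0; i < 6; i++) {
                  printf("scan 256 MB, next scan in %u ms\n", period);
                  period = next_scan_period(period);
          }
          return 0;
  }
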
>
> This, beyond being adaptive, also rate-limits rarely
> executing tasks and does not over-sample on overloaded
> systems.
>
> [ In AutoNUMA speak, this patch deals with the effective sampling
> rate of the 'hinting page fault'. AutoNUMA's scanning is
> currently rate-limited, but it is also fundamentally
> single-threaded, executing in the knuma_scand kernel thread,
> so the limit in AutoNUMA is global and does not scale up with
> the number of CPUs, nor does it scan tasks in an execution
> proportional manner.
>
> So the idea of rate-limiting the scanning was first implemented
> in the AutoNUMA tree via a global rate limit. This patch goes
> beyond that by implementing an execution rate proportional
> working set sampling rate that is not implemented via a single
> global scanning daemon. ]
>
> [ Dan Carpenter pointed out a possible NULL pointer dereference in the
> first version of this patch. ]
>
> Based-on-idea-by: Andrea Arcangeli <aarcange@...hat.com>
> Bug-Found-By: Dan Carpenter <dan.carpenter@...cle.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Cc: Andrea Arcangeli <aarcange@...hat.com>
> Cc: Rik van Riel <riel@...hat.com>
> [ Wrote changelog and fixed bug. ]
> Signed-off-by: Ingo Molnar <mingo@...nel.org>
> Signed-off-by: Mel Gorman <mgorman@...e.de>
Reviewed-by: Rik van Riel <riel@...hat.com>