lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 12 Mar 2015 15:30:38 -0400
From:	Michal Hocko <mhocko@...e.cz>
To:	Eric B Munson <emunson@...mai.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Thomas Gleixner <tglx@...utronix.de>,
	Christoph Lameter <cl@...ux.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mgorman@...e.de>,
	David Rientjes <rientjes@...gle.com>,
	Rik van Riel <riel@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH V4] Allow compaction of unevictable pages

On Thu 12-03-15 11:22:56, Eric B Munson wrote:
> Currently, pages which are marked as unevictable are protected from
> compaction, but not from other types of migration.  The mlock
> desctription does not promise that all page faults will be avoided, only
> major ones so this protection is not necessary.  This extra protection
> can cause problems for applications that are using mlock to avoid
> swapping pages out, but require order > 0 allocations to continue to
> succeed in a fragmented environment.  This patch adds a sysctl entry
> that will be used to allow root to enable compaction of unevictable
> pages.

It would be appropriate to add a justification for the sysctl, because
it is not obvious from the above description. mlock preventing from the
swapout is not sufficient to justify it. It is the real time extension
mentioned by Peter in the previous version which makes it worth a new
user visible knob.

I would also argue that the knob should be enabled by default because
the real time extension requires an additional changes anyway (rt-kernel
at least) while general usage doesn't need such a strong requirement.

You also should provide a knob description to
Documentation/sysctl/vm.txt

> To illustrate this problem I wrote a quick test program that mmaps a
> large number of 1MB files filled with random data.  These maps are
> created locked and read only.  Then every other mmap is unmapped and I
> attempt to allocate huge pages to the static huge page pool.  When the
> compact_unevictable sysctl is 0, I cannot allocate hugepages after
> fragmenting memory.  When the value is set to 1, allocations succeed.
> 
> Signed-off-by: Eric B Munson <emunson@...mai.com>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Christoph Lameter <cl@...ux.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: David Rientjes <rientjes@...gle.com>
> Cc: Rik van Riel <riel@...hat.com>
> Cc: linux-mm@...ck.org
> Cc: linux-kernel@...r.kernel.org

After the above things are fixed
Acked-by: Michal Hocko <mhocko@...e.cz>

One minor suggestion below

> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 88ea2d6..cc1a678 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1313,6 +1313,13 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= &min_extfrag_threshold,
>  		.extra2		= &max_extfrag_threshold,
>  	},
> +	{
> +		.procname	= "compact_unevictable",
> +		.data		= &sysctl_compact_unevictable,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,

You can use .extra1 = &zero and .extra2 = &one to reduce the value
space.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ